Add support for directional language-tagged strings from RDF 1.2.
gkellogg committed Jun 28, 2023
1 parent 13164ed commit 16c4086
Showing 10 changed files with 205 additions and 63 deletions.
11 changes: 9 additions & 2 deletions README.md
@@ -102,6 +102,9 @@ the 1.1 release of RDF.rb:

Notably, {RDF::Queryable#query} and {RDF::Query#execute} are now completely symmetric; this allows an implementation of {RDF::Queryable} to optimize queries using implementation-specific logic, allowing for substantial performance improvements when executing BGP queries.

## Differences between RDF 1.1 and RDF 1.2
* {RDF::Literal} has an optional `direction` property for directional language-tagged strings.

## Tutorials

* [Getting data from the Semantic Web using Ruby and RDF.rb](https://semanticweb.org/wiki/Getting_data_from_the_Semantic_Web_%28Ruby%29)
@@ -400,6 +403,7 @@ from BNode identity (i.e., they each entail the other)

* [Ruby](https://ruby-lang.org/) (>= 2.6)
* [LinkHeader][] (>= 0.0.8)
* [bcp47_spec][] (~> 0.2)
* Soft dependency on [RestClient][] (>= 2.1)

## Installation
@@ -481,8 +485,10 @@ This is free and unencumbered public domain software. For more information,
see <https://unlicense.org/> or the accompanying {file:UNLICENSE} file.

[RDF]: https://www.w3.org/RDF/
[N-Triples]: https://www.w3.org/TR/n-triples/
[N-Quads]: https://www.w3.org/TR/n-quads/
[LinkHeader]: https://github.com/asplake/link_header
[bcp47_spec]: https://github.com/dadah89/bcp47_spec
[N-Triples]: https://www.w3.org/TR/rdf-n-triples/
[N-Quads]: https://www.w3.org/TR/rdf-n-quads/
[YARD]: https://yardoc.org/
[YARD-GS]: https://rubydoc.info/docs/yard/file/docs/GettingStarted.md
[PDD]: https://unlicense.org/#unlicensing-contributions
@@ -496,6 +502,7 @@ see <https://unlicense.org/> or the accompanying {file:UNLICENSE} file.
[SPARQL doc]: https://ruby-rdf.github.io/sparql
[RDF 1.0]: https://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
[RDF 1.1]: https://www.w3.org/TR/rdf11-concepts/
[RDF 1.2]: https://www.w3.org/TR/rdf12-concepts/
[SPARQL 1.1]: https://www.w3.org/TR/sparql11-query/
[RDF.rb]: https://ruby-rdf.github.io/
[RDF::DO]: https://ruby-rdf.github.io/rdf-do
46 changes: 40 additions & 6 deletions etc/n-triples.ebnf
@@ -1,6 +1,40 @@
[1] ntriplesDoc ::= triple? (EOL triple)* EOL?
[2] triple ::= subject predicate object '.'
[3] subject ::= IRIREF | BLANK_NODE_LABEL
[4] predicate ::= IRIREF
[5] object ::= IRIREF | BLANK_NODE_LABEL | literal
[6] literal ::= STRING_LITERAL_QUOTE ('^^' IRIREF | LANGTAG)?
ntriplesDoc ::= triple? (EOL triple)* EOL?
triple ::= subject predicate object '.'
subject ::= IRIREF | BLANK_NODE_LABEL | quotedTriple
predicate ::= IRIREF
object ::= IRIREF | BLANK_NODE_LABEL | literal | quotedTriple
literal ::= STRING_LITERAL_QUOTE ('^^' IRIREF | LANGTAG )?
quotedTriple ::= '<<' subject predicate object '>>'

@terminals

IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
BLANK_NODE_LABEL ::= '_:' ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* PN_CHARS)?
LANGTAG ::= "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )* ('--' ('ltr'|'rtl'))?
STRING_LITERAL_QUOTE ::= '"' ( [^#x22#x5C#xA#xD] | ECHAR | UCHAR )* '"'
UCHAR ::= ( "\u" HEX HEX HEX HEX )
| ( "\U" HEX HEX HEX HEX HEX HEX HEX HEX )
ECHAR ::= ("\" [tbnrf"'])
PN_CHARS_BASE ::= ([A-Z]
| [a-z]
| [#x00C0-#x00D6]
| [#x00D8-#x00F6]
| [#x00F8-#x02FF]
| [#x0370-#x037D]
| [#x037F-#x1FFF]
| [#x200C-#x200D]
| [#x2070-#x218F]
| [#x2C00-#x2FEF]
| [#x3001-#xD7FF]
| [#xF900-#xFDCF]
| [#xFDF0-#xFFFD]
| [#x10000-#xEFFFF])
PN_CHARS_U ::= PN_CHARS_BASE | '_'
PN_CHARS ::= (PN_CHARS_U
| "-"
| [0-9]
| #x00B7
| [#x0300-#x036F]
| [#x203F-#x2040])
HEX ::= ([0-9] | [A-F] | [a-f])
EOL ::= [#xD#xA]+
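
The revised `LANGTAG` terminal above can be approximated with a Ruby regular expression. This is a minimal sketch and not code from this commit: `LANGTAG_PATTERN` and `split_langtag` are hypothetical names, and the pattern mirrors only the terminal as written in the grammar, not full BCP 47 validation.

```ruby
# Hypothetical helper, not part of RDF.rb: a regular-expression rendering of
# the LANGTAG terminal, including the optional "--ltr" / "--rtl" suffix.
LANGTAG_PATTERN = /\A@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*)(?:--(ltr|rtl))?\z/

# Splits a LANGTAG token into [language, direction].
# Returns nil when the token does not match the terminal.
def split_langtag(token)
  match = LANGTAG_PATTERN.match(token)
  return nil unless match

  [match[1], match[2]]
end

p split_langtag("@en")          # => ["en", nil]
p split_langtag("@ar-EG--rtl")  # => ["ar-EG", "rtl"]
p split_langtag("@en--up")      # => nil
```

Because a single hyphen must be followed by an alphanumeric character, the double hyphen cannot be consumed as a language subtag, so the direction suffix is unambiguous.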
114 changes: 83 additions & 31 deletions lib/rdf/model/literal.rb
@@ -1,4 +1,7 @@
# -*- encoding: utf-8 -*-

require 'bcp47_spec'

module RDF
##
# An RDF literal.
@@ -9,7 +12,9 @@ module RDF
#
# Specific typed literals may have behavior different from the default implementation. See the following defined sub-classes for specific documentation. Additional sub-classes may be defined, and will interoperate by defining `DATATYPE` and `GRAMMAR` constants, in addition to other required overrides of RDF::Literal behavior.
#
# In RDF 1.1, all literals are typed, including plain literals and language tagged literals. Internally, plain literals are given the `xsd:string` datatype and language tagged literals are given the `rdf:langString` datatype. Creating a plain literal, without a datatype or language, will automatically provide the `xsd:string` datatype; similar for language tagged literals. Note that most serialization formats will remove this datatype. Code which depends on a literal having the `xsd:string` datatype being different from a plain literal (formally, without a datatype) may break. However note that the `#has\_datatype?` will continue to return `false` for plain or language-tagged literals.
# In RDF 1.1, all literals are typed, including plain literals and language-tagged strings. Internally, plain literals are given the `xsd:string` datatype and language-tagged strings are given the `rdf:langString` datatype. Creating a plain literal, without a datatype or language, will automatically provide the `xsd:string` datatype; similar for language-tagged strings. Note that most serialization formats will remove this datatype. Code which depends on a literal having the `xsd:string` datatype being different from a plain literal (formally, without a datatype) may break. However note that the `#has\_datatype?` will continue to return `false` for plain or language-tagged strings.
#
# RDF 1.2 adds **directional language-tagged strings** which are effectively a subclass of **language-tagged strings** containing an additional **direction** component with value either **ltr** or **rtl** for Left-to-Right or Right-to-Left. This determines the general direction of a string when presented in a user agent, where it might be in conflict with the inherent direction of the leading Unicode code points. Directional language-tagged strings are given the `rdf:dirLangString` datatype.
#
# * {RDF::Literal::Boolean}
# * {RDF::Literal::Date}
@@ -23,16 +28,23 @@ module RDF
# value = RDF::Literal.new("Hello, world!")
# value.plain? #=> true
#
# @example Creating a language-tagged literal (1)
# @example Creating a language-tagged string (1)
# value = RDF::Literal.new("Hello!", language: :en)
# value.language? #=> true
# value.language #=> :en
#
# @example Creating a language-tagged literal (2)
# @example Creating a language-tagged string (2)
# RDF::Literal.new("Wazup?", language: :"en-US")
# RDF::Literal.new("Hej!", language: :sv)
# RDF::Literal.new("¡Hola!", language: :es)
#
# @example Creating a directional language-tagged string
# value = RDF::Literal.new("Hello!", language: :en, direction: :ltr)
# value.language? #=> true
# value.language #=> :en
# value.direction? #=> true
# value.direction #=> :ltr
#
# @example Creating an explicitly datatyped literal
# value = RDF::Literal.new("2009-12-31", datatype: RDF::XSD.date)
# value.datatype? #=> true
@@ -105,8 +117,14 @@ def self.datatyped_class(uri)

##
# @private
def self.new(value, language: nil, datatype: nil, lexical: nil, validate: false, canonicalize: false, **options)
raise ArgumentError, "datatype with language must be rdf:langString" if language && (datatype || RDF.langString).to_s != RDF.langString.to_s
def self.new(value, language: nil, datatype: nil, direction: nil, lexical: nil, validate: false, canonicalize: false, **options)
if language && direction
raise ArgumentError, "datatype with language and direction must be rdf:dirLangString" if (datatype || RDF.dirLangString).to_s != RDF.dirLangString.to_s
elsif language
raise ArgumentError, "datatype with language must be rdf:langString" if (datatype || RDF.langString).to_s != RDF.langString.to_s
else
raise ArgumentError, "datatype not compatible with language or direction" if language || direction
end

klass = case
when !self.equal?(RDF::Literal)
@@ -128,7 +146,7 @@ def self.new(value, language: nil, datatype: nil, lexical: nil, validate: false,
end
end
literal = klass.allocate
literal.send(:initialize, value, language: language, datatype: datatype, **options)
literal.send(:initialize, value, language: language, datatype: datatype, direction: direction, **options)
literal.validate! if validate
literal.canonicalize! if canonicalize
literal
@@ -137,18 +155,24 @@ def self.new(value, language: nil, datatype: nil, lexical: nil, validate: false,
TRUE = RDF::Literal.new(true)
FALSE = RDF::Literal.new(false)
ZERO = RDF::Literal.new(0)
XSD_STRING = RDF::URI("http://www.w3.org/2001/XMLSchema#string")

# @return [Symbol] The language tag (optional).
# @return [Symbol] The language-tag (optional). Implies `datatype` is `rdf:langString`.
attr_accessor :language

# @return [Symbol] The base direction (optional). Implies `datatype` is `rdf:dirLangString`.
attr_accessor :direction

# @return [URI] The XML Schema datatype URI (optional).
attr_accessor :datatype

##
# Literals without a datatype are given either xsd:string or rdf:langString
# depending on if there is language
# Literals without a datatype are given either `xsd:string`, `rdf:langString`, or `rdf:dirLangString`,
# depending on if there is `language` and/or `direction`.
#
# @param [Object] value
# @param [Symbol] direction (nil)
# Initial text direction.
# @param [Symbol] language (nil)
# Language is downcased to ensure proper matching
# @param [String] lexical (nil)
@@ -163,16 +187,24 @@ def self.new(value, language: nil, datatype: nil, lexical: nil, validate: false,
# @see http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
# @see http://www.w3.org/TR/rdf11-concepts/#section-Datatypes
# @see #to_s
def initialize(value, language: nil, datatype: nil, lexical: nil, validate: false, canonicalize: false, **options)
def initialize(value, language: nil, datatype: nil, direction: nil, lexical: nil, validate: false, canonicalize: false, **options)
@object = value.freeze
@string = lexical if lexical
@string = value if !defined?(@string) && value.is_a?(String)
@string = @string.encode(Encoding::UTF_8).freeze if instance_variable_defined?(:@string)
@object = @string if instance_variable_defined?(:@string) && @object.is_a?(String)
@language = language.to_s.downcase.to_sym if language
@direction = direction.to_s.downcase.to_sym if direction
@datatype = RDF::URI(datatype).freeze if datatype
@datatype ||= self.class.const_get(:DATATYPE) if self.class.const_defined?(:DATATYPE)
@datatype ||= instance_variable_defined?(:@language) && @language ? RDF.langString : RDF::URI("http://www.w3.org/2001/XMLSchema#string")
@datatype ||= if instance_variable_defined?(:@language) && @language &&
instance_variable_defined?(:@direction) && @direction
RDF.dirLangString
elsif instance_variable_defined?(:@language) && @language
RDF.langString
else
XSD_STRING
end
end
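
The argument checks in `self.new` and the defaulting in `initialize` together determine which datatype a literal receives. The sketch below restates that decision as one standalone function, assuming plain IRI strings in place of `RDF.langString`, `RDF.dirLangString`, and `XSD_STRING`. `resolve_datatype` and its error messages are hypothetical, and the sketch ignores the `DATATYPE` constant that subclasses may define.

```ruby
# Hypothetical stand-ins for RDF.langString, RDF.dirLangString and XSD_STRING.
LANG_STRING     = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
DIR_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#dirLangString"
XSD_STRING_IRI  = "http://www.w3.org/2001/XMLSchema#string"

# Returns the datatype IRI for a literal, or raises ArgumentError when the
# combination of arguments is not allowed.
def resolve_datatype(language: nil, direction: nil, datatype: nil)
  # A direction is only meaningful together with a language.
  raise ArgumentError, "direction requires a language" if direction && !language

  # Datatype implied by the language/direction combination, if any.
  implied =
    if language && direction
      DIR_LANG_STRING
    elsif language
      LANG_STRING
    end

  # An explicit datatype must agree with the implied one.
  if implied && datatype && datatype.to_s != implied
    raise ArgumentError, "datatype must be #{implied}"
  end

  datatype || implied || XSD_STRING_IRI
end

puts resolve_datatype(language: :ar, direction: :rtl)
puts resolve_datatype(language: :en)
puts resolve_datatype
```

The three calls print the `rdf:dirLangString`, `rdf:langString`, and `xsd:string` IRIs respectively; passing a `direction` without a `language` raises `ArgumentError`, as in the diff.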

##
@@ -202,8 +234,8 @@ def literal?
#
# Compatibility of two arguments is defined as:
# * The arguments are simple literals or literals typed as xsd:string
# * The arguments are plain literals with identical language tags
# * The first argument is a plain literal with language tag and the second argument is a simple literal or literal typed as xsd:string
# * The arguments are plain literals with identical language-tags and directions
# * The first argument is a plain literal with language-tag and the second argument is a simple literal or literal typed as xsd:string
#
# @example
# compatible?("abc" "b") #=> true
@@ -224,19 +256,19 @@ def compatible?(other)
return false unless other.literal? && plain? && other.plain?

# * The arguments are simple literals or literals typed as xsd:string
# * The arguments are plain literals with identical language tags
# * The first argument is a plain literal with language tag and the second argument is a simple literal or literal typed as xsd:string
language? ?
(language == other.language || other.datatype == RDF::URI("http://www.w3.org/2001/XMLSchema#string")) :
other.datatype == RDF::URI("http://www.w3.org/2001/XMLSchema#string")
# * The arguments are plain literals with identical language-tags
# * The first argument is a plain literal with language-tag and the second argument is a simple literal or literal typed as xsd:string
language? || direction? ?
(language == other.language && direction == other.direction || other.datatype == XSD_STRING) :
other.datatype == XSD_STRING
end

##
# Returns a hash code for this literal.
#
# @return [Integer]
def hash
@hash ||= [to_s, datatype, language].hash
@hash ||= [to_s, datatype, language, direction].compact.hash
end
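
The new `hash` body calls `compact` before hashing, which drops `nil` components from the array. The following sketch illustrates the effect with plain values; `literal_hash_key` is a hypothetical helper and the datatype strings are placeholders, not RDF.rb objects.

```ruby
# Hypothetical stand-in for the array that RDF::Literal#hash hashes.
def literal_hash_key(string, datatype, language = nil, direction = nil)
  [string, datatype, language, direction].compact
end

plain       = literal_hash_key("Hello!", "xsd:string")
tagged      = literal_hash_key("Hello!", "rdf:langString", :en)
directional = literal_hash_key("Hello!", "rdf:dirLangString", :en, :ltr)

p plain        # => ["Hello!", "xsd:string"]
p tagged       # => ["Hello!", "rdf:langString", :en]
p directional  # => ["Hello!", "rdf:dirLangString", :en, :ltr]
```

Since `compact` removes positions, a value in the language slot and the same value in the direction slot yield the same key. That is acceptable for a hash code because equality is still decided by `eql?`, which compares `language` and `direction` separately.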


@@ -270,6 +302,7 @@ def eql?(other)
self.value_hash == other.value_hash &&
self.value.eql?(other.value) &&
self.language.to_s.eql?(other.language.to_s) &&
self.direction.to_s.eql?(other.direction.to_s) &&
self.datatype.eql?(other.datatype))
end

@@ -290,7 +323,10 @@ def ==(other)
case
when self.eql?(other)
true
when self.language? && self.language.to_s == other.language.to_s
when self.direction? && self.direction == other.direction
# Literals with directions can compare if languages and directions are identical
self.value_hash == other.value_hash && self.value == other.value
when self.language? && self.language == other.language
# Literals with languages can compare if languages are identical
self.value_hash == other.value_hash && self.value == other.value
when self.simple? && other.simple?
@@ -342,14 +378,18 @@ def <=>(other)

##
# Returns `true` if this is a plain literal. A plain literal
# may have a language, but may not have a datatype. For
# may have a language and direction, but may not have a datatype. For
# all practical purposes, this includes xsd:string literals
# too.
#
# @return [Boolean] `true` or `false`
# @see http://www.w3.org/TR/rdf-concepts/#dfn-plain-literal
def plain?
[RDF.langString, RDF::URI("http://www.w3.org/2001/XMLSchema#string")].include?(datatype)
[
RDF.langString,
RDF.dirLangString,
XSD_STRING
].include?(datatype)
end

##
@@ -359,19 +399,28 @@ def plain?
# @return [Boolean] `true` or `false`
# @see http://www.w3.org/TR/sparql11-query/#simple_literal
def simple?
datatype == RDF::URI("http://www.w3.org/2001/XMLSchema#string")
datatype == XSD_STRING
end

##
# Returns `true` if this is a language-tagged literal.
# Returns `true` if this is a language-tagged string.
#
# @return [Boolean] `true` or `false`
# @see http://www.w3.org/TR/rdf-concepts/#dfn-plain-literal
# @see https://www.w3.org/TR/rdf-concepts/#dfn-language-tagged-string
def language?
datatype == RDF.langString
[RDF.langString, RDF.dirLangString].include?(datatype)
end
alias_method :has_language?, :language?

##
# Returns `true` if this is a directional language-tagged string.
#
# @return [Boolean] `true` or `false`
# @see https://www.w3.org/TR/rdf-concepts/#dfn-dir-lang-string
def direction?
datatype == RDF.dirLangString
end

##
# Returns `true` if this is a datatyped literal.
#
@@ -380,7 +429,7 @@ def language?
# @return [Boolean] `true` or `false`
# @see http://www.w3.org/TR/rdf-concepts/#dfn-typed-literal
def datatype?
!plain? && !language?
!plain? && !language? && !direction?
end
alias_method :has_datatype?, :datatype?
alias_method :typed?, :datatype?
@@ -393,10 +442,13 @@ def datatype?
# @return [Boolean] `true` or `false`
# @since 0.2.1
def valid?
return false if language? && language.to_s !~ /^[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*$/
BCP47.parse(language.to_s) if language?
return false if direction? && !%i{ltr rtl}.include?(direction)
return false if datatype? && datatype.invalid?
grammar = self.class.const_get(:GRAMMAR) rescue nil
grammar.nil? || value.match?(grammar)
rescue BCP47::InvalidLanguageTag
false
end
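
The validity rule for the language and direction parts can be sketched without the `bcp47_spec` dependency. This is an approximation under stated assumptions: `well_formed_tag?` is a hypothetical helper, and it uses the simpler regular expression that this commit removes (anchored with `\A` and `\z`) as a stand-in for `BCP47.parse`, so it accepts some tags that a real BCP 47 parser would reject.

```ruby
# Allowed base directions, as in the diff.
DIRECTIONS = %i[ltr rtl].freeze

# Simplified language pattern; a stand-in for BCP47.parse, not equivalent to it.
SIMPLE_LANGUAGE = /\A[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*\z/

# Returns true when the language looks well formed and the direction,
# if given, is either ltr or rtl.
def well_formed_tag?(language, direction = nil)
  return false unless SIMPLE_LANGUAGE.match?(language.to_s)

  direction.nil? || DIRECTIONS.include?(direction.to_sym)
end

p well_formed_tag?(:en)            # => true
p well_formed_tag?("ar-EG", :rtl)  # => true
p well_formed_tag?(:en, :up)       # => false
p well_formed_tag?("not a tag")    # => false
```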

##
@@ -536,20 +588,20 @@ def inspect

##
# @overload #to_str
# This method is implemented when the datatype is `xsd:string` or `rdf:langString`
# This method is implemented when the datatype is `xsd:string`, `rdf:langString`, or `rdf:dirLangString`
# @return [String]
def method_missing(name, *args)
case name
when :to_str
return to_s if @datatype == RDF.langString || @datatype == RDF::URI("http://www.w3.org/2001/XMLSchema#string")
return to_s if [RDF.langString, RDF.dirLangString, XSD_STRING].include?(@datatype)
end
super
end

def respond_to_missing?(name, include_private = false)
case name
when :to_str
return true if @datatype == RDF.langString || @datatype == RDF::URI("http://www.w3.org/2001/XMLSchema#string")
return true if [RDF.langString, RDF.dirLangString, XSD_STRING].include?(@datatype)
end
super
end
2 changes: 1 addition & 1 deletion lib/rdf/ntriples.rb
@@ -15,7 +15,7 @@ module RDF
#
# <https://rubygems.org/gems/rdf> <http://purl.org/dc/terms/title> "rdf" .
#
# ## RDFStar (RDF*)
# ## Quoted Triples
#
# Supports statements as resources using `<<s p o>>`.
#