Validating xsd:date and xsd:dateTime #151

Open
tobiasschweizer opened this issue Jul 22, 2022 · 9 comments

@tobiasschweizer

Hi there,

I have a question regarding validation of xsd:date and xsd:dateTime. I am using pyshacl version 0.19.1.

Given the following shapes:

{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "sh": "http://www.w3.org/ns/shacl#",
    "schema": "http://schema.org/",
    "rescs": "http://rescs.org/"
  },
  "@graph": [
    {
      "@id": "rescs:dash/creativework/CreativeWorkShape",
      "@type": "sh:NodeShape",
      "rdfs:comment": {
        "@type": "xsd:string",
        "@value": "The most generic kind of creative work, including books, movies, photographs, software programs, etc."
      },
      "rdfs:label": {
        "@type": "xsd:string",
        "@value": "CreativeWork"
      },
      "sh:property": [
        {
          "sh:datatype": {
            "@id": "xsd:date"
          },
          "sh:description": "The date on which the CreativeWork was created or the item was added to a DataFeed.",
          "sh:maxCount": {
            "@type": "xsd:integer",
            "@value": 1
          },
          "sh:name": "dateCreated",
          "sh:path": {
            "@id": "schema:dateCreated"
          }
        }
      ],
      "sh:targetClass": {
        "@id": "schema:CreativeWork"
      }
    }
  ]
}

I noticed that a schema:dateCreated value passes validation as long as it carries the correct datatype annotation and is a string; the lexical form itself does not seem to be checked.

So the following also passes validation, although the value is an xsd:dateTime lexical form rather than an xsd:date:

"schema:dateCreated": {
        "@type": "xsd:date",
        "@value": "2022-07-08T06:48:22.159262"
}

Does pyshacl actually check if the given value string is a valid date or is this somehow out of scope?

Thanks for your feedback!
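
For reference, here is a minimal, standard-library-only sketch of what a lexical check for xsd:date could look like. It is not how pyshacl or rdflib do it; the function names `is_valid_xsd_date` and `days_in_month` are made up for illustration, and the pattern is a simplified reading of the xsd:date lexical form (YYYY-MM-DD with an optional timezone).

```python
import re

# Simplified lexical pattern for xsd:date: YYYY-MM-DD plus an optional
# timezone ("Z" or +/-hh:mm up to 14:00). The named groups are illustrative.
_XSD_DATE = re.compile(
    r"-?(?P<year>[0-9]{4,})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})"
    r"(?:Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
)


def days_in_month(year: int, month: int) -> int:
    """Number of days in a month of the proleptic Gregorian calendar."""
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def is_valid_xsd_date(lexical: str) -> bool:
    """Return True if the string looks like a valid xsd:date lexical form."""
    match = _XSD_DATE.fullmatch(lexical)
    if match is None:
        return False
    year_text = match.group("year")
    # Years with more than four digits must not carry a leading zero.
    if len(year_text) > 4 and year_text.startswith("0"):
        return False
    year = int(year_text)
    month = int(match.group("month"))
    day = int(match.group("day"))
    if not 1 <= month <= 12:
        return False
    return 1 <= day <= days_in_month(year, month)


print(is_valid_xsd_date("2022-07-08"))                  # plain date
print(is_valid_xsd_date("2022-07-08T06:48:22.159262"))  # dateTime form
```

With a check like this, the value from my example would be rejected because the time part does not belong to the xsd:date lexical form.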

@tobiasschweizer
Author

I tried the above with https://github.com/TopQuadrant/shacl (CLI) version 1.4.2:

./shaclvalidate.sh -datafile datetime.ttl -shapesfile creativework.ttl
14:49:39 WARN riot :: [line: 2, col: 68] Lexical form '2022-07-08T06:48:22.159262' not valid for datatype XSD date
@prefix dash: <http://datashapes.org/dash#> .
@prefix graphql: <http://datashapes.org/graphql#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema1: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix swa: <http://topbraid.org/swa#> .
@prefix tosh: <http://topbraid.org/tosh#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[ rdf:type sh:ValidationReport ;
  sh:conforms false ;
  sh:result [ rdf:type sh:ValidationResult ;
      sh:focusNode <https://openalex.org/W2738724892> ;
      sh:resultMessage "Value must be a valid literal of type date e.g. ('YYYY-MM-DD')" ;
      sh:resultPath schema1:dateCreated ;
      sh:resultSeverity sh:Violation ;
      sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
      sh:sourceShape [] ;
      sh:value "2022-07-08T06:48:22.159262"^^xsd:date
  ]
] .

datetime.ttl

<https://openalex.org/W2738724892> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/CreativeWork>.
<https://openalex.org/W2738724892> <http://schema.org/dateCreated> "2022-07-08T06:48:22.159262"^^<http://www.w3.org/2001/XMLSchema#date> .

creativework.ttl

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema1: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://rescs.org/dash/creativework/CreativeWorkShape> a sh:NodeShape ;
    rdfs:label "CreativeWork"^^xsd:string ;
    rdfs:comment "The most generic kind of creative work, including books, movies, photographs, software programs, etc."^^xsd:string ;
    sh:property [ sh:datatype xsd:date ;
            sh:description "The date on which the CreativeWork was created or the item was added to a DataFeed." ;
            sh:maxCount 1 ;
            sh:name "dateCreated" ;
            sh:path schema1:dateCreated ] ;
    sh:targetClass schema1:CreativeWork .

There are two things I can see in the output:

  1. a warning about an invalid xsd:date (parsing)
  2. a SHACL error for an invalid xsd:date

Shouldn't pyshacl also report an error for this case? Or is this related to rdflib, which should throw a warning for an invalid xsd:date?

Please let me know if I should provide more information about my use case. Thanks!

@ashleysommer
Collaborator

Hi @tobiasschweizer

Sorry for the delayed response on this one.

This problem is coming from RDFLib. PySHACL uses the RDFLib library to check whether the Literal's lexical text matches its given datatype.

Note, there was some work done in this area in the lead up to the RDFLib v6.2.0 release, so the new version may have some changes that help with this issue.

Additionally, RDFLib v6.2.0 allows a Literal to be flagged as "ill-typed", that is, its lexical text does not match its given datatype. PySHACL can now use this flag to help complete the validation checks in the sh:datatype constraint.

A new version of PySHACL (pyshacl v0.20.0) will be out later today. It uses RDFLib v6.2.0 by default and takes advantage of this new "ill-typed" Literals feature, so please try that and let me know if it solves your issue.
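
To illustrate the idea of an "ill-typed" flag, here is a rough standard-library sketch. It is not RDFLib's or PySHACL's actual code: `TypedLiteral`, `PARSERS`, `_parse_date` and `conforms_to_datatype` are hypothetical names, and the date parser is deliberately simplified (no timezones, no negative years). The point is only the pattern: the lexical text is parsed once when the literal is created, the outcome is recorded as a flag, and the datatype check consults that flag.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


def _parse_date(lexical: str) -> date:
    """Simplified xsd:date parser (hypothetical helper)."""
    match = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", lexical)
    if match is None:
        raise ValueError(f"not an xsd:date lexical form: {lexical!r}")
    # date() itself raises ValueError for impossible dates such as Feb 30.
    return date(*(int(part) for part in match.groups()))


# Hypothetical mapping from datatype IRI to a lexical-to-value parser.
PARSERS: Dict[str, Callable[[str], object]] = {
    XSD + "date": _parse_date,
    XSD + "integer": int,
}


@dataclass
class TypedLiteral:
    lexical: str
    datatype: str
    value: object = field(init=False, default=None)
    # True: lexical text does not fit the datatype; False: it does;
    # None: no parser is known for the datatype, so nothing can be said.
    ill_typed: Optional[bool] = field(init=False, default=None)

    def __post_init__(self) -> None:
        parser = PARSERS.get(self.datatype)
        if parser is None:
            return
        try:
            self.value = parser(self.lexical)
            self.ill_typed = False
        except ValueError:
            self.ill_typed = True


def conforms_to_datatype(literal: TypedLiteral, expected: str) -> bool:
    """Datatype check that also rejects literals flagged as ill-typed."""
    return literal.datatype == expected and literal.ill_typed is not True


bad = TypedLiteral("2022-07-08T06:48:22.159262", XSD + "date")
print(bad.ill_typed, conforms_to_datatype(bad, XSD + "date"))
```

In this sketch a literal with an unknown datatype keeps `ill_typed = None` and is not rejected, which is one possible design choice rather than a statement about what any library does.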

@tobiasschweizer
Author

Hi @ashleysommer

No worries, I was on a long holiday in August and did not do anything with RDF for a while ;-)

Thanks for the heads-up. I will gladly try the new pyshacl version and let you know about the outcome.

@ashleysommer
Collaborator

Sorry, didn't mean to automatically close this

@ashleysommer ashleysommer reopened this Sep 8, 2022
@tobiasschweizer
Author

tobiasschweizer commented Sep 8, 2022

I've just installed pyshacl 0.20.0 and pip automatically updated rdflib to "6.2.0".
However, "2022-07-08T06:48:22.159262" is still regarded as a valid xsd:date.

@ashleysommer
Collaborator

Thanks. I'll forward that up to the RDFLib team, the fix will lie with them now.

@tobiasschweizer
Author

Hi @ashleysommer ,

I've recently updated rdflib to 6.3.1 and I am now getting

in parse_date
raise ISO8601Error('Unrecognised ISO 8601 date format: %r' % datestring)
isodate.isoerror.ISO8601Error: Unrecognised ISO 8601 date format: ...

So it seems that rdflib performs some actual checking of dates now which is great :-).

@tobiasschweizer
Author

tobiasschweizer commented Sep 6, 2023

I figured that rdflib delegates the date literal parsing to isodate's parse_date: https://github.com/gweis/isodate/blob/8856fdf0e46c7bca00229faa1aae6b7e8ad6e76c/src/isodate/isodates.py#L118

What I found a bit surprising is that rdflib automatically adds day precision to dates with year and month precision. This behaviour comes from isodate:

For incomplete dates, this method chooses the first day for it. For
instance if only a century is given, this method returns the 1st of
January in year 1 of this century.

https://github.com/gweis/isodate/blob/8856fdf0e46c7bca00229faa1aae6b7e8ad6e76c/src/isodate/isodates.py#L126C1-L128C39

So this means that "2016"^^xsd:date in the original data is going to be a "2016-01-01"^^xsd:date when being validated.
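
As a possible alternative to padding, reduced-precision values could be recognised by their shape and mapped to the matching XSD datatype (xsd:gYear, xsd:gYearMonth) instead of being turned into a full date. The following is only a sketch with the standard library; `classify_date_lexical` is a hypothetical helper, not part of rdflib or isodate, and it checks the shape of the string only, not value ranges.

```python
import re
from typing import Optional

# Optional timezone suffix shared by all three lexical forms (shape only).
_TZ = r"(?:Z|[+-][0-9]{2}:[0-9]{2})?"

# Ordered from most to least precise; the first full match wins.
_PATTERNS = [
    ("xsd:date", re.compile(r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}" + _TZ)),
    ("xsd:gYearMonth", re.compile(r"-?[0-9]{4,}-[0-9]{2}" + _TZ)),
    ("xsd:gYear", re.compile(r"-?[0-9]{4,}" + _TZ)),
]


def classify_date_lexical(lexical: str) -> Optional[str]:
    """Return the XSD datatype whose lexical shape the string has, if any."""
    for name, pattern in _PATTERNS:
        if pattern.fullmatch(lexical):
            return name
    return None


for text in ("2016", "2016-05", "2016-05-01", "2022-07-08T06:48:22.159262"):
    print(text, "->", classify_date_lexical(text))
```

Under this reading, "2016" has the shape of an xsd:gYear rather than an xsd:date, so a strict validator could report it as ill-typed for xsd:date instead of silently treating it as January 1st.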

@ashleysommer
Collaborator

> This behaviour comes from isodate

> So this means that "2016"^^xsd:date in the original data is going to be a "2016-01-01"^^xsd:date when being validated.

Yeah, I've seen this issue come up before (in Python, outside of the RDF world). I think we would see the same issue with whichever datetime library RDFLib uses. This level of detail in the RDF spec seems to be very implementation-specific.
