
Innoculation doesn't bring instances into the datagraph #187

Open
jdogburck opened this issue Jun 9, 2023 · 11 comments

Comments

@jdogburck

I have a use case where the datagraph to be validated references instances defined in the ontology. It doesn't appear that inoculation brings those instances in. With the current behavior, SHACL validation fails when the datagraph has relationships to instances in the ontology: it isn't possible to validate the class or other properties of the predicate's targets.

Given the shacl rules:

@prefix validation: <http://ontology.validation/> .
@prefix xyz: <https://ontology.xyz/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

validation:ReportShape
  a sh:NodeShape ;
  sh:targetClass xyz:Report ;
  sh:property [
    sh:path xyz:usesStyle ;
    sh:class xyz:Style ;
    sh:message "Every Report 'usesStyle' must be a Style." ;
  ] ;
  sh:closed false .

validation:usesStyleRangeConstraintsShape
  a sh:NodeShape ;
  sh:targetObjectsOf xyz:usesStyle ;
  sh:class xyz:Style; # Measurement Scale
  sh:message "Range of 'usesStyle' must be type Style." ;
  sh:closed false .

And the ontology:

@prefix xyz: <https://ontology.xyz/> .

xyz:Style1 a xyz:Style .

And the datagraph:

@prefix xyz: <https://ontology.xyz/> .

<https://abc/some-report>
    a xyz:Report ;
    xyz:usesStyle xyz:Style1
    .

# check with: pyshacl -s shacl.ttl -e ontology.ttl datagraph.ttl

# Uncomment the following and everything works because the instances are now in the datagraph.
#
# xyz:Style1 a xyz:Style .

Running pyshacl -s shacl.ttl -e ontology.ttl datagraph.ttl

fails with:

Validation Report
Conforms: False
Results (2):
Constraint Violation in ClassConstraintComponent (http://www.w3.org/ns/shacl#ClassConstraintComponent):
        Severity: sh:Violation
        Source Shape: [ sh:class xyz:Style ; sh:message Literal("Every Report 'usesStyle' must be a Style.") ; sh:path xyz:usesStyle ]
        Focus Node: <https://abc/some-report>
        Value Node: xyz:Style1
        Result Path: xyz:usesStyle
        Message: Every Report 'usesStyle' must be a Style.
Constraint Violation in ClassConstraintComponent (http://www.w3.org/ns/shacl#ClassConstraintComponent):
        Severity: sh:Violation
        Source Shape: validation:usesStyleRangeConstraintsShape
        Focus Node: xyz:Style1
        Value Node: xyz:Style1
        Message: Range of 'usesStyle' must be type Style.
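Both violations come from the same rule. Per the SHACL spec, sh:class is satisfied only if the value node is a SHACL instance of the class in the data graph, meaning it has rdf:type followed by zero or more rdfs:subClassOf. A type triple that exists only in the ontology is therefore invisible to the check. The sketch below models that check in plain Python. The tuple-of-strings graph and the helper names are hypothetical and are not pySHACL's API.

```python
# Hypothetical, simplified model: a graph is a set of (subject, predicate, object) strings.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def is_instance_of(graph, node, cls):
    """True if `node` has rdf:type `cls`, or a subclass of it, using only triples in `graph`."""
    stack = [o for s, p, o in graph if s == node and p == RDF_TYPE]
    seen = set()
    while stack:
        t = stack.pop()
        if t == cls:
            return True
        if t in seen:
            continue
        seen.add(t)
        # Follow rdfs:subClassOf upwards, as SHACL's notion of "instance" does.
        stack.extend(o for s, p, o in graph if s == t and p == RDFS_SUBCLASS_OF)
    return False


def class_violations(graph, path, cls):
    """Value nodes reached via `path` that fail an sh:class `cls` check."""
    return sorted(o for s, p, o in graph if p == path and not is_instance_of(graph, o, cls))


ontology = {("xyz:Style1", RDF_TYPE, "xyz:Style")}
data_graph = {
    ("<https://abc/some-report>", RDF_TYPE, "xyz:Report"),
    ("<https://abc/some-report>", "xyz:usesStyle", "xyz:Style1"),
}

print(class_violations(data_graph, "xyz:usesStyle", "xyz:Style"))             # ['xyz:Style1']
print(class_violations(data_graph | ontology, "xyz:usesStyle", "xyz:Style"))  # []
```

Once the ontology's type triple is present in the graph being checked, the violation disappears. This matches the observation that uncommenting `xyz:Style1 a xyz:Style .` in the datagraph makes everything pass.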

innoculation-skips-instances.zip

@ajnelson-nist
Contributor

You might be interested in where this thread discusses owl:NamedIndividual.

@ashleysommer
Collaborator

ashleysommer commented Jun 10, 2023

Hi @jdogburck
Alex is right: if you want an instance from your ontology to appear in the datagraph after inoculation, you must type it as owl:NamedIndividual in your ontology.

@jdogburck
Author

Hmm, well then it seems that, in the near term, the only way around this problem for non-OWL-2-compliant ontologies will be to include the instances in the graph to be validated if we want to upgrade to the latest version. Thank you for the quick response.

P.S. Forgive my possibly incorrect use of the "OWL-2 Compliant" moniker, as I'm not an ontologist :)

@ajnelson-nist
Contributor

I tend to use "Conformant" rather than "Compliant" when I'm discussing matters of syntax. Take owl:NamedIndividual as an example. A graph's usage of owl:NamedIndividual is conformant to the syntactic requirements of OWL 2 DL in RDF, defined in this document, if and only if no instance of owl:NamedIndividual is a blank node. You can see this by reviewing every occurrence of "NamedIndividual" in that document: whenever there is a Turtle snippet, it takes this form:

*:x rdf:type owl:NamedIndividual .

rather than this form:

x rdf:type owl:NamedIndividual .

That asterisk-vs-no-asterisk illustrative syntax is defined in Section 1.
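The rule can also be checked mechanically: flag every subject typed owl:NamedIndividual that is a blank node. The sketch below makes two assumptions. It uses a plain-Python, tuple-of-strings graph, and it recognises blank nodes by the Turtle/N-Triples `_:` label prefix. The function name is hypothetical.

```python
RDF_TYPE = "rdf:type"
OWL_NAMED_INDIVIDUAL = "owl:NamedIndividual"


def is_blank(node):
    # Assumption: blank nodes are written with the Turtle/N-Triples "_:" label prefix.
    return node.startswith("_:")


def nonconformant_named_individuals(graph):
    """Blank-node subjects typed owl:NamedIndividual (not allowed in OWL 2 DL)."""
    return sorted(
        s for s, p, o in graph
        if p == RDF_TYPE and o == OWL_NAMED_INDIVIDUAL and is_blank(s)
    )


graph = {
    ("xyz:Style1", RDF_TYPE, OWL_NAMED_INDIVIDUAL),  # IRI-identified: conformant
    ("_:x", RDF_TYPE, OWL_NAMED_INDIVIDUAL),         # blank node: not conformant
}
print(nonconformant_named_individuals(graph))  # ['_:x']
```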

And if this subtlety seems really deep in the weeds and difficult to figure out: yes, I agree. That's why I've been working with my ontology community to encode each of these OWL nuances in SHACL. The errors reported by some ontology tools have been hit or miss at helping to correct, or even locate, the OWL syntax bug.

For better or worse, the OWL shapes from my community led to the discussion over on #170. That discussion in turn led to some additional behaviors in pySHACL that may or may not be appropriate for users in general. This owl:NamedIndividual interaction, for instance, is a second-generation effect of trying to address some unspecified behaviors of SHACL with inoculation. If this conversation starts again, here or in a later Issue from someone else ("Hey, why isn't my non-Class thing in my ontology being mixed in?"), I'd be inclined to call it an unsatisfactory user-experience result that is worth reverting.

@jdogburck
Author

Thanks for the insight 👍 and godspeed.

@ashleysommer
Collaborator

ashleysommer commented Jun 12, 2023

If this conversation ends up starting again, here or in a later Issue from someone else ("Hey, why isn't my non-Class thing in my ontology being mixed in?"), I'd be inclined to suggest it's an unsatisfactory user-experience result and worth reverting.

I definitely understand your concerns and where you are coming from. Unfortunately, this is not a matter of "how should PySHACL work with non-OWL-2-compliant ontologies", but a symptom of the broader issue of "how should SHACL validation engines incorporate ontologies?". According to the W3C SHACL Spec, the answer is that they don't. The spec states that the user is expected to prepare the datagraph with all the ontological definitions required for successful validation before passing it to the validator. With any other SHACL validator, that is a manual pre-processing step you need to take in your workflow before validation.

PySHACL added the ontology "mix-in" feature (which later evolved into the "inoculation" feature) as a non-spec addition. It covers cases where it is not possible or practical to modify the datagraph ahead of time. Other validators don't have this feature, so there is no defined "correct" way to implement it. The previous way copied everything from the ontology into the datagraph; the new way copies only RDFS and OWL features from the ontology into the datagraph.
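To make the old/new difference concrete, here is a deliberately simplified model in plain Python, using a tuple-of-strings graph. `SCHEMA_TYPES` is an assumption standing in for "RDFS and OWL features", including owl:NamedIndividual as described earlier in the thread. pySHACL's actual selection rules are not reproduced here, and the function names are hypothetical.

```python
RDF_TYPE = "rdf:type"
# Assumption: a stand-in for "RDFS and OWL features"; not pySHACL's real rule set.
SCHEMA_TYPES = {
    "rdfs:Class", "owl:Class", "rdf:Property",
    "owl:ObjectProperty", "owl:DatatypeProperty", "owl:NamedIndividual",
}


def inoculate_old(data_graph, ontology):
    """Old behaviour as described: copy every ontology triple into the datagraph."""
    return data_graph | ontology


def inoculate_new(data_graph, ontology):
    """Simplified new behaviour: copy only triples about schema-typed subjects."""
    keep = {s for s, p, o in ontology if p == RDF_TYPE and o in SCHEMA_TYPES}
    return data_graph | {t for t in ontology if t[0] in keep}


ontology = {
    ("xyz:Style", RDF_TYPE, "owl:Class"),
    ("xyz:Style1", RDF_TYPE, "xyz:Style"),  # plain instance: skipped by the new model
}
data_graph = {("<https://abc/some-report>", "xyz:usesStyle", "xyz:Style1")}
style1_type = ("xyz:Style1", RDF_TYPE, "xyz:Style")

print(style1_type in inoculate_old(data_graph, ontology))  # True
print(style1_type in inoculate_new(data_graph, ontology))  # False

# Tagging the instance as owl:NamedIndividual brings it back in.
tagged = ontology | {("xyz:Style1", RDF_TYPE, "owl:NamedIndividual")}
print(style1_type in inoculate_new(data_graph, tagged))    # True
```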

The old way was a "quick fix solution" implemented to solve a problem a colleague was facing and then stayed around for several years. It often polluted datagraphs with thousands of extra unnecessary triples that slowed down validation, and caused issues for users whose SHACL Shape graph and ontology graph were the same, because then you would get SHACL Shapes and constraints in your datagraph, which is unnecessary and can complicate validation.

The new way is how I imagined the feature should be from the start, and should have behaved that way all along. I have feedback that the improved validator performance is noticeable and appreciated, and for the majority of users the validation results are unchanged.

For the cases when the new inoculation behaviour does not copy over what you expect it to copy, I suggest you look to what other validation engines do. That is, implement a pre-processing step in your workflow to add in any extra instances to the datagraph that are required for validation before sending it to the validator. You could even make it a combination of both approaches, by adding non-OWL definitions and instances yourself in a pre-processing step, and letting PySHACL add all of the OWL content automatically with inoculation.
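A pre-processing step of the kind suggested above could be as simple as copying, from the ontology, every triple whose subject the datagraph points at. Inoculation would then handle the OWL/RDFS content. The sketch below uses a hypothetical tuple-of-strings graph, and `add_referenced_instances` is an illustrative name. It copies only one hop, which is an assumption; shapes that follow longer paths into the ontology would need more.

```python
RDF_TYPE = "rdf:type"


def add_referenced_instances(data_graph, ontology):
    """Copy ontology triples whose subject appears as an object in the datagraph (one hop)."""
    referenced = {o for s, p, o in data_graph}
    return data_graph | {t for t in ontology if t[0] in referenced}


ontology = {
    ("xyz:Style1", RDF_TYPE, "xyz:Style"),
    ("xyz:Style2", RDF_TYPE, "xyz:Style"),  # not referenced by the datagraph
}
data_graph = {
    ("<https://abc/some-report>", RDF_TYPE, "xyz:Report"),
    ("<https://abc/some-report>", "xyz:usesStyle", "xyz:Style1"),
}

prepared = add_referenced_instances(data_graph, ontology)
print(("xyz:Style1", RDF_TYPE, "xyz:Style") in prepared)  # True
print(("xyz:Style2", RDF_TYPE, "xyz:Style") in prepared)  # False
```

Copying only what the datagraph references keeps the extra triples small. That addresses the datagraph-pollution concern behind the new inoculation behaviour.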

@MiltosD

MiltosD commented Sep 30, 2023

Hi @jdogburck Alex is right, if you want an instance from your ontology to appear in the Datagraph after inoculation, you must tag it as class owl:NamedIndividual in your ontology.

This is not very good if you want to inoculate well-known ontologies that do not have this tag, e.g. https://op.europa.eu/o/opportal-service/euvoc-download-handler?cellarURI=http%3A%2F%2Fpublications.europa.eu%2Fresource%2Fcellar%2F83cbd12f-5c6c-11ee-9220-01aa75ed71a1.0001.05%2FDOC_1&fileName=languages-skos.rdf

@MiltosD

MiltosD commented Sep 30, 2023

Inoculation also drains memory if you need to include a bunch of them.

@ajnelson-nist
Contributor

@MiltosD , @ashleysommer :

I've since come to think that owl:NamedIndividual might be being overly leaned-on in the inoculation process.

Apologies that I don't have a specific paragraph citation handy. But reviewing OWL 2 Web Ontology Language Mapping to RDF Graphs, I found that owl:NamedIndividual appears to be a syntactic descriptor. To be conformant with OWL 2 DL, a member of the class owl:NamedIndividual cannot be an RDF blank node and must instead be an IRI-identified node. That is, this SHACL restriction seems to fully cover owl:NamedIndividual:

owl:NamedIndividual-shape
    a sh:NodeShape ;
    sh:nodeKind sh:IRI ;
    sh:targetClass owl:NamedIndividual ;
    .

There might have been some deeper meaning in OWL 1. But OWL 2 introduced punning, which permits a node to be both an owl:NamedIndividual and an owl:Class, so it might be down to just syntax.

I did not appreciate this nuance, or possibly semantic limit, when initially discussing how to handle access to Individuals necessary for, and embedded in, the "TBox" ontology side.

I'm open to revisiting this inoculation-aid strategy.

@ajnelson-nist
Contributor

Oh dear, I apologize for re-explaining the syntax thing; I forgot to check whether I'd already done so in this thread, and I had.

But, again, I'm open to revisiting.

@MiltosD

MiltosD commented Sep 30, 2023

@ajnelson-nist I've been using several skos:Concept RDF files. They do not tag owl:NamedIndividual anywhere, which makes them useless for my SHACL rules when using pySHACL. I had tried Apache Jena before and did not have a problem there. However, I need to include SHACL validation functionality in my Python project, so...
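One possible workaround relies on the behaviour described earlier in the thread: inoculation copies instances typed owl:NamedIndividual. A pre-processing step could add that type to each IRI-identified skos:Concept in such a vocabulary. Blank nodes are skipped so the result stays OWL 2 DL conformant. The sketch uses a hypothetical tuple-of-strings graph, and the function name and `ex:` IRIs are illustrative only.

```python
RDF_TYPE = "rdf:type"
SKOS_CONCEPT = "skos:Concept"
OWL_NAMED_INDIVIDUAL = "owl:NamedIndividual"


def tag_concepts_as_named_individuals(vocabulary):
    """Add an owl:NamedIndividual type to every IRI-identified skos:Concept."""
    extra = {
        (s, RDF_TYPE, OWL_NAMED_INDIVIDUAL)
        for s, p, o in vocabulary
        # Skip blank nodes ("_:" labels): OWL 2 DL forbids blank-node NamedIndividuals.
        if p == RDF_TYPE and o == SKOS_CONCEPT and not s.startswith("_:")
    }
    return vocabulary | extra


vocabulary = {
    ("ex:English", RDF_TYPE, SKOS_CONCEPT),
    ("_:b0", RDF_TYPE, SKOS_CONCEPT),
}
tagged = tag_concepts_as_named_individuals(vocabulary)
print(("ex:English", RDF_TYPE, OWL_NAMED_INDIVIDUAL) in tagged)  # True
print(("_:b0", RDF_TYPE, OWL_NAMED_INDIVIDUAL) in tagged)        # False
```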

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants