Inconsistent Similarity Calculation in Distance Module #133

Open
remiceres opened this issue Jul 26, 2023 · 8 comments
@remiceres
Collaborator

Issue Description:

A bug has been identified in the distance module of the software. This issue has been reported by user Ali Ballout.

Bug Details:

The problem arises when dealing with classes that are considered equivalent and have subclass relations between each other. For instance, let's consider the classes http://dbpedia.org/ontology/Constellation and http://www.wikidata.org/entity/Q8928.

The software Corese assigns depths to these classes based on their order of appearance. As a result, one of the classes might receive a depth of 2, while the other gets a depth of 3. The issue manifests during the calculation of distance and similarity between these classes.

Steps to Reproduce:

  1. Load the file mainDbpedia.owl into the application.
  2. Perform a similarity calculation between the classes http://dbpedia.org/ontology/Constellation and http://www.wikidata.org/entity/Q8928.

Expected Behavior:

The similarity calculation should be consistent regardless of the order of the classes.

Actual Behavior:

The similarity calculation yields different results depending on the position of the classes. For instance:

In this case, it seems that one of the classes is erroneously used as a common ancestor because it is found in the list of ancestors of the other class and has a higher depth.

However, in this scenario, no common ancestor is being used, as both classes are listed as each other's parents in the list of ancestors.

Note to Developers:

Because the result depends on class order and on the subclass relations between equivalent classes, this inconsistency can produce incorrect similarity values and reduces the accuracy of the distance module. Further investigation and debugging are required to resolve the issue.

Screenshots/Attachments:

log

mainDbpedia.zip

@FabienGandon
Collaborator

This is based on my PhD; see section "12.2.3 The subsumption link is not a unitary length", page 371 of https://theses.hal.science/tel-00378201

Happy to have a work session on that. The first thing to do is to draw the sub hierarchy on which you are testing with all its paths.

@remiceres
Collaborator Author

Here is the schema of the sub-hierarchy on which the test is performed. I use this query to obtain the image:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

CONSTRUCT {
  <http://www.wikidata.org/entity/Q8928> rdfs:subClassOf ?parent .
  ?parent rdfs:subClassOf ?ancestor .
  <http://www.wikidata.org/entity/Q8928> owl:equivalentClass ?equivalentClass .
  <http://www.wikidata.org/entity/Q8928> owl:sameAs ?sameAs .
}
WHERE {
  {
    <http://www.wikidata.org/entity/Q8928> rdfs:subClassOf ?parent .
    OPTIONAL {
      ?parent rdfs:subClassOf* ?ancestor .
      FILTER (isIRI(?ancestor) && ?ancestor != <http://www.wikidata.org/entity/Q8928>)
    }
  }
  OPTIONAL {
    <http://www.wikidata.org/entity/Q8928> owl:equivalentClass ?equivalentClass .
  }
  OPTIONAL {
    <http://www.wikidata.org/entity/Q8928> owl:sameAs ?sameAs .
  }
}

Screenshot from 2023-08-01 18-00-35

@FabienGandon
Collaborator

FabienGandon commented Aug 2, 2023 via email

@ali-ballout

It's not really a hierarchy, since ns2:Constellation and ns1:Q8928 are set as equivalent, i.e. they are the same class, so the path between them is 0. Then we have just one path: owl:Thing < ns2:CelestialBody < (ns2:Constellation | ns1:Q8928). Typically, owl:Thing < ns2:CelestialBody should be 1/2 and ns2:CelestialBody < (ns2:Constellation | ns1:Q8928) should be 1/4.

Dr Fabien, you are perfectly right, and that's why I reported the bug. When two entities are equivalent, the similarity calculation is inconsistent. The problem is not with your theory or algorithm but with the implementation of this edge case. I fixed it in my Python implementation and reported it to Remi; it becomes a little more complex the deeper we go through those equivalent nodes. The way I dealt with it is to keep a cache of equivalent classes and look it up whenever I calculate a distance or try to find common ancestors.
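To make the cache idea concrete, here is a minimal sketch of one way such a lookup could work. It is not the code from my package or from Corese: `EquivalenceCache`, `distance` and `raw_distance` are hypothetical names, and `raw_distance` stands in for whatever depth-based distance function is already in place.

```python
class EquivalenceCache:
    """Maps every class IRI to one canonical representative of its equivalence set."""

    def __init__(self):
        self._parent = {}

    def find(self, iri):
        """Return the canonical representative of iri (union-find with path compression)."""
        self._parent.setdefault(iri, iri)
        root = iri
        while self._parent[root] != root:
            root = self._parent[root]
        # Point every node on the walked path directly at the root.
        while self._parent[iri] != root:
            self._parent[iri], iri = root, self._parent[iri]
        return root

    def add_equivalence(self, a, b):
        """Record that a and b are equivalent (e.g. owl:equivalentClass)."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a, b):
        return self.find(a) == self.find(b)


def distance(a, b, cache, raw_distance):
    """Distance that treats equivalent classes as one node.

    raw_distance is a hypothetical callable: the existing distance function.
    """
    if cache.same(a, b):
        return 0.0
    # Always query with canonical representatives so argument order cannot matter.
    return raw_distance(cache.find(a), cache.find(b))


Q8928 = "http://www.wikidata.org/entity/Q8928"
CONSTELLATION = "http://dbpedia.org/ontology/Constellation"

cache = EquivalenceCache()
cache.add_equivalence(Q8928, CONSTELLATION)

# The stand-in raw distance returns a non-zero value on purpose;
# the cache short-circuits it for equivalent classes.
print(distance(Q8928, CONSTELLATION, cache, lambda x, y: 0.125))
print(distance(CONSTELLATION, Q8928, cache, lambda x, y: 0.125))
```

Both calls go through the same canonical representative, so the result no longer depends on the order of the arguments. The same `find` lookup would also have to be applied when collecting ancestors, which is the part that gets more involved with chains of equivalent nodes.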

@FabienGandon
Collaborator

FabienGandon commented Aug 2, 2023 via email

@ali-ballout

yes for sure:

expected distance (classes have an equivalence relation):
dist(http://www.wikidata.org/entity/Q8928 and http://dbpedia.org/ontology/Constellation) = 0

actual distance given by Corese:
dist(http://www.wikidata.org/entity/Q8928 and http://dbpedia.org/ontology/Constellation) = 0.12500023
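As a rough sanity check on that number: with the link lengths quoted above (1/2 for the link below owl:Thing, 1/4 for the next one), the reported value is very close to the length of a single link ending at depth 3. The sketch below assumes a 1/2^depth weighting and the depths 2 and 3 mentioned in the issue description; `link_length` is a hypothetical helper, not a Corese function, and this is only one possible reading of the result.

```python
def link_length(child_depth: int) -> float:
    """Length of the subsumption link ending at a class of the given depth.

    Assumes the 1/2**depth weighting suggested by the 1/2 and 1/4 values above.
    """
    return 1.0 / 2 ** child_depth


# Values quoted in the thread.
assert link_length(1) == 0.5   # owl:Thing < ns2:CelestialBody
assert link_length(2) == 0.25  # ns2:CelestialBody < ns2:Constellation

# If one of the two equivalent classes is placed at depth 3 below the other,
# a single extra link of this length separates them.
spurious = link_length(3)
print(spurious)  # 0.125
```

That would be consistent with one class being treated as the parent of the other instead of as the same node.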

@FabienGandon
Collaborator

FabienGandon commented Aug 2, 2023 via email

@ali-ballout

So the fix is to link all equivalent classes with rdfs:subClassOf links of length 0 in both directions. NB: the current implementation was done for RDFS, not for OWL.

Makes perfect sense, that's exactly what I did for my package.
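For illustration, a small sketch of how zero-length links in both directions behave on a toy version of this sub-hierarchy. Everything here is an assumption for the example: the graph is hand-written, the link lengths 1/2 and 1/4 are the ones discussed in this thread, and `upward_distances` and `distance` are hypothetical helpers, not the Corese implementation or my package.

```python
import heapq


def upward_distances(start, parents):
    """Cheapest cost from start up to each reachable ancestor (Dijkstra).

    Dijkstra is used because the two zero-length links form a cycle.
    """
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for parent, length in parents.get(node, []):
            candidate = d + length
            if candidate < dist.get(parent, float("inf")):
                dist[parent] = candidate
                heapq.heappush(heap, (candidate, parent))
    return dist


def distance(a, b, parents):
    """Shortest path between a and b through any common ancestor."""
    up_a = upward_distances(a, parents)
    up_b = upward_distances(b, parents)
    common = up_a.keys() & up_b.keys()
    return min((up_a[c] + up_b[c] for c in common), default=float("inf"))


# child -> [(parent, link length)]; equivalence is modelled as two
# subClassOf links of length 0, one in each direction.
parents = {
    "ns2:CelestialBody": [("owl:Thing", 0.5)],
    "ns2:Constellation": [("ns2:CelestialBody", 0.25), ("ns1:Q8928", 0.0)],
    "ns1:Q8928": [("ns2:Constellation", 0.0)],
}

print(distance("ns1:Q8928", "ns2:Constellation", parents))   # 0.0
print(distance("ns2:Constellation", "ns1:Q8928", parents))   # 0.0
print(distance("ns1:Q8928", "ns2:CelestialBody", parents))   # 0.25
```

With the zero-length links in place, the two equivalent classes are at distance 0 in either order, and ns1:Q8928 inherits the 1/4 link to ns2:CelestialBody through its equivalent.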

@remiceres added the bug label on Aug 3, 2023