rdfdiff finds too many differences #2285

Open
justin2004 opened this issue Feb 20, 2024 · 1 comment
Labels
enhancement Incrementally add new feature

Comments

@justin2004

Version

5.0.0-rc1

What happened?

rdfdiff reports more differences than actually exist between two files.
The case below shows two files that are isomorphic except that one has one additional triple, yet rdfdiff reports several other differences between them.

$ cat issue.ttl
@prefix ex: <http://example.com/> .

ex:region_10
        a ex:Region ;
        ex:related [
                ex:memberList (
                        ex:region_11
                ) ;
        ] ;
        .

$ cat issue1.ttl
@prefix ex: <http://example.com/> .

ex:region_10
        a ex:Region ;
        ex:related [
                ex:memberList (
                        ex:region_11
                ) ;
        ] ;
        .

$ ~/Downloads/apache-jena-5.0.0-rc1/bin/rdfdiff issue.ttl issue1.ttl ttl ttl
models are equal

# that was as expected
# but now when I add a single triple to one of the files:

$ cat issue1.ttl
@prefix ex: <http://example.com/> .

ex:region_16 a ex:Region .

ex:region_10
        a ex:Region ;
        ex:related [
                ex:memberList (
                        ex:region_11
                ) ;
        ] ;
        .
$ ~/Downloads/apache-jena-5.0.0-rc1/bin/rdfdiff issue.ttl issue1.ttl ttl ttl
models are unequal

< 5 triples
> 6 triples
< [http://example.com/region_10, http://example.com/related, _:2f46e8a27dca00569d3d04d42c3f3c53]
< [_:2f46e8a27dca00569d3d04d42c3f3c53, http://example.com/memberList, _:f0c1b4df8bd882e0dc926aee91d36315]
< [_:f0c1b4df8bd882e0dc926aee91d36315, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil]
< [_:f0c1b4df8bd882e0dc926aee91d36315, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://example.com/region_11]
> [_:d310f6311ac18f96e2c81f63007497e6, http://example.com/memberList, _:fe69945f1dff1961864044d0b5a4c756]
> [http://example.com/region_16, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://example.com/Region]
> [http://example.com/region_10, http://example.com/related, _:d310f6311ac18f96e2c81f63007497e6]
> [_:fe69945f1dff1961864044d0b5a4c756, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil]
> [_:fe69945f1dff1961864044d0b5a4c756, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://example.com/region_11]

# but I expected only a single-triple difference between the two files

I expected output something like this:

models are unequal

< 0 triples
> 1 triples
> [http://example.com/region_16, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://example.com/Region]

I got the same results on jena 4.10.x.

Relevant output and stacktrace

No response

Are you interested in making a pull request?

None

justin2004 added the bug label Feb 20, 2024
@afs
Member

afs commented Feb 20, 2024

See the comment in a56fa1f
Code: https://github.com/apache/jena/blob/main/jena-cmds/src/main/java/arq/rdfdiff.java

The code has not changed in quite some time.

It's printing the two files.
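The reported output is consistent with the following reading, which is an assumption drawn only from the output above and not from the rdfdiff source: once the isomorphism check fails, the triples of each model that are not in the other model are printed, with blank nodes compared by their parser-assigned labels. Because each parse mints fresh labels, every triple that mentions a blank node ends up on both sides, and only the ground triple `ex:region_10 a ex:Region` cancels out. A minimal stand-alone Python sketch of that label-based difference follows; it is not Jena code, and the labels `_:a1`, `_:a2`, `_:b1`, `_:b2` are hypothetical stand-ins for the hash-like labels in the output.

```python
# Stand-alone illustration (not Jena code): triples are (subject, predicate,
# object) string tuples, and blank nodes are strings starting with "_:".
EX = "http://example.com/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def issue_graph(b_outer, b_list):
    """The five triples of issue.ttl, using the given blank node labels."""
    return {
        (EX + "region_10", RDF + "type", EX + "Region"),
        (EX + "region_10", EX + "related", b_outer),
        (b_outer, EX + "memberList", b_list),
        (b_list, RDF + "first", EX + "region_11"),
        (b_list, RDF + "rest", RDF + "nil"),
    }


# Each parse mints fresh blank node labels; these labels are hypothetical.
g1 = issue_graph("_:a1", "_:a2")
g2 = issue_graph("_:b1", "_:b2") | {(EX + "region_16", RDF + "type", EX + "Region")}

# Plain set difference: two blank nodes match only if their labels match.
only_left = g1 - g2
only_right = g2 - g1

print("<", len(g1), "triples")
print(">", len(g2), "triples")
for t in sorted(only_left):
    print("<", list(t))
for t in sorted(only_right):
    print(">", list(t))
```

This reproduces the shape of the reported output: the models have 5 and 6 triples, four triples are listed on the `<` side and five on the `>` side, and the shared ground triple about `ex:region_10`'s type appears on neither.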

FWIW I think finding a minimal difference of two unordered collections with bnode isomorphism is quite a difficult problem. Even plain text diff can produce non-optimal difference files. Hope to be proved wrong.
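One possible heuristic for the common case handles this issue's example, though it is not what rdfdiff does and not a general solution to the hard problem described above. It assumes every blank node is the root of an acyclic, tree-shaped description, which is what Turtle's `[ ... ]` and `( ... )` syntax produces in files like `issue.ttl`. Each blank node gets a label derived from a hash of its outgoing triples, computed recursively, and an ordinary set difference is then taken. The function names are hypothetical, and terms are assumed to be IRIs or `_:` blank nodes (literals would need proper quoting).

```python
import hashlib

EX = "http://example.com/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def is_bnode(term):
    return term.startswith("_:")


def canonicalize(graph):
    """Relabel blank nodes by a hash of their (recursive) outgoing description.

    Heuristic only: assumes acyclic, tree-shaped blank node structures and
    ignores incoming edges. Distinct blank nodes with identical descriptions
    receive the same label and therefore collapse together.
    """
    out = {}
    for s, p, o in graph:
        if is_bnode(s):
            out.setdefault(s, []).append((p, o))

    memo = {}

    def signature(b, visiting):
        if b in memo:
            return memo[b]
        if b in visiting:
            raise ValueError("cyclic blank node structure at " + b)
        visiting.add(b)
        # Sort the edge strings so the signature is independent of label order.
        parts = sorted(
            p + " " + (signature(o, visiting) if is_bnode(o) else o)
            for p, o in out.get(b, [])
        )
        visiting.discard(b)
        memo[b] = "[" + "; ".join(parts) + "]"
        return memo[b]

    def label(term):
        if not is_bnode(term):
            return term
        digest = hashlib.sha256(signature(term, set()).encode("utf-8")).hexdigest()
        return "_:c" + digest[:16]

    return {(label(s), p, label(o)) for s, p, o in graph}


def issue_graph(b_outer, b_list):
    """The five triples of issue.ttl, using the given (hypothetical) labels."""
    return {
        (EX + "region_10", RDF + "type", EX + "Region"),
        (EX + "region_10", EX + "related", b_outer),
        (b_outer, EX + "memberList", b_list),
        (b_list, RDF + "first", EX + "region_11"),
        (b_list, RDF + "rest", RDF + "nil"),
    }


g1 = canonicalize(issue_graph("_:a1", "_:a2"))
g2 = canonicalize(
    issue_graph("_:b1", "_:b2") | {(EX + "region_16", RDF + "type", EX + "Region")}
)
only_left, only_right = g1 - g2, g2 - g1
for t in sorted(only_left):
    print("<", list(t))
for t in sorted(only_right):
    print(">", list(t))
```

On this example only `ex:region_16 a ex:Region` remains, on the `>` side, which matches the output the reporter expected. The trade-offs are significant: incoming edges are ignored, distinct blank nodes with identical descriptions merge, cycles are rejected, and an edit deep inside a tree changes the label of every ancestor, so the whole affected tree still shows up on both sides.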

afs added the enhancement (Incrementally add new feature) label and removed the bug label Feb 20, 2024
Projects
None yet
Development

No branches or pull requests

2 participants