Bug when using FROM statements #2670

Open
jcbiddle opened this issue Dec 19, 2023 · 4 comments · May be fixed by #2794

Comments

@jcbiddle

Using a FROM statement to query a Dataset induces a number of problems. Consider the following example:

from rdflib import Dataset
from rdflib.plugins import sparql

sparql.SPARQL_LOAD_GRAPHS = False  # Needed to ensure remote lookups aren't performed
sparql.SPARQL_DEFAULT_GRAPH_UNION = False  # Needed otherwise queries don't work at all!

data = """\
@prefix ex: <http://example.com#> .
ex:Graph1 {
ex:Alice a ex:Person .
}

ex:Graph2 {
ex:Charlie a ex:Person .
}
"""

query = """
PREFIX ex: <http://example.com#>
SELECT ?person
FROM ex:Graph1
WHERE {
?person a ex:Person
}
"""

ds = Dataset()
ds.parse(data=data, format="trig")

for row in ds.query(query):
    print(*row)
# Correctly outputs "http://example.com#Alice"

This snippet loads two named graphs, each with a single triple, then queries ex:Graph1.

After this query runs, the Dataset contains a copy of the ex:Graph1 triple in the default graph:

for quad in ds.quads():
    print([term.fragment for term in quad[0:3]], quad[3])
# Outputs:
# ['Charlie', 'type', 'Person'] http://example.com#Graph2
# ['Alice', 'type', 'Person'] http://example.com#Graph1
# ['Alice', 'type', 'Person'] urn:x-rdflib:default

Aside from inadvertently increasing the size of the Dataset, this also induces a bug when querying other graphs in the Dataset. For example, if we now query ex:Graph2, we find that we get an erroneous result:

query = """
PREFIX ex: <http://example.com#>
SELECT ?person
FROM ex:Graph2
WHERE {
?person a ex:Person
}
"""
res = ds.query(query)
print("Second query results")
for row in res:
    print(*row)
# Outputs
# http://example.com#Alice <---- SHOULDN'T BE HERE
# http://example.com#Charlie

I have no clue as to the cause of this behaviour, but clearly something is wrong with the handling of FROM statements. Furthermore, none of the queries shown here work if sparql.SPARQL_DEFAULT_GRAPH_UNION = True, which appears to be a separate but related problem.

@WhiteGobo
Contributor

related discussion #2591

@jcbiddle
Author

I believe I've identified part of the problem. In evaluate.py, evalQuery contains unreachable code that appears intended to create a new graph into which the dataset selection is copied:

firstDefault = False
for d in main.datasetClause:
    if d.default:
        if firstDefault: #  <-- This is never True
            # replace current default graph
            dg = ctx.dataset.get_context(BNode())
            ctx = ctx.pushGraph(dg)
            firstDefault = True

        ctx.load(d.default, default=True)

    elif d.named:
        g = d.named
        ctx.load(g, default=False)

Replacing if firstDefault: with if not firstDefault: appears to remedy the querying issue, as it changes the query's default context to point to a new graph with a blank node identifier that contains copies of the named graphs specified by the FROM statement. However, this new graph is never deleted, which seems to be undesirable behaviour.
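The effect of the inverted flag check can be reproduced without rdflib. The following is only a minimal sketch: `simulate` is a hypothetical helper that records which actions the loop would take, using plain strings in place of main.datasetClause and the query context. It is not the rdflib implementation.

```python
def simulate(clauses, fixed):
    """Trace the dataset-clause loop quoted above using plain strings.

    `clauses` is a list of ("default" | "named", graph_name) pairs standing
    in for main.datasetClause (an assumption made for illustration).
    `fixed=True` selects the `if not firstDefault:` variant.
    """
    actions = []
    first_default = False
    for kind, name in clauses:
        if kind == "default":
            # Original check: `if firstDefault:`. The flag starts as False and
            # is only set inside this branch, so the branch can never run.
            enter = (not first_default) if fixed else first_default
            if enter:
                actions.append("push fresh default graph")
                first_default = True
            actions.append(f"load {name} into default graph")
        else:
            actions.append(f"load {name} as named graph")
    return actions


clauses = [("default", "ex:Graph1"), ("default", "ex:Graph2")]
print(simulate(clauses, fixed=False))
print(simulate(clauses, fixed=True))
```

With the original check, no fresh graph is ever pushed, so every FROM graph is loaded into whatever default graph the context already has. With the negated check, a fresh graph is pushed exactly once, before the first load.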

@apicouSP

apicouSP commented May 28, 2024

I made a fix here for another issue, but it seems to fix yours as well. Could you confirm?
Also, with the fix, you don't need:

sparql.SPARQL_LOAD_GRAPHS = False 
sparql.SPARQL_DEFAULT_GRAPH_UNION = False

Since you define your dataset explicitly with a FROM clause, SPARQL_DEFAULT_GRAPH_UNION is ignored.
I also only look up an external graph if its URI is not found in your ConjunctiveGraph, so if every graph named in your FROM clauses exists, SPARQL_LOAD_GRAPHS is ignored as well.
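For reference, the behaviour expected from FROM throughout this thread (the query's default graph is the merge of the graphs named in FROM, and the stored data is left untouched) can be modelled with plain dicts and sets. This is a toy sketch, not rdflib's API and not the code in the fix; `select_default_graph` and the `store` layout are hypothetical, and unknown graph names are simply treated as empty here rather than looked up externally.

```python
def select_default_graph(named_graphs, from_iris):
    """Return the query-time default graph as the merge of the FROM graphs.

    A new set is built on every call, so the stored graphs are never modified.
    """
    merged = set()
    for iri in from_iris:
        # Assumption for this sketch: a missing graph contributes no triples.
        merged |= named_graphs.get(iri, set())
    return merged


# Toy store mirroring the example data from the issue.
store = {
    "ex:Graph1": {("ex:Alice", "rdf:type", "ex:Person")},
    "ex:Graph2": {("ex:Charlie", "rdf:type", "ex:Person")},
}
snapshot = {iri: set(triples) for iri, triples in store.items()}

people_g1 = sorted(s for s, p, o in select_default_graph(store, ["ex:Graph1"]))
people_g2 = sorted(s for s, p, o in select_default_graph(store, ["ex:Graph2"]))
print(people_g1, people_g2)
print(store == snapshot)
```

In this model the second query sees only ex:Charlie, and the store compares equal to its snapshot after both queries, which is the outcome the original report expected.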

@jcbiddle
Author

@apicouSP Your fix is excellent, thanks for putting that together. I can't see any issues with your solution; hopefully it can be merged into the next release.

@apicouSP apicouSP linked a pull request May 29, 2024 that will close this issue