Bug when using FROM statements #2670

jcbiddle · 2023-12-19T03:19:03Z

Using a FROM statement to query a Dataset induces a number of problems. Consider the following example:

from rdflib import Dataset
from rdflib.plugins import sparql

sparql.SPARQL_LOAD_GRAPHS = False  # Needed to ensure remote lookups aren't performed
sparql.SPARQL_DEFAULT_GRAPH_UNION = False  # Needed otherwise queries don't work at all!

data = """\
@prefix ex: <http://example.com#> .
ex:Graph1 {
ex:Alice a ex:Person .
}

ex:Graph2 {
ex:Charlie a ex:Person .
}
"""

query = """
PREFIX ex: <http://example.com#>
SELECT ?person
FROM ex:Graph1
WHERE {
?person a ex:Person
}
"""

ds = Dataset()
ds.parse(data=data, format="trig")

for row in ds.query(query):
    print(*row)
# Correctly outputs "http://example.com#Alice"

This snippet loads two named graphs, each with a single triple, then queries ex:Graph1.

After this query, the Dataset contains a copy of ex:Graph1's triple in the default graph:

for quad in ds.quads():
    print([term.fragment for term in quad[0:3]], quad[3])
# Outputs:
# ['Charlie', 'type', 'Person'] http://example.com#Graph2
# ['Alice', 'type', 'Person'] http://example.com#Graph1
# ['Alice', 'type', 'Person'] urn:x-rdflib:default
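
One way to check for this side effect is to snapshot the quads before and after running a query and compare the two sets. The helper below is only a sketch: quads_added_by_query is a hypothetical name, and it assumes ds is an object like the Dataset above whose quads() yields hashable tuples.

```python
def quads_added_by_query(ds, query):
    """Return the quads that exist after running `query` but not before.

    Hypothetical helper: `ds` is assumed to expose quads() and query(),
    as the Dataset in the example above does.
    """
    before = set(ds.quads())
    # Consume the results so evaluation has certainly finished.
    list(ds.query(query))
    after = set(ds.quads())
    return after - before
```

A read-only SELECT should leave this set empty. On an affected version, calling it with the first query above would be expected to return the extra default-graph quad.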

Aside from inadvertently increasing the size of the Dataset, this also induces a bug when querying other graphs in the Dataset. For example, if we now query ex:Graph2, we find that we get an erroneous result:

query = """
PREFIX ex: <http://example.com#>
SELECT ?person
FROM ex:Graph2
WHERE {
?person a ex:Person
}
"""
res = ds.query(query)
print("Second query results")
for row in res:
    print(*row)
# Outputs
# http://example.com#Alice <---- SHOULDN'T BE HERE
# http://example.com#Charlie

I have no clue as to the cause of this behaviour, but clearly something is wrong with the handling of FROM statements. Furthermore, none of the queries shown here work if sparql.SPARQL_DEFAULT_GRAPH_UNION = True, which appears to be a separate but related problem.

WhiteGobo · 2024-01-11T18:18:44Z

related discussion #2591

jcbiddle · 2024-01-12T04:29:56Z

I believe I've identified part of the problem. In evaluate.py, the evalQuery function contains unreachable code that is intended to create a new graph to copy the dataset selection into:

firstDefault = False
for d in main.datasetClause:
    if d.default:
        if firstDefault: #  <-- This is never True
            # replace current default graph
            dg = ctx.dataset.get_context(BNode())
            ctx = ctx.pushGraph(dg)
            firstDefault = True

        ctx.load(d.default, default=True)

    elif d.named:
        g = d.named
        ctx.load(g, default=False)

Replacing if firstDefault: with if not firstDefault: appears to remedy the querying issue, as it changes the query's default context to point to a new graph with a blank node identifier that contains copies of the named graphs specified by the FROM statement. However, this new graph is never deleted, which seems to be undesirable behaviour.
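
The effect of the flag can be illustrated with a toy model that uses no rdflib at all. Everything here is an assumption made for illustration: plain dicts and sets stand in for the Dataset and its graphs, and make_store, run_from_clauses and subjects are hypothetical names. It only mirrors the control flow of the loop quoted above.

```python
# Toy model only: plain dicts and sets stand in for rdflib graphs.
DEFAULT = "urn:x-rdflib:default"


def make_store():
    """Build a hypothetical two-graph store matching the TriG data in this issue."""
    return {
        DEFAULT: set(),
        "ex:Graph1": {("ex:Alice", "rdf:type", "ex:Person")},
        "ex:Graph2": {("ex:Charlie", "rdf:type", "ex:Person")},
    }


def run_from_clauses(store, from_graphs, fixed):
    """Apply FROM clauses and return the name of the graph the query reads.

    fixed=False keeps the original `if first_default:` test;
    fixed=True uses `if not first_default:` instead.
    """
    target = DEFAULT
    first_default = False
    for name in from_graphs:
        enter = (not first_default) if fixed else first_default
        if enter:
            # Redirect the query to a fresh scratch graph (a BNode-named graph in rdflib).
            target = f"_:scratch{len(store)}"
            store[target] = set()
            first_default = True
        # Stand-in for ctx.load(d.default, default=True): copy triples into the target.
        store[target] |= store[name]
    return target


def subjects(store, graph_name):
    """List the subjects visible in one graph, sorted for stable output."""
    return sorted(s for s, _, _ in store[graph_name])


buggy = make_store()
run_from_clauses(buggy, ["ex:Graph1"], fixed=False)
second = run_from_clauses(buggy, ["ex:Graph2"], fixed=False)
print(subjects(buggy, second))  # ['ex:Alice', 'ex:Charlie']

patched = make_store()
run_from_clauses(patched, ["ex:Graph1"], fixed=True)
second = run_from_clauses(patched, ["ex:Graph2"], fixed=True)
print(subjects(patched, second))  # ['ex:Charlie']
print(len(patched))  # 5: both scratch graphs are still in the store
```

In the unpatched branch the triples accumulate in the shared default graph, which matches the erroneous second result reported above. In the patched branch the default graph stays empty, but each query leaves a scratch graph behind.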

apicouSP · 2024-05-28T15:31:56Z

I made a fix here for another issue, but it seems to fix yours as well. Could you confirm?
Also, with the fix, you don't need:

sparql.SPARQL_LOAD_GRAPHS = False 
sparql.SPARQL_DEFAULT_GRAPH_UNION = False

Since you define your dataset explicitly with FROM clauses, SPARQL_DEFAULT_GRAPH_UNION is ignored.
I also only look up an external graph if the graph URI is not found in your ConjunctiveGraph, so if every graph named in your FROM clauses exists, SPARQL_LOAD_GRAPHS is ignored as well.

jcbiddle · 2024-05-29T02:29:21Z

@apicouSP Your fix is excellent, thanks for putting that together. I can't see any issues with your solution; hopefully it can be merged into the next release.

apicouSP linked a pull request May 29, 2024 that will close this issue

Fix explicit dataset (FROM and FROM NAMED clauses) #2794
