Hybrid query returning incorrect "numFound" #108

Open
chetrebman opened this issue Aug 31, 2015 · 3 comments

@chetrebman

THE DATA

<http://coalliance.org/id/1> <http://coalliance.org/siteCode/site_A> "MK_1" .
<http://coalliance.org/id/2> <http://coalliance.org/siteCode/site_A> "MK_2" .
<http://coalliance.org/id/3> <http://coalliance.org/siteCode/site_A> "MK_3" .

<http://coalliance.org/id/6> <http://coalliance.org/siteCode/site_B> "MK_1" .
<http://coalliance.org/id/4> <http://coalliance.org/siteCode/site_B> "MK_4" .
<http://coalliance.org/id/5> <http://coalliance.org/siteCode/site_B> "MK_5" .

THE QUERY

SELECT ?o
WHERE {
  ?s <http://coalliance.org/siteCode/site_A> ?o .
  ?s2 <http://coalliance.org/siteCode/site_B> ?o
}

THE SOLR RESPONSE

<result name="response" numFound="4" start="0" maxScore="1.0">
<head>
<variable name="o"/>
</head>
<results>
<result>
<binding name="o">
<literal>MK_1</literal>
</binding>
</result>
</results>

@chetrebman
Author

Here is another query with possibly the same issue:

SELECT ?object
WHERE {
  ?subject <http://coalliance.org/siteCode/site_A> ?object .
  MINUS { ?subject2 <http://coalliance.org/siteCode/site_B> ?object }
}

`MK_2 MK_3`

agazzarini self-assigned this Sep 1, 2015
@agazzarini
Member

Hi @chetrebman, sorry for the long absence. I can confirm that this is definitely a bug, and the bad news is that fixing it is not a trivial task: it is closely related to issue #96. In other words, it has to do with a Solr-optimized implementation of the SPARQL plan execution.

Technically: the numFound attribute, as you described, reports a wrong number because it is effectively the union of all the docsets produced during query evaluation. The evaluation consists of several steps, which are translated into several Solr queries. The current implementation provides just the primitives for working with triples (i.e. add, remove and query). While a SPARQL query runs, each time an underlying (Solr) query is executed (as a result of executing some part of the algebra plan), the resulting docset (the set of matching document identifiers) is collected and added to the previous docset. This kind of "collection" operation (i.e. the union) is not valid in general: sometimes the incoming docset should replace the previous one, sometimes an intersection is needed, and sometimes a union is the right thing to do. Unfortunately, the current implementation cannot know which kind of "collection" operation is needed, and (wrongly) always executes a union.

In other words, the number "4" in your example means that, in order to execute that SPARQL query, the processor executed n queries and worked with a total of 4 matching documents. So while this is a correct measure, it is not useful, as the number you (and I) would like to see is the total count of resulting query solutions.
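As a rough illustration only (this is not SolRDF code), the following Python sketch replays the first query over the six triples from the issue. It assumes the second triple pattern is evaluated once per binding of ?o coming from the first pattern; that evaluation order is an assumption, and `match` is a hypothetical stand-in for a single underlying Solr query. Under that assumption, the union of all matched document ids has size 4, while the query has a single solution.

```python
# Toy copy of the data in the issue: (document id, predicate, object).
TRIPLES = [
    (1, "site_A", "MK_1"),
    (2, "site_A", "MK_2"),
    (3, "site_A", "MK_3"),
    (6, "site_B", "MK_1"),
    (4, "site_B", "MK_4"),
    (5, "site_B", "MK_5"),
]


def match(predicate, obj=None):
    """Hypothetical stand-in for one Solr query: return the matching triples."""
    return [t for t in TRIPLES if t[1] == predicate and (obj is None or t[2] == obj)]


def run_join():
    collected = set()  # docset accumulated by union across all sub-queries
    solutions = []     # actual SPARQL solutions for ?o

    # First pattern: ?s site_A ?o
    left = match("site_A")
    collected |= {doc for doc, _, _ in left}

    # Second pattern, evaluated with ?o bound to each value from the first one
    # (assumed evaluation strategy): ?s2 site_B <o>
    for _, _, o in left:
        right = match("site_B", o)
        collected |= {doc for doc, _, _ in right}
        solutions.extend(o for _ in right)

    return len(collected), solutions


num_found, solutions = run_join()
print(num_found, solutions)  # 4 ['MK_1']
```

The count a user expects is `len(solutions)`, which is 1 here; the size of the accumulated docset only says how many documents were touched along the way.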

I'm still fighting with issue #96 and thinking about how to resolve this.

Andrea

@agazzarini
Member

agazzarini commented May 2, 2016

Following the same thread, I paste the exchange with another user.

"Hi,
How to add Facet option in curl query, with example of bsbm-generated-dataset.nt, I tried
curl "http://127.0.0.1:8080/solr/store/sparql" --data-urlencode "q=SELECT ?product ?label WHERE { ?product ?p ?label.} ORDER BY ?label LIMIT 10 &facet=true&facet.field=product" -H "Accept: application/sparql-results+json"
But it did not work."

"A first premise: as you can read here [2], what I called "Hybrid mode" has been temporarily disabled in the current version of SolRDF, so everything below relates to SolRDF 1.0 (which is in a dedicated branch and runs on top of Solr 4.x).

A second premise: it's not your case (read below), but faceting is not working on SolRDF, mainly because of the issue related to the SPARQL algebra (I don't remember the exact number).

Having said that, "it's not your case" because your SPARQL query contains just one triple pattern, and in this (only) case you can get some facets back from SolRDF (using SolRDF 1.0). However, looking at your example, things will not work as you might expect: you don't have a "product" field but just s(ubject), p(redicate) and o(bject) fields, so for plain field faceting you can only use one of them.

I suggest you have a look at my first post about SPOC faceting [1] and at the SolRDF Wiki [2] as well. There, especially in the wiki, you can find the several kinds of faceting that "should" be available and how they "should" work, with examples and command lines. "Should" means, remember, that at the moment things work only if you have one simple triple pattern in your SPARQL."
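To make the point about field names concrete, here is a small Python sketch that only builds the request body and sends nothing. It assumes the endpoint accepts the standard Solr `facet` and `facet.field` parameters as separate form parameters next to `q`, rather than appended inside the `q` value as in the curl command above; `build_facet_request` is a hypothetical helper, not part of SolRDF.

```python
from urllib.parse import urlencode

# The only fields the reply above names as usable for plain field faceting.
FACETABLE_FIELDS = ("s", "p", "o")


def build_facet_request(sparql, facet_field):
    """Return a form-encoded body with the facet options kept outside `q`."""
    if facet_field not in FACETABLE_FIELDS:
        raise ValueError("plain field faceting is only possible on s, p or o")
    return urlencode({"q": sparql, "facet": "true", "facet.field": facet_field})


# Same single-triple-pattern query as in the question.
query = "SELECT ?product ?label WHERE { ?product ?p ?label . } ORDER BY ?label LIMIT 10"
body = build_facet_request(query, "p")
print(body)
```

Passing "product" as the facet field raises a `ValueError` in this sketch, mirroring the remark that no such field exists in the index.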
