SPARQL endpoint #963
Hi @lisestork, thanks for asking about the GloBI SPARQL endpoint and associated data. I think you are one of the first to ask about this in the decade it's been available . . . ; ) Before I dig into this, can you tell me a little more about how you are planning to use the SPARQL endpoint?
I have been playing with the SPARQL endpoint for the last few days and I have had a similar experience. The data seems to be incomplete, and the data modelling looks as if it was dumped from another data model, perhaps property graphs? In any case, can you provide a SPARQL query that returns all the interactions for a particular species, just as I see them in the browser? There are no provided examples. I tried different queries to get all the interactions for the brown bear, for example, but the results are incomplete and some are inconsistent with the interactions shown in the browser feature:
or
Also, the endpoint times out after 60 seconds, which is especially problematic with `(owl:sameAs|^owl:sameAs)*` reasoning.
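One possible workaround for the timeout is to avoid the `(owl:sameAs|^owl:sameAs)*` property path on the server and compute the equivalence classes client-side instead. The sketch below assumes the `owl:sameAs` links can be fetched as flat pairs with a cheap query such as `SELECT ?a ?b WHERE { ?a owl:sameAs ?b }` (possibly paged); whether the endpoint serves that within the time limit is untested here. The function name and the example IRIs are illustrative only.

```python
from collections import defaultdict


def same_as_clusters(pairs):
    """Group IRIs into equivalence classes from (a, b) owl:sameAs pairs.

    Uses a small union-find, so the symmetric/transitive closure is
    computed locally rather than by the SPARQL endpoint.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


# Illustrative pairs, not real GloBI identifiers.
example_pairs = [
    ("http://example.org/taxon/a", "http://example.org/taxon/b"),
    ("http://example.org/taxon/c", "http://example.org/taxon/b"),
    ("http://example.org/taxon/d", "http://example.org/taxon/e"),
]
example_clusters = same_as_clusters(example_pairs)
```

Each resulting cluster can then be passed to a follow-up query as a `VALUES` list, which keeps every individual request simple.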
@lisestork @aahmeti thanks again for your interest. As I mentioned before, you are among the first to ask about the SPARQL endpoint that GloBI has had for about a decade. It'd be helpful if you could provide some context on how you are planning to use the SPARQL endpoint. Also, please let me know if you are willing to contribute to possible modeling tweaks to the RDF/N-Quads versions of the GloBI interaction data. The model implemented today hasn't been touched for quite a while and could probably use some TLC. Thanks for being patient and for sharing your concern.
@jhpoelen thanks for the quick reply. From my side, as I also tried to explain with my queries, the endpoint needs to be able to answer the simple question "give me all the interactions for a particular species" and return the answers we see in the "browser" mode. Thanks! Edit: I'd be curious whether you have any SPARQL queries that you can share with us.
@aahmeti thanks for sharing your desires and questions. First, see https://github.com/globalbioticinteractions/globalbioticinteractions/wiki#accessing-species-interaction-data for some documentation about various access methods. Also, you can find the triples loaded into the triple store in the resource organism A classified as taxon X where "interacts with" can be OBO Relation Ontology terms. Just curious: why not use the GloBI REST API instead? This is what the "browser" mode uses. What is your particular reason for using SPARQL? Thanks for being patient with me as I try to understand your data access constraints. Curious to hear your thoughts.
Thanks for providing the links. I had loaded the file locally in my GraphDB triple store, but that did not give me more complete interactions. It looks like the only viable solution is the API, which is the route I am going to take. The reason I went via the SPARQL endpoint is that all my data is stored as RDF, and that is what I use for data integration; with other formats (CSV/JSON) I need to go through data wrangling and transformation to RDF, which is an extra step.
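The extra wrangling step mentioned above can be fairly small when the tabular export already carries full IRIs. The following is a minimal sketch under that assumption: the column names (`source_iri`, `interaction_iri`, `target_iri`) and the example.org IRIs are hypothetical and do not describe the actual GloBI export format. It also flattens each interaction into a single triple, so provenance such as the originating dataset is not preserved.

```python
import csv
import io


def rows_to_ntriples(tsv_text, s_col, p_col, o_col):
    """Turn tab-separated rows into N-Triples lines.

    Rows in which any of the three cells is not a full http(s) IRI
    are skipped instead of guessed at.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        s, p, o = row[s_col], row[p_col], row[o_col]
        if not all(v.startswith(("http://", "https://")) for v in (s, p, o)):
            continue
        lines.append(f"<{s}> <{p}> <{o}> .")
    return lines


# Hypothetical input; real column names and IRIs will differ.
sample_tsv = (
    "source_iri\tinteraction_iri\ttarget_iri\n"
    "http://example.org/taxon/1\thttp://example.org/rel/eats\thttp://example.org/taxon/2\n"
    "Ursus arctos\thttp://example.org/rel/eats\thttp://example.org/taxon/3\n"
)
sample_triples = rows_to_ntriples(
    sample_tsv, "source_iri", "interaction_iri", "target_iri"
)
```

The second sample row is dropped because its subject is a plain name rather than an IRI; such rows would need name alignment before they can be expressed as RDF resources.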
@aahmeti thanks for sharing your data integration methods. I can see how it'd be easier to integrate via RDF if you are already using a GraphDB triple store. Please note that individual datasets have "nanopub" endpoints. These contain RDF snippets in the form of nanopubs. You can find examples of these nanopub archives at https://globalbioticinteractions.org/datasets via the "nanopub" badge. Reaching the individual nanopub archives directly can be done via: where You can find a list of all dataset namespaces via: https://depot.globalbioticinteractions.org/snapshot/target/data/tsv/datasets.tsv or https://depot.globalbioticinteractions.org/snapshot/target/data/csv/datasets.csv E.g.,
Happy to work with you to come up with an RDF shape that would work for you. If you provide an example of the shape you'd enjoy working with, perhaps we can incorporate that into existing GloBI data products or create a new one. Again, apologies for having to deal with all the "dust" collected on the largely unused RDF perspective onto GloBI. Maybe this is an opportunity to breathe some life into that aspect of GloBI. Thanks for being patient.
Wow, this looks promising! Thank you, Sir! 😎 I downloaded one of those .trig files related to "grizzly" and with the following SPARQL query I already got 255 interactions, many more than I had before!
So this means that if I import all those nanopub archives, I will arrive at complete numbers matching the ones shown in the "browser", correct? Do I still have to go through species nomenclature mapping, or am I good if I just use the Latin name "Ursus arctos" and get all the aggregated data at this point? I think this is the way to go: I can run a set of INSERT queries and change the data model the way I see fit now. If you want, I can write a guide that, after you verify and proofread it, can be added to the list of guides. What do you say?
The nanopubs use the names as provided by the data source. So, this does not include name alignment, at least not yet . . . If a data source provides some kind of taxon id (e.g., NCBI:9606 for Homo sapiens), the nanopubs may include it; otherwise, only the provided names are included. Note that, if you'd like, you can use some other tool like Nomer, an associated tool https://github.com/globalbioticinteractions/name-alignment-template, or your own methods to add the linkages. Also note that the "Browse" results are generated after GloBI's name alignment process, so the results may vary a little, depending on the effect of name alignment. For more information see https://globalbioticinteractions.org/process.
I very much like your idea, and I'd be happy to review a pull request on the "how-to" page https://github.com/globalbioticinteractions/globalbioticinteractions.github.io/blob/main/how-to.md created by @EMTuckerLab. Curious to hear what you come up with, and open to any suggestions, comments, or questions that you may have.
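For anyone adding their own linkages, prefixed taxon ids such as `NCBI:9606` can be expanded to IRIs before joining against other RDF data. This is only a sketch: the prefix table is an assumption (it maps `NCBI` to the OBO PURL pattern for NCBI Taxonomy), and the IRIs actually used in the nanopubs may follow a different scheme. Plain names without a known prefix are returned as `None` so that they can be routed to a name alignment tool instead.

```python
# Assumed prefix table; extend or replace it to match the IRIs in your data.
PREFIXES = {
    "NCBI": "http://purl.obolibrary.org/obo/NCBITaxon_",
}


def expand_taxon_id(taxon_id, prefixes=PREFIXES):
    """Expand a 'PREFIX:local' taxon id to an IRI, or return None.

    None means the value is not a recognised prefixed id (for example
    a bare name such as 'Ursus arctos') and still needs name alignment.
    """
    prefix, sep, local = taxon_id.strip().partition(":")
    if not sep or not local or prefix not in prefixes:
        return None
    return prefixes[prefix] + local


human_iri = expand_taxon_id("NCBI:9606")
unresolved = expand_taxon_id("Ursus arctos")
```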
Hey @aahmeti @lisestork, I've upgraded GloBI to have more capacity, and am hoping to host more of the GloBI data through a SPARQL endpoint. If you are willing to host a full copy of the GloBI triples in one of your triple stores, please do let me know! Thanks for being patient.
A full copy (6.6 GB compressed) of interactions.nq.gz is now available via https://depot.globalbioticinteractions.org/snapshot/target/data/interactions.nq.gz. I am open to suggestions on how to host these 2.2 billion triples in a triple store of sorts. Perhaps an option would be to generate the data in a modular way, for example one file per indexed dataset.
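Before deciding how to modularise the dump, it may help to see how the quads are distributed over graph labels. The sketch below streams an N-Quads file line by line and counts quads per graph label, so memory use stays proportional to the number of distinct graphs. It is an assumption, not something stated in this thread, that the graph labels in interactions.nq.gz correspond to datasets; the helper names are illustrative, and the line parser is a simplified one that does not validate full N-Quads syntax.

```python
import gzip
import re
from collections import Counter

# Subject (IRI or blank node), predicate (IRI), then everything up to the final dot.
_HEAD = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.*?)\s*\.\s*$')


def graph_of(line):
    """Return the graph label of one N-Quads line, or None if it has none."""
    match = _HEAD.match(line)
    if not match:
        return None
    rest = match.group(3)  # object term plus optional graph label
    if rest.startswith('"'):
        # Literal object: the closing quote is the last quote on the line.
        tail = rest[rest.rfind('"') + 1:].split()
        if tail and tail[0].startswith(("^^", "@")):
            tail = tail[1:]  # drop datatype or language tag
    else:
        tail = rest.split()[1:]  # drop the IRI or blank-node object
    return tail[0] if tail else None


def count_quads_per_graph(lines):
    """Count statements per graph label, skipping blank lines and comments."""
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        counts[graph_of(line)] += 1
    return counts


def count_file(path):
    """Stream a gzipped N-Quads file without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_quads_per_graph(handle)
```

Calling `count_file("interactions.nq.gz")` on a local download would give a first impression of whether per-graph files are a practical unit; statements without a graph label are counted under `None`.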
The SPARQL endpoint does not seem to contain all interactions available through the API. As an example, there is information about the garden tomato (https://api.globalbioticinteractions.org/findExternalUrlForTaxon/Solanum%20lycopersicum), but there are no interactions available for the garden tomato via the SPARQL endpoint. Am I querying incorrectly, or does the SPARQL endpoint contain only a subset of interactions? Moreover, there are no human-readable labels available in the SPARQL endpoint, which hampers querying (since various different taxon IDs are used).