TracedSPARQL traces SHACL validations during SPARQL query processing to enable a better understanding of SPARQL query results. A brief explanation of TracedSPARQL is available in the `doc` directory.

The following hardware and software are required to run the experiments:
- OS: Ubuntu 16.04.6 LTS or newer
- Memory: 128 GiB
- HDD: approx. 50 GiB free disk space
- Docker - v19.03.6 or newer
- docker-compose - v1.26.0 or newer
The experiment scripts use the following bash commands:
- basename
- cd
- chown
- declare (with options -a and -A)
- echo
- logname
- rm
- sleep
- source
- unzip
- wget
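Before starting, it can be worth verifying that these commands (and the Docker tooling) are available. Below is a minimal sketch; the function name `check_commands` is illustrative and not part of the experiment scripts:

```shell
# Sketch: check that the commands required by the experiment scripts are
# available. The function name check_commands is illustrative only.
check_commands() {
  missing=0
  for cmd in "$@"; do
    # 'command -v' also resolves shell builtins such as cd, declare, and source
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return $missing
}

# Commands listed above plus the Docker tooling
check_commands basename cd chown declare echo logname rm sleep source unzip wget \
  docker docker-compose || echo "Install the missing commands before running the experiments."
```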
The experiments address the following research questions:
- What is the overhead of adding online SHACL validation to SPARQL query processing?
- Do the proposed optimizations improve the performance?
- Which heuristic has the largest individual effect?
Data from three benchmarks are used in the evaluation of TracedSPARQL. The following benchmarks are covered:
- Lehigh University Benchmark (LUBM) [1]
- Waterloo SPARQL Diversity Test Suite (WatDiv) [2]
- DBpedia [3]
For LUBM and WatDiv, knowledge graphs of three different sizes are used; together with DBpedia, a total of seven knowledge graphs is evaluated. For LUBM and WatDiv, two SHACL shape schemas of different complexity are validated; in the case of DBpedia, a single SHACL shape schema is used. The evaluation includes 10 SPARQL queries from the LUBM benchmark, 18 from WatDiv, and 20 created for DBpedia. Each SPARQL query covers at least one SHACL shape schema of the respective benchmark. All data used are publicly available [4].
TracedSPARQL is compared with a naive approach, referred to as baseline. The federated SPARQL query engine used is DeTrusty [5]. The SHACL validation is performed by Trav-SHACL [6] and SHACL2SPARQLpy [7], a Python implementation of SHACL2SPARQL [8]. This leads to the following engines included in the evaluation:
| Name | SHACL Validator | Heuristics |
|---|---|---|
| Baseline | Trav-SHACL | none |
| Baseline S2S | SHACL2SPARQLpy | none |
| TracedSPARQL | Trav-SHACL | all |
| TracedSPARQL S2S | SHACL2SPARQLpy | all |
The combination of a knowledge graph, engine, SHACL shape schema, and SPARQL query is referred to as a testbed; this leads to a total of 1,065 testbeds. Each testbed is executed five times. Caches are flushed between the execution of two consecutive testbeds.
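On Linux, flushing the caches between two runs is typically done as sketched below. The `flush_caches` helper is hypothetical; the actual scripts may use a different mechanism, and dropping the caches requires root privileges:

```shell
# Sketch: flush the OS caches so that a testbed cannot profit from data
# cached by the previous run. This is the common Linux mechanism; the
# actual scripts may differ in detail.
flush_caches() {
  sync  # write dirty pages to disk first
  if [ -w /proc/sys/vm/drop_caches ]; then
    # 3 = drop page cache, dentries, and inodes (requires root)
    echo 3 > /proc/sys/vm/drop_caches
  fi
}

flush_caches
```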
In order to facilitate the reproduction of the results, all components are encapsulated in Docker containers and the experiments are controlled via shell scripts. The entire pipeline can be run by executing:

```shell
sudo ./00_auto.sh
```
The individual scripts are briefly described below.
- 00_auto.sh: Executes the entire experiment automatically
- 01_preparation.sh: Prepares the experimental environment, i.e., downloads the data and sets up the Docker containers
- 02_experiments_lubm.sh: Executes the experiments for LUBM
- 03_experiments_watdiv.sh: Executes the experiments for WatDiv
- 04_experiments_dbpedia.sh: Executes the experiments for DBpedia
- 05_ablation_study.sh: Executes the ablation study
- 06_plots.sh: Creates the plots presented in the paper
- 07_cleanup.sh: Cleans up the experimental environment, including changing the ownership of the result files to the user executing the script
- run_testbeds.sh: Contains functions for performing the experiments
- variables.sh: Contains variables used for performing the experiments
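Instead of `00_auto.sh`, the stages can also be run one after the other; a minimal sketch, assuming the scripts are executed from the repository root:

```shell
# Sketch: run the experiment stages individually instead of 00_auto.sh.
for script in 01_preparation.sh 02_experiments_lubm.sh 03_experiments_watdiv.sh \
              04_experiments_dbpedia.sh 05_ablation_study.sh 06_plots.sh 07_cleanup.sh; do
  sudo "./${script}" || break  # stop if a stage fails
done
```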
The result plots included in the paper and a brief summary are available in the results directory.
TracedSPARQL is licensed under GPL-3.0, see the license file.
[1] Y. Guo, Z. Pan, J. Heflin. LUBM: A Benchmark for OWL Knowledge Base Systems. Journal of Web Semantics 3(2-3), 158-182 (2005). DOI: 10.1016/j.websem.2005.06.005
[2] G. Aluç, O. Hartig, M.T. Özsu, K. Daudjee. Diversified Stress Testing of RDF Data Management Systems. In: The Semantic Web -- ISWC 2014, Springer, Cham, 2014. DOI: 10.1007/978-3-319-11964-9_13
[3] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives. DBpedia: A Nucleus for a Web of Open Data. In: The Semantic Web, Springer, Berlin, Heidelberg, 2007. DOI: 10.1007/978-3-540-76298-0_52
[4] P.D. Rohde, M.-E. Vidal. Dataset: TracedSPARQL Benchmarks. Leibniz Data Manager (2023). DOI: 10.57702/wfl730bc
[5] P.D. Rohde, M. Bechara, Avellino. DeTrusty v0.15.0. Zenodo (2023). DOI: 10.5281/zenodo.10245898
[6] M. Figuera, P.D. Rohde, M.-E. Vidal. Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. In: The Web Conference, ACM, New York, NY, USA, 2021. DOI: 10.1145/3442381.3449877
[7] M. Figuera, P.D. Rohde. SHACL2SPARQLpy v1.3.0. GitHub (2023). URL: https://github.com/SDM-TIB/SHACL2SPARQLpy
[8] J. Corman, F. Florenzano, J.L. Reutter, O. Savković. SHACL2SPARQL: Validating a SPARQL Endpoint against Recursive SHACL Constraints. In: Proceedings of the ISWC 2019 Satellite Tracks, CEUR-WS, Aachen, Germany, 2019. URL: https://ceur-ws.org/Vol-2456/paper43.pdf