
Tracing SHACL Validations towards a Better Understanding of SPARQL Query Results


TracedSPARQL

TracedSPARQL traces SHACL validations during SPARQL query processing to enable a better understanding of SPARQL query results. A brief explanation of TracedSPARQL is available in the directory doc.

Table of Contents

  1. Preparation of the Environment
    1. Machine Requirements
    2. Software
    3. Bash Commands
  2. Experiments
    1. Research Questions
    2. Data & SHACL Shape Schemas
    3. Engines
    4. Setups
    5. How to reproduce?
    6. Results
  3. License
  4. References

Preparation of the Environment

Machine Requirements

  • OS: Ubuntu 16.04.6 LTS or newer
  • Memory: 128 GiB
  • HDD: approx. 50 GiB free disk space

Software

  • Docker - v19.03.6 or newer
  • docker-compose - v1.26.0 or newer
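
The minimum versions above can be verified before starting. The sketch below is illustrative and not part of the repository; `check_version` is a hypothetical helper that compares version strings with `sort -V`:

```shell
# Hedged sketch (not part of this repository): verify that the installed
# Docker tooling meets the minimum versions listed above.
check_version() {
  # $1: tool name, $2: installed version, $3: minimum required version
  if [ "$(printf '%s\n' "$3" "$2" | sort -V | head -n1)" = "$3" ]; then
    echo "$1 $2: OK (>= $3)"
  else
    echo "$1 $2: too old (need >= $3)" >&2
    return 1
  fi
}

if command -v docker >/dev/null 2>&1; then
  # 'docker --version' prints e.g. "Docker version 19.03.6, build ..."
  check_version "Docker" "$(docker --version | grep -o '[0-9][0-9.]*' | head -n1)" "19.03.6"
else
  echo "docker not installed"
fi

if command -v docker-compose >/dev/null 2>&1; then
  check_version "docker-compose" "$(docker-compose version --short)" "1.26.0"
else
  echo "docker-compose not installed"
fi
```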

Bash Commands

The experiment scripts use the following bash commands:

  • basename
  • cd
  • chown
  • declare (with options -a and -A)
  • echo
  • logname
  • rm
  • sleep
  • source
  • unzip
  • wget

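Since the scripts depend on these commands, their availability can be checked up front. The sketch below uses a hypothetical `check_commands` helper, not part of the repository; the builtins `cd`, `declare`, `echo`, and `source` are always present in bash, so only the external tools are listed:

```shell
# Hedged sketch (not part of this repository): fail early if any external
# command required by the experiment scripts is missing from the PATH.
check_commands() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      return 1
    fi
  done
  echo "all commands present"
}

# External tools used by the scripts (shell builtins excluded).
check_commands basename chown logname rm sleep unzip wget \
  || echo "install the missing tools before running the experiments"
```
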
Experiments

Research Questions

  1. What is the overhead of adding online SHACL validation to the SPARQL query processing?
  2. Do the proposed optimizations increase the performance?
  3. Which heuristic has the highest single effect?

Data & SHACL Shape Schemas

Data from three benchmarks are used in the evaluation of TracedSPARQL. The following benchmarks are covered:

  • Lehigh University Benchmark (LUBM) [1]
  • Waterloo SPARQL Diversity Test Suite (WatDiv) [2]
  • DBpedia [3]

For LUBM and WatDiv, knowledge graphs of three different sizes are used; together with DBpedia, a total of seven knowledge graphs are evaluated. For LUBM and WatDiv, two SHACL shape schemas of different complexity are validated; in the case of DBpedia, a single SHACL shape schema is used. The evaluation includes 10 SPARQL queries from the LUBM benchmark, 18 from WatDiv, and 20 created for DBpedia. Each SPARQL query covers at least one SHACL shape schema of the respective benchmark. All data used are publicly available [4].

Engines

TracedSPARQL is compared with a naive approach, referred to as the baseline. The federated SPARQL query engine used is DeTrusty [5]. The SHACL validation is performed by Trav-SHACL [6] and SHACL2SPARQLpy [7], a Python implementation of SHACL2SPARQL [8]. This results in the following engines being included in the evaluation:

Name             | SHACL Validator | Heuristics
---------------- | --------------- | ----------
Baseline         | Trav-SHACL      | none
Baseline S2S     | SHACL2SPARQLpy  | none
TracedSPARQL     | Trav-SHACL      | all
TracedSPARQL S2S | SHACL2SPARQLpy  | all

Setups

The combination of a knowledge graph, engine, SHACL shape schema, and SPARQL query is referred to as a testbed; this leads to a total of 1,065 testbeds. Each testbed is executed five times. Caches are flushed between the execution of two consecutive testbeds.

How to reproduce?

To facilitate the reproduction of the results, all components are encapsulated in Docker containers and the experiments are controlled via shell scripts. The entire pipeline can be run by executing:

sudo ./00_auto.sh

The individual scripts are briefly described below.

  • 00_auto.sh: Executes the entire experiment automatically
  • 01_preparation.sh: Prepares the experimental environment, i.e., downloads the data and sets up the Docker containers
  • 02_experiments_lubm.sh: Executes the experiments for LUBM
  • 03_experiments_watdiv.sh: Executes the experiments for WatDiv
  • 04_experiments_dbpedia.sh: Executes the experiments for DBpedia
  • 05_ablation_study.sh: Executes the ablation study
  • 06_plots.sh: Creates the plots presented in the paper
  • 07_cleanup.sh: Cleans up the experimental environment, including changing the ownership of the result files to the user executing the script
  • run_testbeds.sh: Contains functions for performing the experiments
  • variables.sh: Contains variables used for performing the experiments

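Assuming that 00_auto.sh simply chains the numbered scripts in order (a plausible reading of the list above, not verified against the repository), a step-by-step run can be sketched as follows, e.g., to repeat only one benchmark:

```shell
# Hedged sketch: the assumed execution order of the numbered scripts.
# The script names are taken from the list above; the chaining is an assumption.
steps="01_preparation.sh 02_experiments_lubm.sh 03_experiments_watdiv.sh \
04_experiments_dbpedia.sh 05_ablation_study.sh 06_plots.sh 07_cleanup.sh"

for script in $steps; do
  echo "would run: sudo ./$script"
  # Inside the repository, replace the echo with: sudo "./$script"
done
```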
Results

The result plots included in the paper and a brief summary are available in the results directory.

License

TracedSPARQL is licensed under GPL-3.0; see the license file.

References

[1] Y. Guo, Z. Pan, J. Heflin. LUBM: A Benchmark for OWL Knowledge Base Systems. Journal of Web Semantics 3(2-3), 158-182 (2005). DOI: 10.1016/j.websem.2005.06.005

[2] G. Aluç, O. Hartig, M.T. Özsu, K. Daudjee. Diversified Stress Testing of RDF Data Management Systems. In: The Semantic Web -- ISWC 2014, Springer, Cham, 2014. DOI: 10.1007/978-3-319-11964-9_13

[3] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives. DBpedia: A Nucleus for a Web of Open Data. In: The Semantic Web, Springer, Berlin, Heidelberg, 2007. DOI: 10.1007/978-3-540-76298-0_52

[4] P.D. Rohde, M.-E. Vidal. Dataset: TracedSPARQL Benchmarks. Leibniz Data Manager (2023). DOI: 10.57702/wfl730bc

[5] P.D. Rohde, M. Bechara, Avellino. DeTrusty v0.15.0. Zenodo (2023). DOI: 10.5281/zenodo.10245898

[6] M. Figuera, P.D. Rohde, M.-E. Vidal. Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. In: The Web Conference, ACM, New York, NY, USA, 2021. DOI: 10.1145/3442381.3449877

[7] M. Figuera, P.D. Rohde. SHACL2SPARQLpy v1.3.0. GitHub (2023). URL: https://github.com/SDM-TIB/SHACL2SPARQLpy

[8] J. Corman, F. Florenzano, J.L. Reutter, O. Savković. SHACL2SPARQL: Validating a SPARQL Endpoint against Recursive SHACL Constraints. In: Proceedings of the ISWC 2019 Satellite Tracks, CEUR-WS, Aachen, Germany, 2019. URL: https://ceur-ws.org/Vol-2456/paper43.pdf
