Performance comparison Oxigraph vs. QLever #841

hannahbast · 2024-03-31T23:43:22Z

Dear Thomas,

I finally found the time to play around with Oxigraph a bit. Let me first say that I am super impressed with the extent and the professionalism of this project and this repository. All the more since so far this has essentially been a one-person project. It's a great idea to provide the various components (like the RDF parser or the SPARQL parser) as independent modules/crates. Compilation from source was unproblematic and the command-line interface is easy to use and self-explanatory. In particular, it is very easy to load a dataset and start a server. Everything just works. In the world of academe, this is the absolute exception.

I compared loading time, index size, and query time of Oxigraph vs. QLever for a moderately sized RDF dataset, namely https://dblp.org/rdf/dblp.ttl.gz (1.7 GB compressed, 390 M triples), and a variety of queries (see below). Everything was run on an AMD Ryzen 9 7950X 16-Core machine with 128 GB of RAM and 7.1 TB of NVMe SSD (high-quality but affordable consumer hardware, total cost around 2500 €).

Loading time was 640s for Oxigraph (0.6 M triples/sec) vs. 231s for QLever (1.7 M triples/sec) on NVMe SSD. On HDD, it was 2537s for Oxigraph vs. 270s for QLever (apparently, Oxigraph makes heavy use of random access during loading). The total size of the index files was 66.5 GB for Oxigraph vs. 7.7 GB for QLever (apparently, Oxigraph doesn't compress much yet). I am curious whether the proportions of these stats carry over to a larger dataset like Wikidata (19 B triples). For QLever, load time and index size are essentially proportional to the size of the input dataset.
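
The throughput figures above follow directly from the triple count and the load times. A minimal sketch of that arithmetic, which also derives the index size per triple, assuming "GB" means 10^9 bytes and taking 390 M as the exact triple count (both are assumptions for illustration, not stated in the post):

```python
# Figures quoted in the post; the triple count is rounded (assumption: exactly 390 M).
TRIPLES = 390e6

load_seconds = {
    ("Oxigraph", "SSD"): 640,
    ("QLever", "SSD"): 231,
    ("Oxigraph", "HDD"): 2537,
    ("QLever", "HDD"): 270,
}
index_gb = {"Oxigraph": 66.5, "QLever": 7.7}


def throughput_m_per_s(seconds):
    """Loading throughput in millions of triples per second."""
    return TRIPLES / seconds / 1e6


def bytes_per_triple(gigabytes):
    """Index size per triple, assuming 1 GB = 10**9 bytes."""
    return gigabytes * 1e9 / TRIPLES


for (engine, disk), seconds in load_seconds.items():
    print(f"{engine} on {disk}: {throughput_m_per_s(seconds):.1f} M triples/s")

for engine, gigabytes in index_gb.items():
    print(f"{engine}: about {bytes_per_triple(gigabytes):.0f} bytes per triple")
```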

Here are the results for six queries from https://qlever.cs.uni-freiburg.de/dblp ("Examples"), selected for their variety. For QLever, the cache was cleared before each query. For Oxigraph, no special precautions were taken, except that the server was started from scratch once at the beginning. Both servers were run on SSD. For Oxigraph, it can make a huge difference when the disk cache is empty (sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"). For the queries strongly affected by this, I indicated this by writing X -> Y, where X is the query time with empty disk cache and Y is the query time when repeating the query.
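
The "X -> Y" measurement described above could be scripted roughly as follows. This is only a sketch, not the script used for the numbers in this post: the helper names are hypothetical, the endpoint URL in the usage comment assumes Oxigraph's default `localhost:7878/query`, and dropping the disk cache requires root.

```python
import subprocess
import time
import urllib.parse
import urllib.request


def run_sparql(endpoint, query):
    """Send a query as a form-encoded POST (SPARQL 1.1 Protocol) and read the full result."""
    data = urllib.parse.urlencode({"query": query}).encode()
    request = urllib.request.Request(
        endpoint, data=data, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def drop_disk_cache():
    """Empty the Linux page cache with the command quoted in the post (needs sudo)."""
    subprocess.run(
        ["sudo", "bash", "-c", "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"],
        check=True,
    )


def timed(fn):
    """Return the wall-clock seconds taken by one call of fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def cold_then_warm(run_query, before_cold=lambda: None):
    """Time the query once after before_cold() (X), then once more right away (Y)."""
    before_cold()
    cold = timed(run_query)
    warm = timed(run_query)
    return cold, warm


def format_times(cold, warm):
    """Render a pair of timings in the 'X -> Y' notation used in the table."""
    return f"{cold:.1f}s -> {warm:.1f}s"


# Example use (not executed here; the endpoint URL and query are placeholders):
# endpoint = "http://localhost:7878/query"
# query = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"
# cold, warm = cold_then_warm(lambda: run_sparql(endpoint, query), drop_disk_cache)
# print(format_times(cold, warm))
```

Passing the cache-clearing step as a hook keeps the timing logic independent of how "cold" is defined, so the same function could be reused with an engine-specific cache reset.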

| Query | Oxigraph | QLever | Result shape | Comment / Question |
|---|---|---|---|---|
| All papers published in SIGIR | 1.5s -> 0.3s | 0.01s | 6254 x 3 | Two simple joins, nothing special |
| Number of papers by venue | 2.6s -> 1.8s | 0.02s | 19954 x 2 | Scan of a single predicate with GROUP BY and ORDER BY |
| Author names matching REGEX | 5.5s -> 0.5s | 0.03s | 513 x 3 | Joins, GROUP BY, ORDER BY, FILTER REGEX |
| All papers in DBLP until 1940 | 310s | 0.10s | 70 x 4 | Three joins, a FILTER, and an ORDER BY, why does Oxigraph take so long? |
| All papers with their title | 134s -> 83s | 4.16s | 716786 x 2 | Single predicate, but must materialize large result (problematic for many SPARQL engines) |
| All predicates ordered by the number of subjects | 102s | 0.01s | 68 x 2 | Conceptually requires a scan over all triples, but huge optimization potential |
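
To put the table in perspective, the per-query ratios can be computed as in the sketch below. It uses the repeated-run time (Y) for Oxigraph where two times are given, which favors Oxigraph, since the QLever cache was cleared before each query. The ratios are only as precise as the rounded times in the table.

```python
# (query, Oxigraph seconds on the repeated run, QLever seconds), copied from the table.
rows = [
    ("All papers published in SIGIR", 0.3, 0.01),
    ("Number of papers by venue", 1.8, 0.02),
    ("Author names matching REGEX", 0.5, 0.03),
    ("All papers in DBLP until 1940", 310, 0.10),
    ("All papers with their title", 83, 4.16),
    ("All predicates ordered by the number of subjects", 102, 0.01),
]


def speedup(oxigraph_seconds, qlever_seconds):
    """How many times longer Oxigraph took than QLever for the same query."""
    return oxigraph_seconds / qlever_seconds


for name, oxigraph_seconds, qlever_seconds in rows:
    print(f"{name}: {speedup(oxigraph_seconds, qlever_seconds):.0f}x")
```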

hannahbast · 2024-04-03T11:20:36Z

P.S. In the meantime, I have included other engines in the comparison (it's now: Oxigraph, Apache Jena, Stardog, Blazegraph, Virtuoso, QLever). Results are reported here: https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines . The first table gives a nice overview.
