Spatial Index Evaluations for Apache Jena

This repository contains an evaluation setup that compares:

Jena's default spatial index implementation, referred to as vanilla
Our improved implementation, referred to as geoplus

The evaluation uses our simple GridBench benchmark.

Result Datasets

We are in the process of finalizing the RDF evaluation dataset generation and deployment pipeline. The links will be updated to final versions in the coming days. ~ 2024-03-28

Alpha versions of the datasets are deployed under:

http://maven.aksw.org/repository/snapshots/org/aksw/eval/gridbench/jena/

Query Types

In our evaluation we use three sets of queries that target the same spatial regions but differ in which graphs they query.

ng-one: Benchmark queries target a single named graph in the dataset. Uses GRAPH <CONST>.


PREFIX  geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX  spatial: <http://jena.apache.org/spatial#>
PREFIX  geof: <http://www.opengis.net/def/function/geosparql/>

SELECT  (count(*) AS ?c)
WHERE
  { GRAPH <http://www.example.org/graph/0>
      { BIND("POLYGON((-90 -90, -90 -78.75, -78.75 -78.75, -78.75 -90, -90 -90))"^^geo:wktLiteral AS ?queryGeom)
        ?feature  spatial:intersectBoxGeom  ( ?queryGeom ) ;
                  geo:hasGeometry       ?featureGeom .
        ?featureGeom  geo:asWKT         ?featureGeomWkt
        FILTER geof:sfIntersects(?featureGeomWkt, ?queryGeom)
      }
  }

ng-all: Benchmark queries target all named graphs in the dataset. Uses GRAPH ?g.


PREFIX  geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX  spatial: <http://jena.apache.org/spatial#>
PREFIX  geof: <http://www.opengis.net/def/function/geosparql/>

SELECT  (count(*) AS ?c)
WHERE
  { GRAPH ?g
      { BIND("POLYGON((-90 -90, -90 -78.75, -78.75 -78.75, -78.75 -90, -90 -90))"^^geo:wktLiteral AS ?queryGeom)
        ?feature  spatial:intersectBoxGeom  ( ?queryGeom ) ;
                  geo:hasGeometry       ?featureGeom .
        ?featureGeom  geo:asWKT         ?featureGeomWkt
        FILTER geof:sfIntersects(?featureGeomWkt, ?queryGeom)
      }
  }

ug: Benchmark queries target the union default graph, i.e. a view over all named graphs. Does not use GRAPH.


PREFIX  geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX  spatial: <http://jena.apache.org/spatial#>
PREFIX  geof: <http://www.opengis.net/def/function/geosparql/>

SELECT  (count(*) AS ?c)
WHERE
  { { BIND("POLYGON((-90 -90, -90 -78.75, -78.75 -78.75, -78.75 -90, -90 -90))"^^geo:wktLiteral AS ?queryGeom)
      ?feature  spatial:intersectBoxGeom  ( ?queryGeom ) ;
                geo:hasGeometry       ?featureGeom .
      ?featureGeom  geo:asWKT         ?featureGeomWkt
    }
    FILTER geof:sfIntersects(?featureGeomWkt, ?queryGeom)
  }
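The three query sets differ only in how the shared spatial pattern is wrapped: a constant GRAPH IRI (ng-one), a GRAPH variable (ng-all), or no GRAPH clause at all (ug). The following is a minimal sketch of that relationship. The helper build_query is hypothetical and not part of this repository. It keeps the FILTER inside the group for ug, whereas the ug query above places it in an enclosing group, which should not change the result.

```python
# Hypothetical helper (not part of this repository): builds the three
# benchmark query variants around one shared spatial pattern.
PREFIXES = """PREFIX  geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX  spatial: <http://jena.apache.org/spatial#>
PREFIX  geof: <http://www.opengis.net/def/function/geosparql/>
"""

# Shared pattern: box-based index lookup, then an exact intersection check.
PATTERN = """BIND("{wkt}"^^geo:wktLiteral AS ?queryGeom)
?feature  spatial:intersectBoxGeom  ( ?queryGeom ) ;
          geo:hasGeometry       ?featureGeom .
?featureGeom  geo:asWKT         ?featureGeomWkt
FILTER geof:sfIntersects(?featureGeomWkt, ?queryGeom)"""


def build_query(variant: str, wkt: str,
                graph_iri: str = "http://www.example.org/graph/0") -> str:
    """Return a SPARQL query string for 'ng-one', 'ng-all' or 'ug'."""
    body = PATTERN.format(wkt=wkt)
    if variant == "ng-one":
        # Constant graph IRI: only a single named graph is queried.
        where = f"GRAPH <{graph_iri}> {{ {body} }}"
    elif variant == "ng-all":
        # Graph variable: the pattern is evaluated in every named graph.
        where = f"GRAPH ?g {{ {body} }}"
    elif variant == "ug":
        # No GRAPH clause: evaluated against the (union) default graph.
        where = body
    else:
        raise ValueError(f"unknown variant: {variant}")
    return f"{PREFIXES}\nSELECT (count(*) AS ?c)\nWHERE {{ {where} }}\n"


if __name__ == "__main__":
    print(build_query("ng-all", "POLYGON((-90 -90, -90 -78.75, -78.75 -78.75, -78.75 -90, -90 -90))"))
```

Keeping the spatial pattern in a single template helps ensure that any runtime differences between the three sets come from graph scoping rather than from the pattern itself.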

Download Links

Visualization

Coming soon. ~ 2024-03-28

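The following query lists the evaluation duration of each query execution in benchmark runs with lsq:runId 0, ordered by execution time: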

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX lsq: <http://lsq.aksw.org/vocab#>
PREFIX agg: <http://jena.apache.org/ARQ/function/aggregate#>

SELECT ?time ?value WHERE {
  { SELECT ?benchmarkRun (AVG(?duration) AS ?durationAvg) (agg:stdev(?duration) AS ?durationStdev) {
      GRAPH ?query {
        ?query lsq:hasLocalExec ?localExec .
        ?localExec
          lsq:benchmarkRun ?benchmarkRun ;
          lsq:hasQueryExec/lsq:evalDuration ?duration .
      }
  } GROUP BY ?benchmarkRun }
  GRAPH ?query {
    ?query
      lsq:hasLocalExec ?localExec ;
      geo:hasGeometry/geo:asWKT ?wkt .

    ?localExec
      lsq:benchmarkRun ?benchmarkRun ;
      lsq:hasQueryExec ?queryExec .

    ?queryExec
      prov:atTime ?time ;
      lsq:evalDuration ?duration .    
  }
  GRAPH ?runGraph { ?benchmarkRun lsq:runId ?runId }
  FILTER(?runId = 0)
  # BIND((?duration - ?durationAvg) / ?durationStdev AS ?value) # How many sigmas a value differs from the average
  # BIND(?duration - ?durationAvg AS ?value)
  BIND(?duration AS ?value)
}
ORDER BY ASC(?time)
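The first commented-out BIND above expresses each duration as the number of standard deviations it lies from its benchmark run's average. The following sketch performs the same computation in Python on durations that have already been extracted. The input format (a list of (run, duration) pairs) and the function name sigma_deviations are assumptions made for illustration. The sketch uses the sample standard deviation, which is assumed, not confirmed, to match agg:stdev.

```python
import statistics
from collections import defaultdict


def sigma_deviations(executions: list[tuple[str, float]]) -> dict[str, list[float]]:
    """Map each run to (duration - run mean) / run stdev for its durations.

    Hypothetical helper; mirrors the commented-out BIND in the query above.
    """
    by_run: dict[str, list[float]] = defaultdict(list)
    for run, duration in executions:
        by_run[run].append(duration)

    result: dict[str, list[float]] = {}
    for run, durations in by_run.items():
        if len(durations) < 2:
            # The sample standard deviation is undefined for a single value.
            result[run] = [0.0] * len(durations)
            continue
        mean = statistics.mean(durations)
        # Sample standard deviation (n - 1 denominator); assumed to match agg:stdev.
        stdev = statistics.stdev(durations)
        result[run] = [(d - mean) / stdev if stdev else 0.0 for d in durations]
    return result
```

As in the SPARQL version, runs are normalized independently, so outliers are judged relative to their own run rather than across runs.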

Reproducing Results

In an attempt to make the benchmark as reproducible as possible, we packaged it up as an Apache Maven build.

Prerequisites

Apache Maven installed
A running Docker daemon

Benchmark Execution

The default configuration requires ~48GB of free RAM.
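Before starting a long run, it can help to check that enough memory is actually available. The following is a minimal, Linux-only sketch that reads MemAvailable from /proc/meminfo. It is not part of the build, and treating "~48GB of free RAM" as 48 GiB of MemAvailable is an assumption.

```python
import os

REQUIRED_GIB = 48  # assumption: "~48GB of free RAM" interpreted as 48 GiB


def mem_available_gib(meminfo_text: str) -> float:
    """Return MemAvailable from /proc/meminfo-formatted text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # /proc/meminfo reports "kB" (KiB)
            return kib / (1024 * 1024)
    raise ValueError("MemAvailable not found")


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        gib = mem_available_gib(f.read())
    status = "ok" if gib >= REQUIRED_GIB else "insufficient"
    print(f"MemAvailable: {gib:.1f} GiB (need ~{REQUIRED_GIB} GiB): {status}")
```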

Install the eval-template (from the repository root)

cd eval-template
mvn install

Run the actual evaluation (again from the repository root)

cd eval-parent
mvn package

The query runtimes are collected in a TriG dataset, with one named graph per query, under:

./eval-parent/eval-EXPERIMENT/target/bench.trig

Adjusting the benchmark

The number of grid cells and the number of graphs can be configured in eval-template/pom.yml.
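The actual pom.yml parameter names and the region GridBench covers are not described here. The following is therefore only a hypothetical illustration of how a grid-cell setting maps to query regions. The bounding box (-90, -90, 90, 90) and the 16 × 16 grid are assumptions, chosen so that the first cell reproduces the polygon used in the example queries above. The vertex order also follows those queries.

```python
from typing import Iterator


def grid_cells(min_x: float, min_y: float, max_x: float, max_y: float,
               cols: int, rows: int) -> Iterator[str]:
    """Yield one WKT polygon per cell of a cols x rows grid (hypothetical helper)."""
    dx = (max_x - min_x) / cols
    dy = (max_y - min_y) / rows
    for j in range(rows):
        for i in range(cols):
            x0, x1 = min_x + i * dx, min_x + (i + 1) * dx
            y0, y1 = min_y + j * dy, min_y + (j + 1) * dy
            # Same vertex order as the benchmark queries; the ring is closed.
            pts = [(x0, y0), (x0, y1), (x1, y1), (x1, y0), (x0, y0)]
            coords = ", ".join(f"{x:g} {y:g}" for x, y in pts)
            yield f"POLYGON(({coords}))"


if __name__ == "__main__":
    cells = list(grid_cells(-90, -90, 90, 90, 16, 16))
    print(len(cells), cells[0])
```

With these assumed values, each cell is 11.25 degrees wide and high. Doubling cols and rows would quadruple the number of query regions while shrinking each one.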

Repository: AKSW/gridbench-results