🧙 Sage: a SPARQL query engine for public Linked Data providers

Read the online documentation

SaGe is a SPARQL query engine for public Linked Data providers that implements Web preemption. The SPARQL engine includes a smart SaGe client and a SaGe SPARQL query server hosting RDF datasets using HDT, PostgreSQL, SQLite, or HBase. This repository contains the Python implementation of the SaGe SPARQL query server.

SPARQL queries are suspended by the web server after a fixed quantum of time and resumed upon client request. Using Web preemption, Sage ensures stable response times for query execution and completeness of results under high load.
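The suspend/resume cycle described above can be illustrated with a toy simulation. This is a minimal sketch, not the SaGe implementation: the "query" is a plain scan over a list, the saved state is just an offset, and the names run_quantum and evaluate are hypothetical. A real server would save the state of a full query plan instead.

```python
import time
from typing import List, Optional, Tuple


def run_quantum(data: List[int], offset: int, quantum_s: float,
                max_results: int) -> Tuple[List[int], Optional[int]]:
    """Server side (toy): resume a scan at `offset` and run for one quantum.

    Returns the results produced during this quantum and the offset at which
    the scan was suspended, or None when the scan is complete.
    """
    deadline = time.monotonic() + quantum_s
    results: List[int] = []
    while offset < len(data):
        # Always make progress on at least one item per quantum.
        results.append(data[offset])
        offset += 1
        out_of_budget = (len(results) >= max_results
                         or time.monotonic() >= deadline)
        if offset < len(data) and out_of_budget:
            return results, offset  # suspended: the client must ask again
    return results, None  # finished


def evaluate(data: List[int], quantum_s: float = 0.075,
             max_results: int = 2000) -> List[int]:
    """Client side (toy): keep resuming the query until it completes."""
    results: List[int] = []
    next_offset: Optional[int] = 0
    while next_offset is not None:
        page, next_offset = run_quantum(data, next_offset, quantum_s, max_results)
        results.extend(page)
    return results
```

Because every request is bounded by the quantum, no single query can monopolize the server, and the client still obtains complete results by resuming until no saved state is returned.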

The complete approach and experimental results are described in a research paper accepted at The Web Conference 2019: Thomas Minier, Hala Skaf-Molli and Pascal Molli. "SaGe: Web Preemption for Public SPARQL Query Services" in Proceedings of the 2019 World Wide Web Conference (WWW'19), San Francisco, USA, May 13-17, 2019.

We welcome your feedback, comments and questions on our mailing list or our issue tracker on GitHub.

Installation

Installation in a virtualenv is strongly advised!

Requirements:

  • Python 3.7 (or higher)
  • pip
  • gcc/clang with c++11 support
  • Python Development headers

You should have the Python.h header available on your system.
For example, for Python 3.7, install the python3.7-dev package on Debian/Ubuntu systems.

Installation using pip

The core engine of the SaGe SPARQL query server can be installed with pip, together with the backends you need:

pip install sage-engine[hdt,postgres,hbase]

The SaGe query engine uses various backends to load RDF datasets. These backends are installed as extra dependencies. The above command installs the HDT, PostgreSQL and HBase backends.

Manual Installation using poetry

The SaGe SPARQL query server can also be manually installed using the poetry dependency manager.

git clone https://github.com/sage-org/sage-engine
cd sage-engine
poetry install --extras "hdt postgres hbase"

As with pip, the various SaGe backends are installed as extra dependencies, using the --extras flag.

Getting started

Server configuration

A SaGe server is configured using a configuration file in YAML syntax. You will find below a minimal working example of such a configuration file. Full examples are available in the config_examples/ directory.

name: SaGe Test server
maintainer: Chuck Norris
quota: 75
max_results: 2000
graphs:
-
  name: dbpedia
  uri: http://example.org/dbpedia
  description: DBPedia
  backend: hdt-file
  file: datasets/dbpedia.2016.hdt

The quota and max_results fields are used to set the maximum time quantum and the maximum number of results allowed per request, respectively.

Each entry in the graphs field declares an RDF dataset with a name, URI, description, backend and options specific to this backend. Different backends are available:

  • the hdt-file backend allows a SaGe server to load RDF datasets from HDT files. SaGe uses pyHDT to load and query HDT files.
  • the postgres backend allows a SaGe server to create, query and update RDF datasets stored in PostgreSQL. Each dataset is stored in a single table composed of 3 columns: S (subject), P (predicate) and O (object). Tables are created with B-Tree indexes on SPO, POS and OSP. SaGe uses psycopg2 to interact with PostgreSQL.
  • the postgres-catalog backend uses a different schema than postgres to store datasets. Triple terms are mapped to unique identifiers, and a dictionary table common to all datasets maps RDF terms to their identifiers. This schema reduces the space required to store datasets.
  • the sqlite backend allows a SaGe server to create, query and update RDF datasets stored in SQLite. Datasets are stored using the same schema as the postgres backend.
  • the sqlite-catalog backend is another backend for SQLite that uses the same dictionary-based schema as the postgres-catalog backend.
  • the hbase backend allows a SaGe server to create, query and update RDF datasets stored in HBase. To provide sorted access to dataset triples, each triple is inserted three times, in three different tables that use SPO, POS and OSP as triple keys. SaGe uses happybase to interact with HBase.
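Several backends above keep three orderings of the triples (SPO, POS and OSP). The sketch below shows the general idea behind this design: for any triple pattern, one of the three orderings has all bound terms as a key prefix, so matching triples can be read as one sorted range. The function choose_index is a hypothetical illustration of that technique and is not taken from the SaGe code base.

```python
from typing import Optional

# For each combination of bound positions (subject, predicate, object),
# the ordering whose key prefix covers all bound terms.
INDEX_FOR = {
    (False, False, False): "SPO",  # full scan, any ordering works
    (True, False, False): "SPO",
    (True, True, False): "SPO",
    (True, True, True): "SPO",
    (False, True, False): "POS",
    (False, True, True): "POS",
    (False, False, True): "OSP",
    (True, False, True): "OSP",
}


def choose_index(s: Optional[str], p: Optional[str], o: Optional[str]) -> str:
    """Pick an ordering for a triple pattern; None marks a variable."""
    bound = (s is not None, p is not None, o is not None)
    return INDEX_FOR[bound]


if __name__ == "__main__":
    # Pattern "?s rdf:type ?o": only the predicate is bound.
    print(choose_index(None, "rdf:type", None))
```

With only two orderings, some patterns (for example a bound object with a variable subject and predicate) would require scanning and filtering instead of a single range read.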

PostgreSQL configuration

This section is optional and can be skipped if you don't use one of the PostgreSQL backends.

To ensure stable performance when using PostgreSQL with SaGe, PostgreSQL needs to be configured. Open the file postgresql.conf in the PostgreSQL main directory and apply the following changes in the Planner Method Configuration section:

  • Uncomment all enable_XYZ options
  • Set enable_indexscan, enable_indexonlyscan and enable_nestloop to on
  • Set all the other enable_XYZ options to off

These changes force the PostgreSQL query optimizer to generate the desired query plan for the SaGe resume queries.
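The changes listed above can also be applied with a small script. The following is a minimal sketch that assumes the planner options appear in postgresql.conf as lines of the form "#enable_xyz = on", as in a default file. The names tune_planner and KEEP_ON are hypothetical, and the function only transforms text: reading and writing the file, and restarting PostgreSQL, are left to you.

```python
import re

# Planner options that must stay enabled; every other enable_* option is disabled.
KEEP_ON = {"enable_indexscan", "enable_indexonlyscan", "enable_nestloop"}

# Matches an enable_* setting, commented out or not, and keeps any trailing comment.
_OPTION = re.compile(r"^\s*#?\s*(enable_\w+)\s*=\s*\w+(.*)$")


def tune_planner(conf_text: str) -> str:
    """Return the configuration text with all enable_* options set explicitly."""
    tuned = []
    for line in conf_text.splitlines():
        match = _OPTION.match(line)
        if match:
            name, rest = match.group(1), match.group(2)
            value = "on" if name in KEEP_ON else "off"
            line = f"{name} = {value}{rest}"
        tuned.append(line)
    return "\n".join(tuned)
```

Review the result before replacing the original file, since a prose comment that happens to look like a setting would also be rewritten.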

Data ingestion

Different executables are available to load an RDF file depending on the backend you want to use.

To load a dataset from an HDT file, just declare a new dataset in your configuration file using the hdt-file backend.

To load an N-Triples file using one of the postgres, postgres-catalog, hbase, sqlite and sqlite-catalog backends, first declare a new dataset in your configuration file. For example, to load the file my_dataset.nt using the sqlite backend, we start by declaring a new dataset named my_dataset in our configuration file my_config.yaml.

quota: 75
max_results: 10000
graphs:
-
  name: my_dataset
  uri: http://example.org/my_dataset
  backend: sqlite
  database: sage-sqlite.db

For each backend, an example that illustrates how to declare a new dataset is available in the config_examples/ directory.

To load a file into a dataset declared using one of the SQLite backends, use the following commands:

# Create the required SQLite tables to store the dataset
sage-sqlite-init --no-index my_config.yaml my_dataset
# Insert the RDF triples in SQLite
sage-sqlite-put my_dataset.nt my_config.yaml my_dataset
# Create the SPO, OSP and POS indexes
sage-sqlite-index my_config.yaml my_dataset

To load a file into a dataset declared using one of the PostgreSQL backends, use the following commands:

# Create the required PostgreSQL tables to store the dataset
sage-postgres-init --no-index my_config.yaml my_dataset
# Insert the RDF triples in PostgreSQL
sage-postgres-put my_dataset.nt my_config.yaml my_dataset
# Create the SPO, OSP and POS indexes
sage-postgres-index my_config.yaml my_dataset

To load a file into a dataset declared using the hbase backend, use the following commands:

# Create the required HBase tables to store the dataset
sage-hbase-init my_config.yaml my_dataset
# Insert the RDF triples in HBase
sage-hbase-put my_dataset.nt my_config.yaml my_dataset
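When loading many datasets, the command sequences above can be generated from a script. The sketch below only builds the argument lists shown in this section. The function ingestion_commands is a hypothetical helper and is not part of SaGe.

```python
from typing import List


def ingestion_commands(backend: str, rdf_file: str, config: str,
                       dataset: str) -> List[List[str]]:
    """Build the ingestion commands, in order, for a dataset and backend."""
    if backend == "hbase":
        # HBase has no separate indexing step in the commands above.
        return [
            ["sage-hbase-init", config, dataset],
            ["sage-hbase-put", rdf_file, config, dataset],
        ]
    if backend in ("sqlite", "sqlite-catalog"):
        tool = "sqlite"
    elif backend in ("postgres", "postgres-catalog"):
        tool = "postgres"
    else:
        raise ValueError(f"no ingestion commands for backend: {backend}")
    return [
        # Create the tables without indexes, insert, then build the indexes.
        [f"sage-{tool}-init", "--no-index", config, dataset],
        [f"sage-{tool}-put", rdf_file, config, dataset],
        [f"sage-{tool}-index", config, dataset],
    ]


if __name__ == "__main__":
    for cmd in ingestion_commands("sqlite", "my_dataset.nt",
                                  "my_config.yaml", "my_dataset"):
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) so that the sequence stops at the first failing step.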

Starting the server

The sage executable, installed alongside the SaGe server, starts a SaGe server from a configuration file using Uvicorn, a Python ASGI HTTP server.

# launch Sage server with 4 workers on port 8000
sage my_config.yaml -w 4 -p 8000

The full usage of the sage executable is detailed below:

Usage: sage [OPTIONS] CONFIG

  Launch the Sage server using the CONFIG configuration file

Options:
  -p, --port INTEGER              The port to bind  [default: 8000]
  -w, --workers INTEGER           The number of server workers  [default: 4]
  --log-level [debug|info|warning|error]
                                  The granularity of log outputs  [default:
                                  info]
  --help                          Show this message and exit.

Once started, you can interact with the SaGe server on http://localhost:8000/docs

SaGe Docker image

The SaGe server is also available as a Docker image. To use it, remember to mount into the container the directory that contains your configuration file and your datasets.

docker pull callidon/sage
docker run -v path/to/config-file:/opt/data/ -p 8000:8000 callidon/sage sage /opt/data/config.yaml -w 4 -p 8000

Documentation

To generate the documentation, navigate to the docs directory and build the HTML pages:

cd docs/
make html
open build/html/index.html

Copyright 2017-2019 - GDD Team, LS2N, University of Nantes