
A python package for Reasonable Ontology Templates (OTTR). Tooling for converting your tabular data into RDF structures by applying stOTTR templates.


michalporeba/pyottr


pyOTTR - Reasonable Ontology Templates in Python

A Python module to help you convert your tabular data, like CSV or Excel files, into Resource Description Framework (RDF) triples.

You can skip ahead to an example or usage instructions if you already know what RDF and OTTR are. But if you are unsure, let me give you some context before I tell you more about the project itself.

 

Introduction

The Resource Description Framework (RDF) is one of the W3C Open Web Standards. It is a flexible data model underpinning the Semantic Web. From w3.org/RDF:

RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.

RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.

Linked Data is information on the Web expressed as a graph, in a common standard such as RDF. It promotes data interoperability and integration because it uses globally unique identifiers (URIs) and shared vocabularies to denote concepts, enabling different datasets to "speak the same language". This makes data more meaningful and opens up new possibilities for interoperability, discovery, reusability, and collaboration without coordination.
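To make the triple model concrete, here is a minimal sketch in plain Python (no RDF library). The rdf:, rdfs: and owl: namespaces are the standard W3C ones; the namespace behind the p: prefix used in the pizza example is not given anywhere, so http://example.org/pizza# is an assumption for illustration only. The sketch expands prefixed names into full IRIs and shows why shared identifiers make merging easy: two sources that use the same IRIs combine by a simple set union.

```python
from typing import NamedTuple

# Standard W3C namespaces; the "p" namespace is a made-up example.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "p": "http://example.org/pizza#",  # assumption, for illustration only
}


def expand(term: str) -> str:
    """Turn a prefixed name like 'p:Hawaii' into a full IRI; leave literals alone."""
    if term.startswith('"'):
        return term  # a literal such as "Hawaii"
    prefix, _, local = term.partition(":")
    return f"<{PREFIXES[prefix]}{local}>"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

    def __str__(self) -> str:
        # N-Triples-style line with every IRI written out in full
        return f"{expand(self.subject)} {expand(self.predicate)} {expand(self.obj)} ."


# Two independent sources that happen to describe the same pizza.
menu = {Triple("p:Hawaii", "rdf:type", "owl:Class")}
labels = {
    Triple("p:Hawaii", "rdfs:label", '"Hawaii"'),
    Triple("p:Hawaii", "rdf:type", "owl:Class"),
}

# Because identifiers are global, merging is a set union; duplicates collapse.
merged = menu | labels
for triple in sorted(merged):
    print(triple)
```

The duplicated rdf:type statement appears only once in the merged result, because both sources name the pizza, the predicate and the class with the same IRIs.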

RDF offers several compelling advantages compared to data locked in tabular formats like CSV, spreadsheets or tables in relational database systems. I am convinced, but why would you trust me? What if you wanted to try it yourself on data that makes sense to you?

Here is the problem. Most of our information is locked in simple tabular forms, and converting it to RDF statements is cumbersome and error-prone at best.

Reasonable Ontology Templates (OTTR) offers a solution. If writing RDF by hand is like assembly programming, OTTR is like switching to Python, but for ontology modelling (at least according to the OTTR creators).

The Terse Syntax for Reasonable Ontology Templates (stOTTR), https://dev.spec.ottr.xyz/stOTTR/, lets us define templates that can then be used to reliably translate tabular data into RDF. If you want to really understand the motivation behind it, read the presentation by the OTTR creators from the University of Oslo, or watch the Motivation and Overview presentation.

 

Example

Probably the most common example in the world of ontologies is that of a named pizza! I will borrow the example from the OTTR page. We can represent named pizzas in a CSV file named pizzas.csv like so:

name
Margherita
Hawaii
Grandiosa

We know that Margherita, Hawaii and Grandiosa are pizza names because the column header is name and the file is called pizzas. We can encode the same information in an RDF file, my.rdf, like so:

p:Margherita rdf:type owl:Class .
p:Margherita rdfs:subClassOf p:Pizza .
p:Margherita rdfs:label "Margherita" .

p:Hawaii rdf:type owl:Class .
p:Hawaii rdfs:subClassOf p:Pizza .
p:Hawaii rdfs:label "Hawaii" .

p:Grandiosa rdf:type owl:Class .
p:Grandiosa rdfs:subClassOf p:Pizza .
p:Grandiosa rdfs:label "Grandiosa" .
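The mapping from each row of pizzas.csv to the three statements above is purely mechanical. For comparison, here is a plain-Python sketch of doing it by hand, without OTTR. The CSV content is inlined with io.StringIO so the snippet runs on its own, and the helper name pizza_triples is hypothetical.

```python
import csv
import io

# Inline copy of pizzas.csv so the sketch is self-contained.
PIZZAS_CSV = """name
Margherita
Hawaii
Grandiosa
"""


def pizza_triples(name: str) -> list[tuple[str, str, str]]:
    """Hand-written equivalent of the three statements per pizza."""
    identifier = f"p:{name}"
    return [
        (identifier, "rdf:type", "owl:Class"),
        (identifier, "rdfs:subClassOf", "p:Pizza"),
        (identifier, "rdfs:label", f'"{name}"'),
    ]


lines = []
for row in csv.DictReader(io.StringIO(PIZZAS_CSV)):
    for subj, pred, obj in pizza_triples(row["name"].strip()):
        lines.append(f"{subj} {pred} {obj} .")

print("\n".join(lines))
```

Every new kind of row needs another hand-written function like pizza_triples, which is the repetition that templates are meant to remove.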

Even with just a few instances, this is a lot of typing and offers many opportunities for errors. So instead, we can write two stOTTR templates in a pizzeria.stottr file.

ax:SubClassOf [ ?sub, ?super ] :: {
    ottr:Triple(?sub, rdfs:subClassOf, ?super)
} .

pz:Pizza [ ?identifier, ?label ] :: {
    ottr:Triple(?identifier, rdf:type, owl:Class),
    ax:SubClassOf(?identifier, p:Pizza),
    ottr:Triple(?identifier, rdfs:label, ?label)
} .

With these definitions, we can create instances of the pz:Pizza template, perhaps further down in the same file.

pz:Pizza(p:Margherita, "Margherita") .
pz:Pizza(p:Hawaii, "Hawaii") .
pz:Pizza(p:Grandiosa, "Grandiosa") .

These three instances produce exactly the RDF shown above, with less chance of a typo or other mistake.
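To see why nesting ax:SubClassOf inside pz:Pizza works, it helps to picture expansion as recursive substitution: an instance binds the template's parameters to its arguments, and every call in the template body is expanded in turn until only ottr:Triple base instances remain. The following is a toy sketch of that idea with the two templates hard-coded as Python data. It is not how pyOTTR implements expansion, and the names TEMPLATES and expand_instance are made up for illustration.

```python
# Toy model: template name -> (parameter names, body of (template, args) calls).
TEMPLATES = {
    "ax:SubClassOf": (
        ["?sub", "?super"],
        [("ottr:Triple", ["?sub", "rdfs:subClassOf", "?super"])],
    ),
    "pz:Pizza": (
        ["?identifier", "?label"],
        [
            ("ottr:Triple", ["?identifier", "rdf:type", "owl:Class"]),
            ("ax:SubClassOf", ["?identifier", "p:Pizza"]),
            ("ottr:Triple", ["?identifier", "rdfs:label", "?label"]),
        ],
    ),
}


def expand_instance(name, args):
    """Recursively expand one instance into (subject, predicate, object) triples."""
    if name == "ottr:Triple":
        return [tuple(args)]  # base case: nothing left to expand
    params, body = TEMPLATES[name]
    binding = dict(zip(params, args))  # e.g. {"?identifier": "p:Hawaii", ...}
    triples = []
    for callee, call_args in body:
        # Substitute bound variables; constants such as owl:Class pass through.
        resolved = [binding.get(arg, arg) for arg in call_args]
        triples.extend(expand_instance(callee, resolved))
    return triples


for s, p, o in expand_instance("pz:Pizza", ["p:Hawaii", '"Hawaii"']):
    print(s, p, o, ".")
# p:Hawaii rdf:type owl:Class .
# p:Hawaii rdfs:subClassOf p:Pizza .
# p:Hawaii rdfs:label "Hawaii" .
```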

But we want to take it a step further. We already have the data in pizzas.csv and the stOTTR templates in pizzeria.stottr. Now, let's use Python to generate the RDF.

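import csv

from ottrlib.model import Iri
from ottrlib.Ottr import Ottr

ottr = Ottr("pizzeria.stottr")  # templates defined in pizzeria.stottr

with open("pizzas.csv", "r", newline="") as data:
    for pizza in csv.DictReader(data):  # one dict per row, keyed by the header
        name = pizza["name"].strip()
        print(ottr.apply("pz:Pizza").to(Iri(f"p:{name}"), name))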

Or, even simpler, if the CSV column names match the template's parameter names:

# this is the ambition, but the code will not work just yet
Ottr("pizzeria.stottr").make("pizzas.csv").into("pz:Pizza")
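The idea behind aligning column names with template parameters can be sketched in plain Python: look up each parameter, without its leading ?, as a column header in the row. This is a hypothetical illustration of the ambition, not pyOTTR's API; the helper row_to_arguments and the identifier,label CSV layout are assumptions.

```python
import csv
import io

# Hypothetical CSV whose headers match pz:Pizza's parameters ?identifier and ?label.
PIZZAS_CSV = """identifier,label
p:Margherita,Margherita
p:Hawaii,Hawaii
"""

PIZZA_PARAMETERS = ["?identifier", "?label"]


def row_to_arguments(parameters, row):
    """Pick the value for each template parameter from the column of the same name."""
    missing = [p for p in parameters if p.lstrip("?") not in row]
    if missing:
        raise KeyError(f"no column for parameter(s): {', '.join(missing)}")
    return tuple(row[p.lstrip("?")].strip() for p in parameters)


arguments = [
    row_to_arguments(PIZZA_PARAMETERS, row)
    for row in csv.DictReader(io.StringIO(PIZZAS_CSV))
]
print(arguments)  # [('p:Margherita', 'Margherita'), ('p:Hawaii', 'Hawaii')]
```

The resulting tuples have the same shape as the data list in the Try it yourself section, except that there the identifiers are wrapped in Iri. Failing loudly on a missing column seems preferable to silently producing incomplete triples.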

 

Try it yourself

You can try it yourself, starting with this code:

from ottrlib.model import Iri
from ottrlib.Ottr import Ottr

stottr_input = """
ax:SubClassOf [ ?sub, ?super ] :: {
    ottr:Triple(?sub, rdfs:subClassOf, ?super)
} .

pz:Pizza [ ?identifier, ?label ] :: {
    ottr:Triple(?identifier, rdf:type, owl:Class),
    ax:SubClassOf(?identifier, p:Pizza),
    ottr:Triple(?identifier, rdfs:label, ?label)
} .

pz:Pizza(p:Margherita, "Margherita") .
pz:Pizza(p:Hawaii, "Hawaii") .
pz:Pizza(p:Grandiosa, "Grandiosa") .
"""

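ottr = Ottr()
# expand the template definitions and instances in stottr_input
for triple in ottr.expand(stottr_input):
    print(triple)

print()
# pz:Pizza is now known, so a single new instance can be expanded on its own
for triple in ottr.expand('pz:Pizza(p:Pepperoni, "Pepperoni") .'):
    print(triple)

print()
# argument tuples for pz:Pizza: (?identifier, ?label)
data = [
    (Iri("p:Capricciosa"), "Capricciosa"),
    (Iri("p:Marinara"), "Marinara"),
    (Iri("p:Crudo"), "Crudo"),
]
# apply the pz:Pizza template to every tuple in data
for triple in ottr.apply("pz:Pizza").to(data):
    print(triple)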

It will output:

p:Margherita rdf:type owl:Class
p:Margherita rdfs:subClassOf p:Pizza
p:Margherita rdfs:label "Margherita"

p:Hawaii rdf:type owl:Class
p:Hawaii rdfs:subClassOf p:Pizza
p:Hawaii rdfs:label "Hawaii"

p:Grandiosa rdf:type owl:Class
p:Grandiosa rdfs:subClassOf p:Pizza
p:Grandiosa rdfs:label "Grandiosa"

p:Pepperoni rdf:type owl:Class
p:Pepperoni rdfs:subClassOf p:Pizza
p:Pepperoni rdfs:label "Pepperoni"

p:Capricciosa rdf:type owl:Class
p:Capricciosa rdfs:subClassOf p:Pizza
p:Capricciosa rdfs:label "Capricciosa"

p:Marinara rdf:type owl:Class
p:Marinara rdfs:subClassOf p:Pizza
p:Marinara rdfs:label "Marinara"

p:Crudo rdf:type owl:Class
p:Crudo rdfs:subClassOf p:Pizza
p:Crudo rdfs:label "Crudo"

 

Usage

At the moment, the project is in the early stages of development. You can clone the repository and play with pyOTTR as it is. As shown above, it can already process simple templates, and new features are being added daily. Very soon it will be available on PyPI (although the names will probably change by then).

If you cannot wait, you can always help to get it done. It's an open-source project after all!

 

Contributing

Stars and pull requests are welcome, as are any ideas and comments. To test the project after cloning, run pytest. After making any changes, run make style to run isort, flake8 and black on the code that is not autogenerated.

If the ANTLR grammar has to be updated (due to changes to the specification), you will have to:

  • Download the new grammar file and save it as antlr/stOTTR.g4.
  • Ensure there is a turtleDoc : directive*; line as the first rule in Turtle.g4.
  • Build the grammar lexers and parsers in pyottr/grammar by running make grammar.
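The second step is easy to forget. A small check like the following could catch it before running make grammar. It is a sketch that assumes the rule that matters is the first parser rule (in ANTLR, a name starting with a lowercase letter followed by a colon); the helper names first_parser_rule and has_turtle_doc_first are hypothetical, and since the location of Turtle.g4 is not stated above, a made-up grammar fragment is inlined instead of reading the file.

```python
import re

# ANTLR parser rules start with a lowercase letter: name : alternatives ;
RULE = re.compile(r"^\s*([a-z]\w*)\s*:\s*([^;]*);", re.MULTILINE)


def first_parser_rule(grammar_text: str):
    """Return (name, body) of the first parser rule, or None if there is none."""
    match = RULE.search(grammar_text)
    if match is None:
        return None
    # normalise whitespace in the rule body
    return match.group(1), " ".join(match.group(2).split())


def has_turtle_doc_first(grammar_text: str) -> bool:
    """True if the first parser rule is exactly 'turtleDoc : directive*;'."""
    return first_parser_rule(grammar_text) == ("turtleDoc", "directive*")


# A trimmed, made-up grammar fragment for demonstration.
sample = """grammar Turtle;

turtleDoc : directive*;
turtleStatement : directive | triples '.';
"""

print(has_turtle_doc_first(sample))  # True
```

The regular expression is deliberately simple: it ignores comments and would be confused by a semicolon inside a quoted literal in the first rule, which is acceptable for a one-line rule like turtleDoc.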
