Skip to content

RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation

Notifications You must be signed in to change notification settings

steady137/rdf2rml

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation

Table of Contents

Introduction

See this presentation:

RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation Alexiev, V. In Semantic Web in Libraries 2016 (SWIB 16), Bonn, Germany, November 2016. Presentation, HTML, PDF, Video

RDF is a graph data model, so the best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram. Many RDF visualization tools exist, but they either focus on large graphs (where the details are not easily visible), or the visualization results are not satisfactory, or manual tweaking of the diagrams is required.

We describe a tool rdfpuml that makes true diagrams directly from Turtle examples using PlantUML and GraphViz. Diagram readability is of prime concern, and rdfpuml introduces various diagram control mechanisms using triples in the puml: namespace. Special attention is paid to inlining and visualizing various Reification mechanisms (described with PRV). We give examples from Getty CONA, Getty Museum, AAC (mappings of museum data to CIDOC CRM), Multisensor (NIF and FrameNet), EHRI (Holocaust Research into Jewish social networks), Duraspace (Portland Common Data Model for holding metadata in institutional repositories), Video annotation.

If the example instances include SQL queries and embedded field names, they can describe a mapping precisely. Another tool rdf2rdb generates R2RML transformations from such examples, saving about 15x in complexity.

See http://twitter.com/hashtag/rdfpuml for news, diagrams and announcements.

Documentation

Source: ./doc/rdfpuml.pod, ./doc/rdf2rml.pod, ./doc/rdf2tarql.pod

Installation

Checkout this repo and add rdf2rml/bin to your path. Install the following prerequisites:

  • both tools: Perl. Tested with version 5.22 on Windows (cygwin and Strawberry).
  • rdfpuml:
    • GraphViz
    • PlantUML. You need a recent version for new features like arrow length and color. I’m currently running 1.2018.10beta7. See in particular plantuml class diagrams.
    • Perl modules: use cpan or cpanm to install them: RDF::Trine RDF::Query Encode FindBin Carp::Always Slurp
    • RDF::Prefixes::Curie. This is my own module located in ./lib, and rdfpuml needs FindBin to locate it.
  • rdf2rml:
    • Apache Jena: riot, update. Tested with version 3.1.0 of 2016-05-10.
    • cat, grep, rm

Debian Repo

Jonas Smedegaard (@jonassmedegaard, dr at jones fullstop dk) has volunteered for some of the tasks below. His development is at https://salsa.debian.org/debian/rdf2rml/branches. To adopt changes, do something like this.

  • To merge all commits in the salsa/develop branch:
    cd rdf2rml    # i.e. your local clone of your Github project
    git remote add salsa https://salsa.debian.org/debian/rdf2rml.git
    git fetch salsa
    git merge salsa/develop
        
  • To adopt only single commits from the salsa/develop branch, issue remote and fetch as above, then issue:
    git cherry-pick $commit1 $commit2 $commit3
        

Change Log [7/8]

DONE 2018-11-14 Avoid puml:stereotype class node

I often define puml:stereotype for some classes in prefixes.ttl. If the class is not used in some particular turtle, it should avoid emitting a disconnected puml class.

  • `stereotypes`: Avoid emitting
  • `has_statements_different_from`: Check that a node has statements other than puml:stereotype

DONE 2018-06-29 Bug: class and puml:InlineProperty

When a type is also used with puml:InlineProperty, it caused this error:

Can't locate object method "uri_value" via package "RDF::Trine::Node::Literal" at rdfpuml.pl line 261.
   main::puml_qname(RDF::Trine::Node::Literal=ARRAY(0x4fd0920)) called at rdfpuml.pl line 279
   main::puml_node2(RDF::Trine::Node::Literal=ARRAY(0x4fd0920)) called at rdfpuml.pl line 128

An inline is converted to a literal, but rdf:type is always assumed to be a URL.

Test: ./test/regression/type-inlineProperty.ttl

DONE 2018-04-05 Arrow Attributes

Add arrow attributes (dotted, dashed, bold) and length

Test: ./test/regression/arrowLen.ttl

DONE 2020-05-30 rdf2rml: inverse edge

When an edge Y-P-X is recorded in the RDB table of X (as foreign key) or in an association table, it is awkward to specify that table in the node Y. So I added this SPARQL UPDATE clause:

  • If a node ?y has no SQL, is not Inlined, has a single outgoing edge, then add the SQL of its counterparty ?x as default

DONE 2020-06-01 rdf2tarql.pl

Add rdf2tarql.pl script to generate TARQL script (CSV-RDF conversion) from model.

DONE 2020-06-01 rdf2rml.sh, rdf2rml.ru

  • Improve script to abort if the first pipeline step (“update”) fails
  • Improve script to work on Cygwin (invokes the Jena tools as riot.bat and update.bat)
  • Filter out harmless warnings from Jena update’s error log, eg ~”(some_date)”^^xsd:date~
  • If a node has single outgoing link and no SQL query/table (puml:label), propagate that property backward across the link into the node (previously that was done only for incoming links)

DONE 2020-09-17 rdf2rml: logicalTable

Use URL for logicalTable instead of blank node, so that R2RML generated from different models for different tables can be merged more easily. Warning: this assumes that all instances of one subjectMap use the same query.

DONE 2021-09-02 Unicode Processing

Use Perl option -C when invoking for proper Unicode processing. See doc section rdfpuml.html#Unicode

TODO 2020-09-17 rdf2rml: disentangle inverse edge

In the case Y-P-X described above:

  • Also need to record ?y puml:property ?p so this prop name can be added to ?y’s subject map
  • When making ?map, take puml:property into account
  • But ?map is made many times, and copy-paste is no good…
  • Also, this should be done in some cases but not others…
  • So it’s better to record ?y puml:map ?map

To Do Tasks [2/21]

Help needed for the following tasks. Post bugs and enhancement requests to this repo!

Near-term [2/11]

INPROGRESS Modularize and Package Better

INPROGRESS Regression Tests

  • sort is added at various places to make the tool more deterministic, i.e. independent of order of RDF statements in the input file. However, this will interfere with the ability to control the layout, especially of disconnected components (see layout_new_line)
  • Some regression tests are added.

TODO Release on CPAN

DONE Easier installation

There’s a pull request VladimirAlexiev#7 that dockerizes the installation. As of 18-Sep-2019 it’s undergoing code review.

Unicode [0/2]

TODO Get rid of myprint()

This was made because of some Unicode troubles

INPROGRESS Add Unicode tests

Add ttl with non-ASCII chars: Accented, Cyrillic, French, etc.

Prefixes [0/3]

TODO Eliminate Curie.pm

./lib/RDF/Prefixes/Curie.pm remembers @base and uses that for URL shortening. Once perlrdf#131 is fixed, eliminate this dependency (local module)

TODO Remember prefixes from input file

rdfpuml shortens URLs using prefixes only from prefixes.ttl, but should also use prefixes defined in the individual input file.

DONE Allow specifying the prefixes file

See VladimirAlexiev#7

TODO Support more RDF Formats

Now it only supports Turtle, because it concatenates prefixes.ttl to the main file. If it can collect all prefixes from RDF files, such concatenation won’t be needed

TODO Batch Processing

#1: plantuml is slow to start up, so we’d like to process a bunch of puml files at once. The best way is to have a smarter script or Makefile that uses the following http://plantuml.com/command-line features:

  • Keep the intermediate puml files (the current Makefile doesn’t preserve them)
  • Run plantuml on a whole folder (with -r[ecurse] it can even recurse through subfolders)
  • Use -checkmetadata to skip png files that don’t need to be regenerated. (The whole puml text is stored in the png, so plantuml can quickly check that there are no changes)
  • The Makefile should start plantuml only once, if some of the puml files is newer than its respective png file

“Manual” Batching

Before I discovered the -checkmetadata option, I had the idea that rdfpuml could put several diagrams in one puml file:

@startuml file1.png
  # made from file1.ttl
@enduml
@startuml file2.png
  # made from file2.ttl
@enduml

However, this interferes with make processing that regenerates only png for changed ttl files, and makes things less modular overall.

Mid-Term [0/5]

TODO Upgrade to use Attean

Trine (Perl RDF) is end of life. Attean is the new generation

TODO Integrate in Emacs org-mode

Write Turtle, see diagram (easy to do)

TODO Node colors, icons, tooltips

See ./ideas

TODO More arrow types and styles

./ideas/arrows.png ./ideas/arrows-2.png

dotted|dashed|plain|bold|hidden|norank|single|thickness

TODO Extra Layout Options

Local layout options are described in Help on Layout:

  • “hidden” makes a constraint between two nodes, but does not draw the link (rdfpuml already implements this)
  • norank ignores a link for layout purposes (same as graphviz constraint=false)
  • “together” groups classes as if they were in the same package (i.e. puts them in a graphviz cluster)

Global options include (eg see this diagram):

And there are a lot more undocumented features: https://forum.plantuml.net/7095

TODO Custom Reification

Ability to describe custom reification situations using the Property Reification Vocabulary (PRV)

TODO Use MindMap/WBS for Hierarchies

Plantuml now has MindMap and WBS (or OBS) diagrams that use a simple bulleted syntax to draw hierarchies.

It would be nice to use this to draw hierarchies of individuals, in particular taxonomies.

Here are examples of the two styles:

http://www.plantuml.com/plantuml/png/SoWkIImgoStCIybDBE3IKd1szUVIqbBmLGi6Ka0wiIWxjIGpBntC2qxCIIq6IJk7W5Mv-0Q0nTsB4WioN9p0x82Sn9Aq_A9SBeVKl1IekG00

http://www.plantuml.com/plantuml/png/SoWkIImgAKygvj9IS7RrvzBIKl1L2mPIG3gnA3kr93Cl7SmBJin9BGP9EuU0LRdu1e35tOiI2p9SdC3iW9p4ahJyebmkXzIy5A2P0000

Long-Term [0/3]

TODO rdf2soml to Generate Semantic Object Models

A new tool rdf2soml to generate Ontotext Platform SOML from RDF examples.

What’s missing? Most importantly: property cardinality and virtual inverses.

PlantUML can show arrow cardinalities, and this simple and natural PlantUML code:

X "0:1" -left-> "1:m" Y : prop/\ninvProp

Is depicted as follows:

./ideas/cardinality-and-inverse.png

We have two options how to express this in triples:

Cardinality With RDF*

##### model triples
:X :prop :Y. 
##### puml triples
<< :X :prop :Y >> 
  puml:arrow puml:left; # direction
  puml:min 1; puml:max puml:inf; # cardinality
  puml:inverseAlias [puml:min 0; puml:max 1; puml:name "invProp"]. # virtual inverse
  • Pros: very natural
  • Cons:
    • Perl RDF doesn’t support RDF*, and few editors support it either.
    • Annotating a triple does not assert it, so we need to assert it as well

Cardinality With Blank Node

##### model triples
:X :prop :Y. 
##### puml triples
:X puml:left :Y. # direction
:X :prop [ # a puml:Cardinality; # may need this marker class to skip the node from the diagram
  puml:min 1; puml:max puml:inf; # cardinality
  puml:object :Y; # only needed if X has several relations "prop" and they need different annotations
  puml:inverseAlias [puml:min 0; puml:max 1; puml:name "invProp"] # virtual inverse
].

TODO rdf2shape to Describe & Generate RDF Shapes

TODO Visualize RDF Shapes (SHACL and ShEx)

TODO Generate transformations for other than relational sources

R2RML works great for RDBMS, but how about other sources? Extend rdf2rml to generate:

  • RML: extends R2RML to handle RDB, XML, JSON, CSV
  • XSPARQL: extends XQuery with SPARQL construct and JSON input
  • DONE tarql: handles TSV/CSV with SPARQL construct

Citations

If you use this software, please cite it

  • RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation. Alexiev, V. In Semantic Web in Libraries 2016 (SWIB 16), Bonn, Germany, November 2016. Presentation, HTML, PDF, Video.
@InProceedings{Alexiev-rdfpuml-rdf2rml,
  author       = {Vladimir Alexiev},
  title        = {{RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation}},
  booktitle    = {Semantic Web in Libraries 2016 (SWIB 16)},
  year         = 2016,
  month        = nov,
  address      = {Bonn, Germany},
  url_Slides   = {http://rawgit2.com/VladimirAlexiev/my/master/pres/20161128-rdfpuml-rdf2rml/index.html},
  url_HTML     = {http://rawgit2.com/VladimirAlexiev/my/master/pres/20161128-rdfpuml-rdf2rml/index-full.html},
  keywords     = {RDF, visualization, PlantUML, cultural heritage, NLP, NIF, EHRI, R2RML, generation, model-driven, RDF by Example, rdfpuml, rdf2rml},
  url_PDF      = {http://rawgit2.com/VladimirAlexiev/my/master/pres/20161128-rdfpuml-rdf2rml/RDF_by_Example.pdf}, 
  url_Video    = {https://youtu.be/4WoYlaGF6DE},
  type         = {presentation},
  abstract     = {RDF is a graph data model, so the best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram. Many RDF visualization tools exist, but they either focus on large graphs (where the details are not easily visible), or the visualization results are not satisfactory, or manual tweaking of the diagrams is required. We describe a tool *rdfpuml* that makes true diagrams directly from Turtle examples using PlantUML and GraphViz. Diagram readability is of prime concern, and rdfpuml introduces various diagram control mechanisms using triples in the puml: namespace. Special attention is paid to inlining and visualizing various Reification mechanisms (described with PRV). We give examples from Getty CONA, Getty Museum, AAC (mappings of museum data to CIDOC CRM), Multisensor (NIF and FrameNet), EHRI (Holocaust Research into Jewish social networks), Duraspace (Portland Common Data Model for holding metadata in institutional repositories), Video annotation. If the example instances include SQL queries and embedded field names, they can describe a mapping precisely. Another tool *rdf2rdb* generates R2RML transformations from such examples, saving about 15x in complexity.},
}

The following papers mention the software:

  • Zhuhadar, L., & Ciampa, M. (2017). Leveraging learning innovations in cognitive computing with massive data sets: Using the offshore Panama papers leak to discover patterns. Computers in Human Behavior. doi:10.1016/j.chb.2017.12.013
  • Debruyne, C., Lewis, D. and O’Sullivan, D., (October 2018). Generating Executable Mappings from RDF Data Cube Data Structure Definitions. In OTM Confederated International Conferences “On the Move to Meaningful Internet Systems” (pp. 333-350). Springer. doi:10.1007/978-3-030-02671-4_21

Related Work

About

RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Perl 75.5%
  • Ruby 15.8%
  • Makefile 4.3%
  • Shell 4.1%
  • Batchfile 0.3%