
Demo code for my talk at Data-Centric Architecture Forum 2020 about data provenance and PROV ontology.


cadmiumkitty/dcaf-2020-provo


Data Provenance and PROV-O Ontology talk at DCAF 2020

Introduction

This demo shows how to capture provenance information with the common PROV vocabulary in a repo trading and risk reporting scenario. It is built on Kafka using the Event Sourcing and CQRS patterns.
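To make "capturing provenance with PROV" concrete, here is a minimal sketch of the kind of triples a single risk calculation could produce: the calculation is a prov:Activity that used one version of a trade (a prov:Entity), generated a risk figure (another prov:Entity), and was carried out by the risk calculator (a prov:SoftwareAgent). The http://example.org/dcaf/ namespace and the trade/risk/calc identifiers are hypothetical; the URIs and modelling choices used in this repository may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: PROV-O triples (as N-Triples lines) for one risk calculation.
// Only the PROV and RDF namespaces are real; everything under EX is illustrative.
class ProvTriples {
    static final String PROV = "http://www.w3.org/ns/prov#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    // Hypothetical base namespace for demo resources.
    static final String EX = "http://example.org/dcaf/";

    // Formats one N-Triples statement whose subject, predicate and object are all IRIs.
    static String triple(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .";
    }

    // A calculation activity that used a trade version and generated a risk figure.
    static List<String> riskCalculation(String tradeVersion, String riskFigure,
                                        String activity, String agent) {
        List<String> out = new ArrayList<>();
        out.add(triple(EX + tradeVersion, RDF_TYPE, PROV + "Entity"));
        out.add(triple(EX + riskFigure, RDF_TYPE, PROV + "Entity"));
        out.add(triple(EX + activity, RDF_TYPE, PROV + "Activity"));
        out.add(triple(EX + agent, RDF_TYPE, PROV + "SoftwareAgent"));
        out.add(triple(EX + activity, PROV + "used", EX + tradeVersion));
        out.add(triple(EX + riskFigure, PROV + "wasGeneratedBy", EX + activity));
        out.add(triple(EX + activity, PROV + "wasAssociatedWith", EX + agent));
        out.add(triple(EX + riskFigure, PROV + "wasDerivedFrom", EX + tradeVersion));
        return out;
    }

    public static void main(String[] args) {
        riskCalculation("trade/42/v2", "risk/42/v2", "calc/7", "risk-calculator")
            .forEach(System.out::println);
    }
}
```

Keeping each trade version as its own entity is what lets a risk figure be traced back to the exact amendment it was calculated from.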

Setup

  1. A single Kafka node with a single ZooKeeper node
  2. A repo producer that creates and amends repo trades based on trade events
  3. A counterparty producer that creates and amends counterparty records
  4. A risk calculator that calculates risk figures based on repo events
  5. A provenance aggregator running as a Kafka Connect node
  6. A simple Apache Jena Fuseki triplestore that aggregates the PROV data
  7. PROV-O-Viz for simple visualization
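With Event Sourcing, the producers append events rather than overwriting records, and consumers such as the risk calculator rebuild current state by replaying those events (the read side of CQRS). A minimal sketch of that fold for repo trades follows, assuming Java 16+ records; the event types RepoCreated and RepoAmended and their fields are hypothetical and are not the repository's actual event schema.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical event types for a repo trade stream.
interface RepoEvent {}
record RepoCreated(String tradeId, String counterpartyId, BigDecimal cashAmount) implements RepoEvent {}
record RepoAmended(String tradeId, BigDecimal newCashAmount) implements RepoEvent {}

// Current view of a trade; version counts how many events have been applied.
record RepoTrade(String tradeId, String counterpartyId, BigDecimal cashAmount, int version) {}

class RepoProjection {
    // Applies one event to the current state (null before the trade exists).
    static RepoTrade apply(RepoTrade state, RepoEvent event) {
        if (event instanceof RepoCreated c) {
            if (state != null) throw new IllegalStateException("trade already exists: " + c.tradeId());
            return new RepoTrade(c.tradeId(), c.counterpartyId(), c.cashAmount(), 1);
        }
        if (event instanceof RepoAmended a) {
            if (state == null) throw new IllegalStateException("amendment before creation: " + a.tradeId());
            if (!state.tradeId().equals(a.tradeId())) throw new IllegalArgumentException("wrong trade: " + a.tradeId());
            return new RepoTrade(state.tradeId(), state.counterpartyId(), a.newCashAmount(), state.version() + 1);
        }
        throw new IllegalArgumentException("unknown event: " + event);
    }

    // Replays one trade's events in log order to obtain its current state.
    static RepoTrade replay(List<RepoEvent> events) {
        RepoTrade state = null;
        for (RepoEvent e : events) state = apply(state, e);
        return state;
    }

    public static void main(String[] args) {
        RepoTrade current = replay(List.of(
            new RepoCreated("T1", "CP1", new BigDecimal("1000000")),
            new RepoAmended("T1", new BigDecimal("1500000"))));
        System.out.println(current);
    }
}
```

Because every state change is an event, each version of the trade is a natural candidate for a PROV entity, which is what makes this pattern a good fit for provenance capture.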

Running the demo

Build the individual projects in the repo directory (trade and counterparty events, risk calculator, and event processor) and in the connect directory (Kafka Connect SPARQL sink for PROV) with:

mvn clean package
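A SPARQL sink ultimately has to turn the PROV triples carried by Kafka records into SPARQL Update requests for the triplestore. The README does not describe how the sink in the connect project does this, so the following is only a sketch under stated assumptions: the triples arrive as N-Triples lines, and each batch is written into its own named graph with an INSERT DATA request. The class name SparqlInsert and the graph IRI are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: wraps N-Triples lines in a SPARQL 1.1 INSERT DATA request
// targeting one named graph. Not the sink's actual implementation.
class SparqlInsert {
    static String insertData(String graphIri, List<String> nTriples) {
        // Minimal guard so the graph IRI cannot break out of its <...> delimiters.
        if (graphIri.matches(".*[<>\\s].*")) {
            throw new IllegalArgumentException("invalid graph IRI: " + graphIri);
        }
        StringBuilder sb = new StringBuilder();
        sb.append("INSERT DATA {\n  GRAPH <").append(graphIri).append("> {\n");
        // N-Triples statements are valid triple syntax inside INSERT DATA.
        for (String t : nTriples) sb.append("    ").append(t).append('\n');
        sb.append("  }\n}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertData("http://example.org/dcaf/graph/event-1", List.of(
            "<http://example.org/dcaf/risk/1> <http://www.w3.org/ns/prov#wasGeneratedBy> <http://example.org/dcaf/calc/1> .")));
    }
}
```

The resulting string would then be POSTed to the dataset's SPARQL Update endpoint. Writing each event into a separate named graph is one plausible design; it keeps the triples from one record together so they can be inspected or replaced as a unit.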

Build and start the containers with:

docker-compose up -d --build

Once the containers are up and running, you can check that PROV triples are being created in Jena Fuseki by opening http://localhost:3030/dataset.html?tab=query&ds=/dcaf and issuing a simple SPARQL query such as:

SELECT *
WHERE {
  ?s ?p ?o
}
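The same check can be scripted with the JDK's built-in HTTP client (Java 11+) by sending a query over the SPARQL 1.1 Protocol as a GET request with a query parameter. This sketch assumes the Fuseki port is published on localhost:3030, as the dataset URL above implies, and that the dataset's query endpoint is therefore http://localhost:3030/dcaf/query. The PROV-specific query and the class name FusekiQuery are illustrative.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class FusekiQuery {
    // Assumed host-side query endpoint for the dcaf dataset.
    static final String ENDPOINT = "http://localhost:3030/dcaf/query";

    // Illustrative PROV query: which activities generated which entities.
    static final String QUERY =
        "PREFIX prov: <http://www.w3.org/ns/prov#>\n"
        + "SELECT ?entity ?activity\n"
        + "WHERE { ?entity prov:wasGeneratedBy ?activity }\n"
        + "LIMIT 10";

    // Builds a SPARQL Protocol GET request asking for JSON results.
    static HttpRequest buildRequest(String endpoint, String sparql) {
        String uri = endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(uri))
            .header("Accept", "application/sparql-results+json")
            .GET()
            .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
            client.send(buildRequest(ENDPOINT, QUERY), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

A 200 status with a non-empty bindings array in the JSON body indicates that PROV triples have reached the triplestore.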

To view the visualization, go to http://localhost:5000/, select the endpoint http://fuseki:3030/dcaf/query, and enable the Ignore Named Graphs option in PROV-O-Viz.
