shigapov/wikibase-knowledge-graphs
Wikibase knowledge graphs

A collection of open source tools and resources related to Wikibase knowledge graphs.

Motivation

  1. Multiple unlinked datasets often describe the same things (entities).
  2. Data integration needs to be agile and collaborative, as in Wikidata.
  3. An organization needs a semantic layer in its data, information and knowledge management.

Awesome knowledge graphs

  1. Knowledge graphs by Aidan Hogan et al. [preprint], [HTML book]
  2. The Knowledge Graph Cookbook. Recipes that work by Andreas Blumauer & Helmut Nagy [book]
  3. KIT Knowledge Graphs course by Harald Sack & Mehwish Alam [description]
  4. Stanford Knowledge Graphs course CS 520 [2020], [2021]
  5. Knowledge Graphs - Foundations and Applications by Harald Sack [description]
  6. Knowledge Graphs: Methodology, Tools and Selected Use Cases by Dieter Fensel et al. [book]

Awesome Wikibase tutorials

  1. Programmer's guide to Wikibase [guide]
  2. Wikibase: configure, customize, and collaborate by Dan Scott [tutorial]
  3. Posts about Wikibase & Wikidata and Tech Lead Digests by Adam 'addshore' Shorland
  4. Wikibase Install Basic Tutorial by Matt Miller [tutorial]
  5. Wikibase for Research Infrastructure by Matt Miller [post]
  6. Vanderbilt Heard Library digital scholarship resources on Wikidata and Wikibase [resources]
  7. Putting Data into Wikidata using Software by Steve Baskauf [post]
  8. Learning Wikibase
  9. Get your own copy of WikiData by Wolfgang Fahl [post]
  10. Transferring Wikibase data between wikis by Jeroen De Dauw [post]
  11. Wikibase resources by Olaf Janssen & KB national library of the Netherlands GitHub repo

Wikibase Architecture

Installing Wikibase

  1. Manual installation of the Wikibase Suite
  2. If you already have MediaWiki, manually install the Wikibase extension
  3. Run docker-compose up -d with the Wikibase Docker image
  4. Obsolete: WbStack, a "Wikibase as a service" offering. Ask Adam Shorland for an invitation
  5. Wikibase Cloud is "Wikibase as a service".
  6. Ansible playbook for Wikibase [docs]

Data model

Data import

Before starting with data import please read the following resources:

The Wikibase frontend

  • Creating new items: Click Special Pages on the left-hand menu and then Create a new item
  • Creating new properties: Click Special Pages on the left-hand menu and then Create a new property

The Wikibase API and wrappers

A recommended way to import data into a Wikibase instance is via the Wikibase API. Many wrappers of the Wikibase API exist:

  1. With Graphical User Interface (GUI)
  2. With Command Line Interface (CLI)
  3. Libraries
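Whatever wrapper is chosen, it ends up sending requests to the same MediaWiki API modules. As a rough illustration, the sketch below builds the POST parameters for a `wbeditentity` call that creates a new item. The helper name `build_new_item_request` and the token placeholder are assumptions made for this example; a real edit needs a CSRF token obtained from the target wiki and an authenticated session, which are not shown.

```python
import json
from urllib.parse import urlencode


def build_new_item_request(label, description, language="en",
                           token="CSRF_TOKEN_PLACEHOLDER"):
    """Build POST parameters for a wbeditentity call that creates an item.

    Hypothetical helper: the token default is a placeholder, not a valid token.
    """
    # Entity data uses the Wikibase JSON layout: one entry per language code.
    data = {
        "labels": {language: {"language": language, "value": label}},
        "descriptions": {language: {"language": language, "value": description}},
    }
    return {
        "action": "wbeditentity",
        "new": "item",  # create a new item instead of editing an existing one
        "data": json.dumps(data, ensure_ascii=False),
        "token": token,
        "format": "json",
    }


params = build_new_item_request("Douglas Adams", "English writer")
body = urlencode(params)  # form-encoded body to POST to <wiki>/w/api.php
print(body)
```

The request is only built here, not sent, so the snippet can be run without a Wikibase instance. The libraries and tools grouped above wrap this kind of call together with login, token handling and batching.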

The MediaWiki MySQL database

A way to import data into a Wikibase instance that is not recommended is to insert rows directly into the MySQL database (MariaDB). Wikibase Query Updater then sends the data from MariaDB to the graph database Blazegraph. This route is faster than the API but riskier, because it bypasses the API and incorrect inserts can happen by accident.

  1. wikibase-insert is a Java tool described in the [FactGrid's post]
  2. RaiseWikibase is a Python tool described in [preprint], [docs], [poster]

Federated Properties

Federated properties in Wikibase are still under development:

The current workaround is to fetch basic information about the properties from the Wikidata SPARQL endpoint and to create those properties locally:
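The first half of this workaround could look roughly like the sketch below: a SPARQL query that lists properties with their labels and datatypes, a helper that turns it into a request URL for the Wikidata endpoint, and a parser for the standard SPARQL JSON result format. The function names, the `LIMIT` value and the sample response are assumptions made for illustration; the snippet does not perform the HTTP request itself.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Lists properties with their English label and datatype IRI.
PROPERTY_QUERY = """
SELECT ?property ?propertyLabel ?propertyType WHERE {
  ?property a wikibase:Property ;
            wikibase:propertyType ?propertyType .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT %d
"""


def build_query_url(limit=10):
    """Return a GET URL for the query (hypothetical helper)."""
    query = PROPERTY_QUERY % limit
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


def parse_properties(sparql_json):
    """Extract id, label and datatype IRI from a SPARQL JSON result."""
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        rows.append({
            # The entity IRI ends with the property id, e.g. .../entity/P31
            "id": binding["property"]["value"].rsplit("/", 1)[-1],
            "label": binding.get("propertyLabel", {}).get("value", ""),
            "datatype_iri": binding["propertyType"]["value"],
        })
    return rows


# Hand-written sample in the SPARQL JSON result layout, used instead of a live call.
sample = {"results": {"bindings": [{
    "property": {"type": "uri", "value": "http://www.wikidata.org/entity/P31"},
    "propertyLabel": {"type": "literal", "value": "instance of"},
    "propertyType": {"type": "uri",
                     "value": "http://wikiba.se/ontology#WikibaseItem"},
}]}}

properties = parse_properties(sample)
print(build_query_url(limit=5))
print(properties)
```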

Every property is associated with a certain datatype in the Wikibase Data Model. Some of the datatypes are not native and require extensions. See:
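When properties are recreated locally, the datatype has to be carried over as well. The sketch below maps a few datatype IRIs, as they appear in RDF, to the datatype identifiers expected in the `data` parameter of `wbeditentity` with `new=property`. The mapping is deliberately partial and the helper name is an assumption; datatypes that need an extension on the target instance are rejected rather than guessed.

```python
import json

# Partial mapping from RDF datatype IRIs to Wikibase API datatype identifiers.
# Only datatypes available without extra extensions are listed (assumption:
# the target instance is a plain Wikibase installation).
DATATYPE_BY_IRI = {
    "http://wikiba.se/ontology#WikibaseItem": "wikibase-item",
    "http://wikiba.se/ontology#WikibaseProperty": "wikibase-property",
    "http://wikiba.se/ontology#String": "string",
    "http://wikiba.se/ontology#ExternalId": "external-id",
    "http://wikiba.se/ontology#Url": "url",
    "http://wikiba.se/ontology#Time": "time",
    "http://wikiba.se/ontology#Quantity": "quantity",
    "http://wikiba.se/ontology#Monolingualtext": "monolingualtext",
}


def build_new_property_data(label, datatype_iri, language="en"):
    """Return the JSON string for the `data` parameter of a new property.

    Hypothetical helper; raises ValueError for datatypes not in the mapping.
    """
    datatype = DATATYPE_BY_IRI.get(datatype_iri)
    if datatype is None:
        raise ValueError("datatype not supported by this sketch: " + datatype_iri)
    return json.dumps({
        "labels": {language: {"language": language, "value": label}},
        "datatype": datatype,
    })


data = build_new_property_data(
    "instance of", "http://wikiba.se/ontology#WikibaseItem")
print(data)
```

Failing loudly on an unknown datatype is a design choice for this sketch: a property created with the wrong datatype cannot simply be switched to another one afterwards, so skipping it is the safer default.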

Data Wrangling

Data reconciliation

Named Entity Linking

Named entity linking is widely used for creating and extending knowledge graphs.
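A common first step in entity linking is candidate generation: looking up which entities could match a mention. The sketch below does this with the `wbsearchentities` API module, which is available on Wikidata and on other Wikibase instances. It only builds the request URL and parses a response; the helper names and the sample response are assumptions for illustration, and ranking or disambiguating the candidates is out of scope here.

```python
from urllib.parse import urlencode


def build_search_url(api_url, mention, language="en", limit=5):
    """Return a wbsearchentities URL for a text mention (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": mention,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def extract_candidates(response_json):
    """Return (id, label, description) tuples from a wbsearchentities response."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response_json.get("search", [])
    ]


# Hand-written sample response, trimmed to the fields used above.
sample_response = {"search": [
    {"id": "Q42", "label": "Douglas Adams", "description": "English writer"},
]}

url = build_search_url("https://www.wikidata.org/w/api.php", "Douglas Adams")
candidates = extract_candidates(sample_response)
print(url)
print(candidates)
```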

Texts

SOTA algorithms can be found at paperswithcode. 16 benchmarks are available.

Tables

SOTA algorithms for matching tables to Wikidata were tested at SemTab 2020: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching and at SemTab 2021:

See also:

Data validation

The first mechanism is using constraints:

The second mechanism is using Entity Schemas and Shape Expressions (ShEx).
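Both mechanisms check whether entity data has the expected shape. To make the idea concrete, here is a hand-rolled sketch that checks two simple rules, "property must be present" and "property must have at most one value", against an entity in the Wikibase JSON layout. This is not the constraint system of Wikibase or a ShEx validator; the function name, the rule set and the sample entity are assumptions chosen for illustration.

```python
def check_constraints(entity, required=(), single_value=()):
    """Return a list of (property id, message) violations for an entity dict."""
    claims = entity.get("claims", {})
    violations = []
    for pid in required:
        # A required property must have at least one statement.
        if not claims.get(pid):
            violations.append((pid, "missing required statement"))
    for pid in single_value:
        # A single-value property must not have more than one statement.
        if len(claims.get(pid, [])) > 1:
            violations.append((pid, "more than one value"))
    return violations


# Sample entity reduced to the parts the check looks at; statement bodies are
# left empty because only their count matters here.
entity = {
    "id": "Q1",
    "claims": {
        "P569": [{}, {}],  # two statements for a property expected to be single-valued
    },
}

violations = check_constraints(entity, required=["P31"], single_value=["P569"])
print(violations)
```

In practice the tools listed below, or the constraint checks of the Wikibase instance itself, are the better choice; the sketch only shows what kind of rule they evaluate.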

Tools for entity schemas:

  • WikiShape is a playground (visualization, querying, validation & extraction) customized for Wikibase instances [code]
  • Wikidata Shape Expressions Inference is a tool for automatic inference of ShEx schemas from a set of items [code]
  • sheXer is a tool for automatic inference of ShEx schemas from a set of items [code]
  • YASHE is a ShEx editor [code]
  • ShExStatements is a tool for writing shape expressions in Wikidata in a simplified form [paper], [code]
  • ShEx2 (aka shex.js) is a simple online validator [code], [zenodo]
  • RDFShape is a general RDF playground for data validation and conversion between semantic formats [paper]
  • PyShExy is an API to validate RDF entities against ShEx schemas using PyShEx

Relevant papers:

Wikibase Ecosystem

Wikibase Community

Wikibase Summaries

Conferences and workshops

Awesome Master and PhD theses

  • Schema Inference on Wikidata by Lucas Werkmeister [thesis]
  • Modelling and Importing Dynamic Data into Wikibase: A Case Study of the Swiss Transportation System by Samuel Meuli [thesis]

Wikibase-Wikidata papers

  1. Fudie Zhao, A systematic review of Wikidata in Digital Humanities projects, Digital Scholarship in the Humanities, Volume 38, Issue 2, June 2023, Pages 852–874, https://doi.org/10.1093/llc/fqac083
  2. Tharani, Karim. "Much more than a mere technology: A systematic review of Wikidata in libraries." The Journal of Academic Librarianship 47.2 (2021): 102326. https://doi.org/10.1016/j.acalib.2021.102326
  3. Waagmeester, A., Stupp, G., Burgstaller-Muehlbacher, S., Good, B. M., Griffith, M., Griffith, O. L., ... & Su, A. I. (2020). Wikidata as a knowledge graph for the life sciences. Elife, 9, e52614. https://doi.org/10.7554/eLife.52614
  4. Nielsen, F.Å., Mietchen, D., Willighagen, E. (2017). Scholia, Scientometrics and Wikidata. In: Blomqvist, E., Hose, K., Paulheim, H., Ławrynowicz, A., Ciravegna, F., Hartig, O. (eds) The Semantic Web: ESWC 2017 Satellite Events. ESWC 2017. Lecture Notes in Computer Science(), vol 10577. Springer, Cham. https://doi.org/10.1007/978-3-319-70407-4_36
  5. Turki, H., Shafee, T., Taieb, M.A.H., Aouicha, M.B., Vrandečić, D., Das, D. and Hamdi, H., 2019. Wikidata: A large-scale collaborative ontological medical database. Journal of Biomedical Informatics, 99, p.103292. https://doi.org/10.1016/j.jbi.2019.103292

Awesome Wikibase instances

  1. Wikidata is a general-purpose Wikibase knowledge graph [SPARQL]
  2. Wikibase Registry is a Wikibase knowledge graph of Wikibase knowledge graphs [SPARQL], [timeline of Wikibase instances]
  3. Rhizome Artbase is a Wikibase knowledge graph of born-digital artworks from 1999 to the present day [SPARQL]
  4. FactGrid is a Wikibase knowledge graph for historical research [SPARQL], [Viewer], [fast search via ringgaard.com]
  5. Lingua Libre is a Wikibase knowledge graph of audiovisual data [SPARQL]
  6. OpenStreetMap Metadata is a Wikibase knowledge graph of metadata in OpenStreetMap [SPARQL]
  7. PersonalData.io is a Wikibase knowledge graph about the personal data ecosystem [SPARQL]
  8. EU knowledge graph is a Wikibase knowledge graph about the European Union [SPARQL], [Question-Answering over KG], [paper at ISWC2021 "Wikibase as an Infrastructure for Knowledge Graphs: the EU Knowledge Graph"]
  9. enslaved.org is a Wikibase knowledge graph about people of the historical slave trade [frontend]
  10. Semlab Wikibase is a Wikibase knowledge graph of Semantic Lab at Pratt Institute with data about their research projects [SPARQL]
  11. Virus-Taxonomy is a Wikibase knowledge graph of virus taxonomy [SPARQL]
  12. DataTrek is a Wikibase knowledge graph of open data for Star Trek
  13. Nonbinary is a Wikibase knowledge graph of concepts relevant to nonbinary identities
  14. The De Jonge Wiki is a Wikibase knowledge graph of research that has been carried out on the Arenberg Castle
  15. Biblissima is a Wikibase knowledge graph of the Biblissima authority repositories
  16. Standartopedia is a Wikibase knowledge graph of Russian legal norms and requirements of standards
  17. DataCegeSoma is a Wikibase knowledge graph of authority data for CegeSoma / State Archives in Belgium created by Anne Chardonnens as a part of her PhD thesis [SPARQL]
  18. MaRDi portal is a Wikibase knowledge graph of mathematical research data [SPARQL]
  19. MiMoTextBase is a Wikibase knowledge graph of the French Enlightenment novel [SPARQL] [MiMoText Project] [Tutorial]
  20. EURHISFIRM is a sandbox Wikibase knowledge graph of historical high-quality firm level data for Europe [SPARQL], [GitLab]
  21. Aktienführer is a Wikibase knowledge graph of the German listed stock companies from the Hoppenstedt-Aktienführer from 1956 to 2018 [SPARQL]

More Wikibase instances can be found at Wikibase Registry and WikiAPIary.

Notes

The initial version of this repo is based on the slides Wikibase knowledge graphs for data management & data science presented at Data Literacy Snacks 2021.