Materials-Data-Science-and-Informatics/awesome-fair

Awesome FAIR


A list of awesome resources around the FAIR principles for (scientific) data, curated by the Helmholtz Metadata Collaboration (HMC). FAIR means that data is findable, accessible, interoperable and reusable. The list is organized by the use cases of data producers, data users, data curators and data providers. 'FAIR' is not the same as 'open', but there is overlap.

Contents

Resources about the FAIR principles

FAIR Digital Object and related projects

FAIR assessment

  • FAIR Evaluation Services - A FAIR assessment tool from FAIRsharing, code.

  • F-UJI - An online tool that provides a FAIR score for a given PID, based on metrics created by FAIRsFAIR, code.

Organizations and Communities

  • EuDat - Collaborative European data infrastructure.

  • FAIRsharing - A curated resource on data and metadata standards, inter-related to databases and data policies.

  • Research Data Alliance - International organization and communication platform for establishing standards and recommendations concerning research data publication.

  • The Turing Way - Handbook and community for reproducible, ethical and collaborative data science.

Metadata formats and standards

  • DataCite - Metadata schema developed by an international community, with increasing adoption by repositories.

  • Data Catalog Vocabulary (DCAT) - RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.

  • Dublin Core Metadata Initiative Terms - The Dublin Core Metadata Element Set is a set of fifteen "core" elements for describing resources.

  • JSON-LD Playground - Convert JSON-LD data between various representations.

  • JSON Schema - Standard for describing structural constraints that are used to validate JSON documents.

  • Provenance Primer (PROV) - This primer document provides an accessible introduction to the PROV data model for provenance interchange on the Web.

  • Resource Description Framework (RDF) - RDF is a standard model for data interchange on the Web.

  • Schema.org - Well-established and industry-accepted vocabulary providing semantics for common entities like Person, Organization, Dataset, etc.

  • SKOS - The Simple Knowledge Organization System (SKOS) is a common data model for sharing and linking knowledge organization systems via the Semantic Web.
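To give a feel for how some of these standards combine, here is a minimal sketch of a dataset description that uses the Schema.org vocabulary, serialized as JSON-LD with Python's standard library. All values (DOI, title, creator, date) are hypothetical placeholders, and the choice of properties is an assumption for illustration, not a recommended minimal profile.

```python
import json

# Hypothetical example record: every value below is a placeholder.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://doi.org/10.1234/example-dataset",  # placeholder PID
    "name": "Example measurement series",
    "description": "Illustrative dataset description for a FAIR metadata record.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "Jane Doe"},
    "datePublished": "2021-01-01",
}


def to_jsonld(record: dict) -> str:
    """Serialize a metadata record as a pretty-printed JSON-LD string."""
    return json.dumps(record, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_jsonld(dataset))
```

The printed output can be pasted into the JSON-LD Playground listed above to inspect its expanded or N-Quads form.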

Ontology services

  • Ontobee - A linked ontology data server to support ontology term dereferencing, linkage, query and integration. See also this publication.

  • Ontology Lookup Service - OLS is a repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions.

Related semantics lists

Also see

Finding datasets and software

  • DataCite Commons - Search through the metadata indexed by DataCite.

  • EuDat B2find - Search through metadata of datasets accumulated by EuDat.

  • Microsoft Academic - Search through a PID graph created by Microsoft (shut down at the end of 2021).

  • OpenAIRE explorer - Search through the metadata indexed by OpenAIRE.

  • ScholeXplorer - A data literature interlinking service (formerly Scholix) that indexes links between data and journal publications. It also provides interfaces and APIs to query the graph.

  • Research Software Repository - Aggregates research software from various sources with information about the problem it solves and its scientific domain.

Software and software publications

  • CITATION.CFF - Plain text files with human- and machine-readable citation information for software (and datasets). Supported by GitHub, Zenodo, Zotero.

  • Citable code with Zenodo & GitHub - Make GitHub repositories citable with Zenodo DOI.

  • CodeMeta - A minimal metadata schema for scientific software and code, in JSON and XML. It provides a concept vocabulary that can be used to standardize the exchange of software metadata across repositories and organizations.

  • fossology - FOSSology is an open source license compliance software system and toolkit. You can run license, copyright and export control scans from the command line.

  • HERMES - A CI-based workflow to create and publish software publications to well-known repositories.

  • SOMEF - Extract software publication metadata from README and other docs automatically using ML and other techniques to reduce the amount of boilerplate work for the developer.
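As an illustration of what citation metadata for software can look like, the following sketch renders a minimal `CITATION.cff` file as text. It assumes version 1.2.0 of the Citation File Format and only emits the keys `cff-version`, `message`, `title` and `authors`. The helper name `render_citation_cff`, the dictionary keys `family` and `given`, and all example values are hypothetical; for real projects a dedicated tool or a YAML library is likely the better choice.

```python
import json


def render_citation_cff(title, authors,
                        message="If you use this software, please cite it as below."):
    """Render a minimal CITATION.cff document as a string.

    `authors` is a list of dicts with the (hypothetical) keys "family" and "given".
    json.dumps is used only to produce safely double-quoted strings,
    which are also valid YAML scalars.
    """
    quote = json.dumps
    lines = [
        "cff-version: 1.2.0",
        f"message: {quote(message)}",
        f"title: {quote(title)}",
        "authors:",
    ]
    for author in authors:
        lines.append(f"  - family-names: {quote(author['family'])}")
        lines.append(f"    given-names: {quote(author['given'])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder project and author, for illustration only.
    print(render_citation_cff("Example Tool", [{"family": "Doe", "given": "Jane"}]))
```

Saving the output as `CITATION.cff` in the repository root is the usual way to make the file discoverable by the services mentioned above.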

Related research software lists

Provenance tracking

  • AiiDA - Automated Interactive Infrastructure and Database for Computational Science (AiiDA) to automatically track provenance of simulation workflows and all associated data, code.

  • DataLad - A free and open-source distributed data management system for everyone. It is based on git-annex and offers manual to automatic provenance tracking, code.

  • MLflow - Tool to track the provenance of machine learning applications, code.

  • CWL - Domain-agnostic and community-driven open standard for description and execution of research workflows that supports provenance tracking (CWLProv) in a standard-compliant way using the existing RO-Crate, PROV and BagIt standards.

  • PROV-O Primer - An introduction to the data model of the Provenance Ontology (PROV-O).

Related workflow tools lists

There is overlap with these more general lists of workflow tools, but not every pipeline or workflow manager includes good provenance tracking.

Metadata management

Your own repository setup

  • Dataverse - Open source research data repository software, code.

  • EuDat B2share - A repository by EuDat. The software is open source and based on Invenio, and you can set up your own instances of it, code.

  • Invenio - Open source customizable software to set up large scale digital repositories, library systems and data repositories, code.

  • InvenioRDM - The turn-key research data management repository based on the Invenio framework and Zenodo.

Awesome metadata sources

  • Microsoft Academic Graph - All the data and links from Microsoft Academic (shut down at the end of 2021).

  • OpenAIRE graph - All metadata contained in the OpenAIRE graph.

  • Scholix - A schema for scholarly links. Implemented and deployed by several scholarly link providers.

  • CrossRef - Organization building connections between related entities, building a queryable graph.

Related lists

Awesome lists that relate to several of the topics above.

  • awesome-rse - An awesome list by HIFIS collecting information about research software engineering, touching FAIRness and sustainability.

  • awesome-rse-policies - An awesome list by HIFIS collecting information about research software engineering policies, touching FAIRness and sustainability.

  • Awesome-open-climate-science - An open science related list specific to the domain of Atmospheric, Ocean, and Climate science.

  • Awesome-open-science-software - A list of open science resources and software.

  • Awesome Curated Tools - A curated list of digital tools we use, ranging from accounting and data science to scientific research and liquid democracy.

Contributing

Contributions are welcome! 😎
If you want to contribute, please read the contribution guidelines.