Jon Ison edited this page Jul 10, 2020 · 13 revisions

Use this page as a notepad for ideas that are not yet actionable.

Snakemake

https://github.com/edamontology/edamontology/issues/523

Snakemake is a simple workflow management system for specifying and invoking workflows in a Python environment: https://bitbucket.org/snakemake/snakemake/wiki/Home. It uses a rule-based system for combining tools, based on named tools and defined inputs and outputs. However, these details are specific to a local snakemake installation.

Could a combination of bio.tools IDs and EDAM Format annotations help to standardise the description of tools/rules in snakemake, and make snakemake workflows more portable and shareable?

We understand that @jvanheld has libraries of rules for many NGS tools, plus example workflows for ChIP-seq and RNA-seq analysis: https://github.com/rioualen/gene-regulation

The tools used in the snakemake workflows are widely used, e.g. bowtie, bedtools, samtools, MACS2, …

Two threads for possible actions:

  1. test how sample workflows can be entered as workflow entries in bio.tools
  2. scope how bio.tools IDs and EDAM Format annotations could be applied to snakemake
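
As a starting point for the second thread, the sketch below shows one way a rule description could carry a bio.tools ID and EDAM Format annotations, and how two rules could then be checked for compatibility. The `AnnotatedRule` class and the `can_chain` helper are hypothetical and are not part of snakemake or bio.tools; the bio.tools IDs and EDAM Format IDs are illustrative and should be verified against the live registry and the current EDAM release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedRule:
    """Hypothetical description of a workflow rule (not a snakemake API)."""
    name: str
    biotools_id: str            # assumed bio.tools ID of the wrapped tool
    input_formats: frozenset    # EDAM Format IDs the rule accepts
    output_formats: frozenset   # EDAM Format IDs the rule produces


def can_chain(upstream, downstream):
    """True if at least one output format of `upstream` is accepted by `downstream`."""
    return bool(upstream.output_formats & downstream.input_formats)


# Illustrative EDAM Format IDs (assumptions; check against EDAM before use).
FASTQ, SAM, BAM = "format_1930", "format_2573", "format_2572"

align = AnnotatedRule("align_reads", "bowtie2", frozenset({FASTQ}), frozenset({SAM}))
sort_bam = AnnotatedRule("sort_alignments", "samtools", frozenset({SAM, BAM}), frozenset({BAM}))

print(can_chain(align, sort_bam))   # True: SAM output matches an accepted input
print(can_chain(sort_bam, align))   # False: BAM output is not an accepted input
```

Because the check relies only on shared identifiers, it would not depend on how a tool is named or installed locally, which is the portability concern raised above.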

SourceData

https://github.com/edamontology/edamontology/issues/521

Is there scope for EDAM integration with SourceData (http://sourcedata.embo.org/)? SourceData is a platform for researchers and publishers to make their papers discoverable based on their data content; it is supported by SIB and various journals.

Webulous and RightField

https://github.com/edamontology/edamontology/issues/519

Webulous could be used to create EDAM extensions (or modifications?) from spreadsheets, and RightField to generate Excel spreadsheet templates embedded with EDAM, for curation purposes.

Using inferred sentences to validate conceptual integrity

https://github.com/edamontology/edamontology/issues/366

Dan says (ages ago :) ) "Sentences that can be formally 'reasoned' from the ontology should make semantic sense. i.e. the right type of 'types' should be used to link concepts, such as "'Gibbs sampling' /is a/ 'Statistical method'". Sounds dumb, but I bet we can get some weird / unexpected constructions."

Jon says ... "A tool could be developed to pull up a random sample of such sentences: a sort of 'unit test' for EDAM concepts, where a unit test is a human-readable sentence."
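
A minimal sketch of such a sampler is given below, assuming the "is a" relations have already been extracted from EDAM as (child, parent) label pairs. The function names are hypothetical, and apart from the 'Gibbs sampling' example quoted above, the pairs are placeholders that have not been checked against EDAM.

```python
import random

# Placeholder (child, parent) label pairs standing in for subclass relations
# read from EDAM. Only the first pair comes from the discussion above.
IS_A_PAIRS = [
    ("Gibbs sampling", "Statistical method"),
    ("Sequence alignment", "Alignment"),
    ("Protein secondary structure prediction", "Protein structure prediction"),
]


def to_sentence(child, parent):
    """Render one inferred relation as a human-readable sentence."""
    return f"'{child}' is a '{parent}'"


def sample_sentences(pairs, k, seed=None):
    """Return up to `k` randomly chosen sentences for manual review."""
    rng = random.Random(seed)  # a fixed seed makes a review batch repeatable
    chosen = rng.sample(pairs, min(k, len(pairs)))
    return [to_sentence(child, parent) for child, parent in chosen]


for sentence in sample_sentences(IS_A_PAIRS, k=2, seed=1):
    print(sentence)
```

A reviewer would read each printed sentence and flag any that do not make semantic sense.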

Use of EDAM wrt GO, SO etc.

https://github.com/edamontology/edamontology/issues/369

Matus says (from RostLab meeting 2016):

  • For a subset X of sequence features and properties from SO, there are concepts among operations in EDAM of a form similar to “x(from X) (classification|prediction)”.
  • For another subset Y of sequence features and properties from SO, there are concepts among data in EDAM of a form similar to “y(from Y) (classification|prediction) (report|record)”.
  • Examples missing from X, Y or both (that are relevant for PredictProtein) include specific kinds of protein structure: disorder, coiled coils, and various secondary structure elements (see Protein Secondary Structure on Wikipedia).
  • It would be more maintainable (both wrt EDAM and the Tools Registry annotations and any other annotations) to use EDAM concepts like “Biopolymer feature (inference|record)” plus a corresponding SO concept.
  • Another somewhat related issue is for example biological function{versus}GO concepts (report|record|inference|classification|prediction){,1}
    • Which of the related concepts should be EDAM?
    • Would using generic EDAM concepts plus a corresponding GO concept work here?
    • What about other aspects and ontologies/taxonomies: environment, taxon, phenotype, … ?

EOSC / EDMI metadata guidelines

https://github.com/edamontology/edamontology/issues/379

See https://eosc-edmi.github.io/properties

EDAM is directly relevant, e.g. to format and scientificType

Background info follows ....

https://eoscpilot.eu/edmi-metadata-guidelines
https://eosc-edmi.github.io/properties

EOSCpilot has produced a first draft of the strategy and recommendations to help users and services to find and access datasets across several scientific disciplines. This strategy relies on three main ideas:

  • Agreeing on a common and minimum dataset metadata properties to be exposed by data resources (following evaluation of existing metadata models and APIs).
  • Supporting a coordinated ecosystem of dataset metadata catalogues working together to efficiently manage and exchange their metadata.
  • Demonstrating the applicability of these recommendations by implementing them in real use cases, to allow users and services to find and access data.

The study has explored research data resources which organize their data records in datasets. The description of records tends to be specific to each discipline, but the description of the dataset itself is quite similar. The report therefore proposes a simple metadata guideline, based on a number of common properties found across diverse scientific disciplines, to allow EOSC users and services to find and access data.

Tooling for ontology building from csv

https://github.com/edamontology/edamontology/issues/395

From speaking with @simonjupp, there is some nice tooling for ontology building from csv, which we should definitely consider for use by thematic editors and for community revision of EDAM in thematic areas. See https://github.com/echinoderm-ontology/ecao_ontology/tree/master/src/templates.
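
To illustrate the general idea only, the sketch below reads concept rows from csv text and reports parents that are referenced but not defined, a basic check an editor might run before building. The column names (`id`, `label`, `parent`), the function names and the `ex_` identifiers are assumptions made for this example; they do not reflect the layout of the linked templates or any existing tool.

```python
import csv
import io


def read_terms(csv_text):
    """Parse rows with 'id', 'label' and 'parent' columns into a dict keyed by id."""
    terms = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # An empty parent cell means the term is a root in this sketch.
        terms[row["id"]] = {"label": row["label"], "parent": row["parent"] or None}
    return terms


def missing_parents(terms):
    """Return the sorted ids that are used as a parent but never defined."""
    referenced = {t["parent"] for t in terms.values() if t["parent"]}
    return sorted(referenced - set(terms))


# Hypothetical rows; the identifiers are placeholders, not EDAM concepts.
SAMPLE = """id,label,parent
ex_0001,Statistical method,
ex_0002,Gibbs sampling,ex_0001
ex_0003,Orphan concept,ex_9999
"""

terms = read_terms(SAMPLE)
print(missing_parents(terms))  # ['ex_9999']
```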