A literature review of biologically constrained machine learning models

greenelab/biopriors-review

BioPriors Review Paper

The working title for this manuscript is Incorporating biological structure into machine learning models in biomedicine.

Abstract

In biomedical applications of machine learning, relevant information often has a rich structure that is not easily encoded as real-valued predictors. Examples of such data include DNA or RNA sequences, gene sets or pathways, gene interaction or coexpression networks, ontologies, and phylogenetic trees. We highlight recent examples of machine learning models that use structure to constrain model architecture or incorporate structured data into model training. For machine learning in biomedicine, where sample size is limited and model interpretability is critical, incorporating prior knowledge in the form of structured data can be particularly useful. The area of research would benefit from performant open source implementations and independent benchmarking efforts.

Manubot

Manubot is a system for writing scholarly manuscripts via GitHub. Manubot automates citations and references, versions manuscripts using git, and enables collaborative writing via GitHub. An overview manuscript presents the benefits of collaborative writing with Manubot and its unique features. The rootstock repository is a general-purpose template for creating new Manubot instances, as detailed in SETUP.md. See USAGE.md for documentation on how to write a manuscript.

Please open an issue for questions related to Manubot usage, bug reports, or general inquiries.

Repository directories & files

The directories are as follows:

  • content contains the manuscript source, which includes markdown files as well as inputs for citations and references. See USAGE.md for more information.
  • output contains the outputs (generated files) from Manubot including the resulting manuscripts. You should not edit these files manually, because they will get overwritten.
  • webpage is a directory meant to be rendered as a static webpage for viewing the HTML manuscript.
  • build contains commands and tools for building the manuscript.
  • ci contains files necessary for deployment via continuous integration. For the CI configuration, see .travis.yml.

Local execution

The easiest way to run Manubot is to use continuous integration to rebuild the manuscript when the content changes. If you want to build a Manubot manuscript locally, install the conda environment as described in build. Then, you can build the manuscript on POSIX systems by running the following commands from this root directory.

# Activate the manubot conda environment (assumes conda version >= 4.4)
conda activate manubot

# Build the manuscript, saving outputs to the output directory
bash build/build.sh

# At this point, the HTML & PDF outputs will have been created. The remaining
# commands are for serving the webpage to view the HTML manuscript locally.
# This is required to view local images in the HTML output.

# Configure the webpage directory
python build/webpage.py

# You can now open the manuscript webpage/index.html in a web browser.
# Alternatively, open a local webserver at http://localhost:8000/ with the
# following commands.
cd webpage
python -m http.server

Sometimes it's helpful to monitor the content directory and automatically rebuild the manuscript when a change is detected. The following command, while running, will trigger both the build.sh and webpage.py scripts upon content changes:

bash build/autobuild.sh
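
The idea behind rebuilding on change can be sketched in Python. This is a minimal polling illustration, not necessarily how build/autobuild.sh works: the helper names (snapshot, changed_paths, watch) and the one-second polling interval are assumptions made for this example. Only the two build commands come from the instructions above.

```python
import subprocess
import time
from pathlib import Path


def snapshot(directory):
    """Map each file under `directory` to its last-modified time in nanoseconds."""
    return {
        str(path): path.stat().st_mtime_ns
        for path in Path(directory).rglob("*")
        if path.is_file()
    }


def changed_paths(before, after):
    """Return the paths that were added, removed, or modified between two snapshots."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )


def watch(directory="content", interval=1.0):
    """Poll `directory` and rerun the build whenever a file changes (hypothetical helper)."""
    previous = snapshot(directory)
    while True:
        time.sleep(interval)
        current = snapshot(directory)
        if changed_paths(previous, current):
            # Same two steps as the manual build described above.
            subprocess.run(["bash", "build/build.sh"], check=False)
            subprocess.run(["python", "build/webpage.py"], check=False)
            previous = current
```

Calling watch() from the repository root would loop until interrupted. Polling is the simplest approach to show here; an event-based file watcher would avoid rescanning the directory on every tick.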

Continuous Integration

Whenever a pull request is opened, Travis CI will test whether the changes break the build process to generate a formatted manuscript. The build process aims to detect common errors, such as invalid citations. If your pull request build fails, see the Travis CI logs for the cause of failure and revise your pull request accordingly.

When a commit to the master branch occurs (for example, when a pull request is merged), Travis CI builds the manuscript and writes the results to the gh-pages and output branches. The gh-pages branch uses GitHub Pages to host the HTML and PDF versions of the manuscript.

For continuous integration configuration details, see .travis.yml.

License

Except when noted otherwise, the entirety of this repository is licensed under a CC BY 4.0 License (LICENSE.md), which allows reuse with attribution. Please attribute by linking to https://github.com/greenelab/biopriors-review.

Since CC BY is not ideal for code and data, certain repository components are also released under the CC0 1.0 public domain dedication (LICENSE-CC0.md). All files matched by the following glob patterns are dual licensed under CC BY 4.0 and CC0 1.0:

  • *.sh
  • *.py
  • *.yml / *.yaml
  • *.json
  • *.bib
  • *.tsv
  • .gitignore

All other files are only available under CC BY 4.0, including:

  • *.md
  • *.html
  • *.pdf
  • *.docx
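
As a rough illustration of how the glob patterns above partition the repository, the following sketch looks up which licenses apply to a given path. The function name licenses_for is hypothetical and not part of this repository, and the sketch ignores any per-file exceptions covered by "except when noted otherwise".

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the dual-licensed list above.
DUAL_LICENSED_PATTERNS = [
    "*.sh",
    "*.py",
    "*.yml",
    "*.yaml",
    "*.json",
    "*.bib",
    "*.tsv",
    ".gitignore",
]


def licenses_for(path):
    """Return the licenses that apply to a repository path (hypothetical helper)."""
    # Match on the file name only, so directories do not affect the result.
    name = PurePosixPath(path).name
    if any(fnmatchcase(name, pattern) for pattern in DUAL_LICENSED_PATTERNS):
        return ["CC BY 4.0", "CC0 1.0"]
    # Everything else, such as *.md or *.pdf, is CC BY 4.0 only.
    return ["CC BY 4.0"]
```

For example, licenses_for("build/build.sh") reports both licenses, while licenses_for("USAGE.md") reports CC BY 4.0 only.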

Please open an issue for any question related to licensing.
