


bcbi/PreprocessMD.jl


PreprocessMD.jl

Medically-informed data preprocessing for machine learning


Summary

The purpose of PreprocessMD.jl is to provide a suite of functions for preprocessing biomedical data. The scope of this package is medical data preprocessing, so we develop functions that are specific to biomedical research but general enough for widespread use. These tools are developed for the OMOP Common Data Model[1], with particular attention to the MIMIC-IV demo dataset[2].

Following the definitions of Wu et al.[3], we consider data preprocessing to consist of project-level data manipulations. This distinguishes it from upstream data cleaning (e.g., error correction and standardization), which is typically performed over an entire database, and from downstream data preparation (e.g., labelling and classification), which may vary across the analyses within a project.

Usage

An example pipeline is available in the documentation.

Features

Planned features for PreprocessMD.jl include:

  • Summaries and feasibility checks
  • Feature extraction
  • Variable derivation
  • Data imputation
  • Dimension reduction
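To make two of the planned features more concrete, here is a minimal sketch, in plain Julia with only the `Statistics` standard library, of what feature extraction and data imputation could look like on OMOP-style data. It is not the PreprocessMD.jl API: `extract_features` and `mean_impute` are hypothetical helpers, the input rows only mimic three columns of the OMOP CDM MEASUREMENT table (`person_id`, `measurement_concept_id`, `value_as_number`), and the concept IDs and values are invented placeholders.

```julia
using Statistics  # standard library: provides `mean`

"""
    extract_features(rows) -> (persons, concepts, X)

Hypothetical helper, not part of PreprocessMD.jl. Pivots long-format rows that
mimic the OMOP CDM MEASUREMENT table (`person_id`, `measurement_concept_id`,
`value_as_number`) into a person-by-concept matrix. Repeated measurements of
the same concept for one person are averaged; unobserved cells are `missing`.
"""
function extract_features(rows)
    persons = sort(unique(r.person_id for r in rows))
    concepts = sort(unique(r.measurement_concept_id for r in rows))
    row_of = Dict(p => i for (i, p) in enumerate(persons))
    col_of = Dict(c => j for (j, c) in enumerate(concepts))

    # Gather every value recorded for each (person, concept) cell
    cells = Dict{Tuple{Int,Int},Vector{Float64}}()
    for r in rows
        key = (row_of[r.person_id], col_of[r.measurement_concept_id])
        push!(get!(cells, key, Float64[]), r.value_as_number)
    end

    # Start from an all-missing matrix and fill in the observed cells
    X = Matrix{Union{Missing,Float64}}(missing, length(persons), length(concepts))
    for ((i, j), vals) in cells
        X[i, j] = mean(vals)
    end
    return persons, concepts, X
end

"""
    mean_impute(X) -> Matrix{Float64}

Hypothetical helper: replaces each `missing` cell with the mean of the observed
values in its column. Assumes every column has at least one observed value.
"""
function mean_impute(X::AbstractMatrix)
    Y = Matrix{Float64}(undef, size(X))
    for j in axes(X, 2)
        m = mean(skipmissing(view(X, :, j)))
        for i in axes(X, 1)
            Y[i, j] = coalesce(X[i, j], m)
        end
    end
    return Y
end

# Toy input: the concept IDs 101 and 202 are placeholders, not real OMOP concepts
rows = [
    (person_id = 1, measurement_concept_id = 101, value_as_number = 7.0),
    (person_id = 1, measurement_concept_id = 101, value_as_number = 9.0),
    (person_id = 1, measurement_concept_id = 202, value_as_number = 140.0),
    (person_id = 2, measurement_concept_id = 202, value_as_number = 120.0),
]

persons, concepts, X = extract_features(rows)  # X[2, 1] is missing
Y = mean_impute(X)                              # missing cell filled with 8.0
```

Averaging repeated measurements and column-mean imputation are deliberately simple choices made for illustration, not recommendations; a real pipeline might instead keep the most recent value, add missingness indicators, or use a model-based imputation method.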

Footnotes

  1. OHDSI. OMOP Common Data Model. https://ohdsi.github.io/CommonDataModel/

  2. MIMIC-IV demo data in the OMOP Common Data Model, version 0.9. PhysioNet. https://physionet.org/content/mimic-iv-demo-omop/0.9/

  3. Wu, Hulin, Jose Miguel Yamal, Ashraf Yaseen, and Vahed Maroufy, eds. Statistics and Machine Learning Methods for EHR Data: From Data Extraction to Data Analytics. CRC Press, 2020.