Calibrated Boosting-Forest

Calibrated Boosting-Forest (CBF) is an integrative technique that leverages both continuous and binary labels and outputs calibrated posterior probabilities. It was originally designed for ligand-based virtual screening and can be extended to other applications. Calibrated Boosting-Forest is a package created by Haozhen Wu from the Small Molecule Screening Facility at the University of Wisconsin-Madison.

For more details, please see our paper:
Calibrated Boosting-Forest by Haozhen Wu

Key features:

  • Takes both continuous and binary labels as input (multi-labels)
  • Superior ranking power over individual regression or classification models
  • Outputs well-calibrated posterior probabilities
  • Streamlined hyper-parameter tuning stage
  • Supports multiple evaluation and stopping metrics
  • Competitive benchmark results for well-known public datasets
  • XGBoost backend
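
To make "well-calibrated" concrete, one common check is to group predictions into probability bins and compare each bin's mean predicted probability with the observed fraction of positives. The sketch below is a generic plain-Python illustration, not code from this package; the function name `reliability_table` and the toy data are hypothetical.

```python
def reliability_table(probs, labels, n_bins=5):
    """Bin predictions by probability and compare predicted vs. observed.

    Returns one (mean_predicted, observed_positive_rate, count) tuple
    per non-empty equal-width bin, ordered from low to high probability.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    table = []
    for members in bins:
        if not members:
            continue
        count = float(len(members))
        mean_pred = sum(p for p, _ in members) / count
        observed = sum(y for _, y in members) / count
        table.append((mean_pred, observed, len(members)))
    return table


if __name__ == "__main__":
    # Toy predictions and binary labels (hypothetical values).
    toy_probs = [0.05, 0.15, 0.30, 0.55, 0.70, 0.85, 0.95]
    toy_labels = [0, 0, 0, 1, 1, 1, 1]
    for row in reliability_table(toy_probs, toy_labels, n_bins=4):
        print("predicted=%.2f observed=%.2f n=%d" % row)
```

For a well-calibrated model, the predicted and observed values in each row are close to each other.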

Dependencies:

Installation

We recommend using Anaconda to simplify package installation. So far, LightChem has been tested with Python 2.7 on OS X and on Ubuntu Server 16.04 (Linux).

  1. Download the 64-bit Python 2.7 version of Anaconda for Linux/OS X here and follow the instructions. Once Anaconda is installed, most of the dependencies will be ready.

  2. Install git if you do not have it:
    Linux Ubuntu:

    sudo apt-get install git-all
  3. Install scikit-learn:

    conda install scikit-learn=0.18
  4. Install the conda distribution of xgboost:

    conda install --yes -c conda-forge xgboost=0.6a2
  5. Install rdkit. Note: rdkit is only used to transform SMILES strings into fingerprints.

    conda install -c omnia rdkit
  6. Clone the Calibrated-Boosting-Forest github repository:

    git clone https://github.com/haozhenWu/Calibrated-Boosting-Forest.git

    cd into the Calibrated-Boosting-Forest directory and execute:

    pip install -e .
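
After the steps above, a quick way to see whether the packages are importable is a small script like the following. This is a minimal sketch rather than part of the package: the helper name `check_imports` is hypothetical, and the import names (`sklearn`, `xgboost`, `rdkit`, `lightchem`) are assumed to match the packages installed above.

```python
import importlib

# Import names assumed for the packages installed in the steps above.
REQUIRED_MODULES = ["sklearn", "xgboost", "rdkit", "lightchem"]


def check_imports(module_names):
    """Try to import each module.

    Returns a dict mapping module name to None on success, or to the
    error message if the import failed.
    """
    status = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            status[name] = None
        except Exception as exc:
            # Broken native libraries can surface as errors other than
            # ImportError, so every exception is recorded here.
            status[name] = "%s: %s" % (type(exc).__name__, exc)
    return status


if __name__ == "__main__":
    for name, error in sorted(check_imports(REQUIRED_MODULES).items()):
        print("%-10s %s" % (name, "ok" if error is None else error))
```

Any module reported with an error message points to the installation step that should be repeated.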
    

Testing

To test that the dependencies have been installed correctly, run pytest in the lightchem directory. This requires the optional pytest Python package. The current tests (1) confirm that the required dependencies exist and can be imported, and (2) confirm that the model performance results for one target, MUV-466, fall into expected ranges.
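
As an illustration of the second kind of check, a range test written for pytest might look like the sketch below. It is not taken from the lightchem test suite: the choice of ROC AUC as the metric, the helper `roc_auc`, the toy data, and the bounds are all assumptions made for the example.

```python
def roc_auc(labels, scores):
    """ROC AUC computed as the fraction of (positive, negative) pairs
    in which the positive is scored higher; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def test_metric_in_expected_range():
    # Toy labels and scores; a real test would use model predictions.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    auc = roc_auc(labels, scores)
    # Hypothetical bounds, chosen only for this example.
    assert 0.7 <= auc <= 0.8
```

pytest collects any function whose name starts with `test_`, so saving this in a file named like `test_example.py` and running pytest in that directory would execute the check.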

FAQ

  1. When I import lightchem, the following error shows up: version GLIBCXX_3.4.20 not found
    Try:
    conda install libgcc
    Source

Reference

  1. DeepChem (https://github.com/deepchem/deepchem): Deep-learning models for Drug Discovery and Quantum Chemistry