fah-xchem

Tools and infrastructure for automated compound discovery using Folding@home.

Installation

  1. Clone the repository and cd into the repository root:

    git clone https://github.com/choderalab/fah-xchem.git
    cd fah-xchem
  2. Create a conda environment with the required dependencies:

    conda env create -f environment.yml

    If the above process is slow, we recommend using mamba to speed up installation:

    mamba env create -f environment.yml
  3. Activate the environment and install fah-xchem into it using pip:

    conda activate fah-xchem
    pip install .

Example usage

Download molecule and experimental data from CDD (Collaborative Drug Discovery) and generate an experimental data file for use in the analysis:

export CDD_VAULT_NUM=<vault-num>
export CDD_VAULT_TOKEN=<vault-token>

FLUORESCENCE_IC50_PROTOCOL_ID=49439

# will take some time; pulls full data export from CDD
fah-xchem -l INFO cdd --data-dir cdd-data/ retrieve-protocol-data --molecules -i $FLUORESCENCE_IC50_PROTOCOL_ID

# next step REQUIRES OpenEye license
export OE_LICENSE=/path/to/oe_license.txt

# merges and transforms data elements pulled from CDD into usable form for downstream analysis
fah-xchem -l INFO cdd --data-dir cdd-data/ generate-experimental-compound-data -i $FLUORESCENCE_IC50_PROTOCOL_ID experimental_compound_data.json
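The two CDD steps above fail if CDD_VAULT_NUM, CDD_VAULT_TOKEN, or OE_LICENSE is missing, which can be costly after a long data export. The following is a minimal preflight sketch, assuming you want to check these variables before starting; the helper name preflight_problems is hypothetical and not part of fah-xchem, and it never contacts CDD or OpenEye.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Variables required by the CDD retrieval step (per the example above).
CDD_VARS = ("CDD_VAULT_NUM", "CDD_VAULT_TOKEN")


def preflight_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return problems that would make the CDD steps fail (empty list if none).

    Hypothetical helper: it only inspects environment variables and checks
    that the OpenEye license file exists on disk.
    """
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in CDD_VARS if not env.get(name)]
    license_path = env.get("OE_LICENSE")
    if not license_path:
        problems.append("OE_LICENSE is not set")
    elif not Path(license_path).is_file():
        problems.append(f"OE_LICENSE points to a missing file: {license_path}")
    return problems


if __name__ == "__main__":
    # Report any problems found in the current shell environment.
    for problem in preflight_problems():
        print("preflight:", problem)
```

Passing an explicit mapping instead of os.environ makes the check easy to exercise without touching the real environment.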

Run the transformation and compound free energy analysis, writing the results to analysis.json in the output directory:

fah-xchem --loglevel INFO \
        compound-series analyze \
        --experimental-data-file experimental_compound_data.json \
        --config-file config.json \
        --fah-projects-dir /path/to/projects/ \
        --fah-data-dir /path/to/data/SVR314342810/ \
        --nprocs 8 \
        compound-series.json \
        /path/to/output-dir/analysis.json
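When the analysis is driven from a Python pipeline script rather than a shell, building the command as an argument list avoids line-continuation mistakes such as a missing trailing backslash. This is a sketch that reproduces only the flags shown in the example above; the helper name analyze_command and its defaults are hypothetical, and the paths are the same placeholders.

```python
import shlex
from typing import List


def analyze_command(
    projects_dir: str,
    data_dir: str,
    output_json: str,
    nprocs: int = 8,
    experimental_data_file: str = "experimental_compound_data.json",
    config_file: str = "config.json",
    compound_series_file: str = "compound-series.json",
) -> List[str]:
    """Build the `compound-series analyze` invocation above as an argv list."""
    return [
        "fah-xchem", "--loglevel", "INFO",
        "compound-series", "analyze",
        "--experimental-data-file", experimental_data_file,
        "--config-file", config_file,
        "--fah-projects-dir", projects_dir,
        "--fah-data-dir", data_dir,
        "--nprocs", str(nprocs),
        # Positional arguments: compound series input, then output file.
        compound_series_file,
        output_json,
    ]


cmd = analyze_command(
    "/path/to/projects/",
    "/path/to/data/SVR314342810/",
    "/path/to/output-dir/analysis.json",
)
# Print a shell-quoted version for inspection; replace the placeholder paths,
# then run it with subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```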

Generate representative snapshots, plots, a PDF report, and static-site HTML in the output directory:

fah-xchem --loglevel INFO \
        artifacts generate \
        --config-file config.json \
        --fragalysis-config fragalysis_config.json \
        --fah-projects-dir /path/to/projects/ \
        --fah-data-dir /path/to/data/SVR314342810/ \
        --website-base-url https://my-bucket.s3.amazonaws.com/site/prefix/ \
        --cache-dir results/cache/ \
        --nprocs 8 \
        /path/to/output-dir/analysis.json \
        /path/to/output-dir/

Unit conventions

Energies are represented in configuration files and internally in units of kT (thermal energy), unless otherwise indicated. For energies in kilocalories per mole (kcal/mol), the function or variable name should be suffixed with _kcal.
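The naming convention can be illustrated with a small conversion sketch. It assumes a temperature of 300 K, which this README does not specify (kT depends on temperature), and the function names are hypothetical; they simply follow the _kcal suffix rule because they return values in kcal/mol.

```python
# Molar gas constant R = N_A * k_B, in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.98720425864083e-3


def kT_in_kcal(temperature_kelvin: float = 300.0) -> float:
    """Thermal energy kT in kcal/mol at the given temperature (300 K assumed)."""
    return R_KCAL_PER_MOL_K * temperature_kelvin


def to_kcal(energy_kT: float, temperature_kelvin: float = 300.0) -> float:
    """Convert an energy expressed in units of kT to kcal/mol."""
    return energy_kT * kT_in_kcal(temperature_kelvin)


# The unsuffixed name holds the value in kT; the _kcal name holds the same
# quantity in kcal/mol.
binding_free_energy = -10.0  # kT
binding_free_energy_kcal = to_kcal(binding_free_energy)  # about -5.96 kcal/mol at 300 K
```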

Configuration

Compound series

The compound series is specified as JSON with the schema given by the CompoundSeriesAnalysis model (see fah_xchem.schema).

Analysis configuration

Some analysis options can be configured in a separate JSON file with schema given by the AnalysisConfig model. For example,

config.json

{
    "min_num_work_values": 10,
    "max_binding_free_energy": 0
}

The JSON file is passed on the command line using the --config-file option.
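If config.json is produced by a script (for example, one run per sprint), it can be written with the standard json module. This sketch uses only the two options from the example above; the helper name write_analysis_config is hypothetical, and max_binding_free_energy is written in units of kT, following the unit conventions above.

```python
import json
from pathlib import Path
from typing import Any, Dict


def write_analysis_config(
    path: str,
    min_num_work_values: int = 10,
    max_binding_free_energy: float = 0.0,
) -> Dict[str, Any]:
    """Write an analysis config containing the two options shown above."""
    config = {
        "min_num_work_values": int(min_num_work_values),
        # Energy threshold in units of kT.
        "max_binding_free_energy": float(max_binding_free_energy),
    }
    Path(path).write_text(json.dumps(config, indent=4) + "\n")
    return config


written = write_analysis_config("config.json")
```

The resulting file is passed to fah-xchem with --config-file config.json, exactly as a hand-written one would be.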

Upload to Fragalysis

To upload sprint results to Fragalysis, a JSON config file can be supplied. For example,

fragalysis_config.json

{
    "run": true,
    "ligands_filename": "reliable-transformations-final-ligands.sdf",
    "fragalysis_sdf_filename": "compound-set_foldingathome-sprint-X.sdf",
    "ref_url": "https://url-link",
    "ref_mols": "x00000",
    "ref_pdb": "references.zip",
    "target_name": "protein-target",
    "submitter_name": "Folding@home",
    "submitter_email": "first.last@email.org",
    "submitter_institution": "institution-name",
    "method": "Sprint X",
    "upload_key": "upload-key",
    "new_upload": true
}

The JSON file is passed on the command line using the --fragalysis-config option.

Description of the JSON parameters:

  • run: whether to run the Fragalysis upload. If set to false, the results are not uploaded, even if the JSON file is supplied via the --fragalysis-config option.
  • ligands_filename: the name of the SDF file to upload to Fragalysis.
  • fragalysis_sdf_filename: the name to use for the SDF file uploaded to Fragalysis. This file is a copy of ligands_filename, but its name must have the form compound-set_<name>.sdf.
  • ref_url: the URL of the post that describes the work (e.g. the post for Sprint 5).
  • ref_mols: a comma-separated list of the fragments that inspired the design of the new molecules, using their codes as they appear in Fragalysis (e.g. x0104_0,x0692_0).
  • ref_pdb: either 1) the name of the zipped protein PDB file to upload, which should be named references.zip (recommended), or 2) the Fragalysis code of the fragment PDB to use (e.g. x0692_0).
  • target_name: the name of the target protein.
  • submitter_name: the name of the submitter.
  • submitter_email: the email address of the submitter.
  • submitter_institution: the name of the institution that the submitter is associated with.
  • method: the method by which the results were obtained (e.g. Sprint 5).
  • upload_key: the unique upload key used to upload to Fragalysis.
  • new_upload: specifies whether to upload a new set (true) or to update an existing set (false).
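Because an upload can only be checked once it reaches Fragalysis, it may help to validate fragalysis_config.json locally first. The sketch below checks only what the parameter list above states: all keys present, run and new_upload are booleans, and fragalysis_sdf_filename has the form compound-set_<name>.sdf. The function names are hypothetical and not part of fah-xchem.

```python
import json
import re
from typing import Any, Dict, List

# Keys documented above for fragalysis_config.json.
REQUIRED_KEYS = (
    "run", "ligands_filename", "fragalysis_sdf_filename", "ref_url",
    "ref_mols", "ref_pdb", "target_name", "submitter_name",
    "submitter_email", "submitter_institution", "method", "upload_key",
    "new_upload",
)
# fragalysis_sdf_filename must have the form compound-set_<name>.sdf
SDF_NAME = re.compile(r"^compound-set_.+\.sdf$")


def check_fragalysis_config(config: Dict[str, Any]) -> List[str]:
    """Return problems found in a Fragalysis upload config (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    for flag in ("run", "new_upload"):
        if flag in config and not isinstance(config[flag], bool):
            problems.append(f"{flag} must be true or false")
    name = config.get("fragalysis_sdf_filename", "")
    if name and not SDF_NAME.match(name):
        problems.append("fragalysis_sdf_filename must look like compound-set_<name>.sdf")
    return problems


def parse_ref_mols(value: str) -> List[str]:
    """Split the comma-separated ref_mols string into Fragalysis fragment codes."""
    return [code.strip() for code in value.split(",") if code.strip()]


def load_and_check(path: str) -> List[str]:
    """Load a fragalysis_config.json file and return any problems found."""
    with open(path) as handle:
        return check_fragalysis_config(json.load(handle))
```

For example, load_and_check("fragalysis_config.json") returns an empty list for the example config above.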

For more information on the upload format see this forum post.

A unique upload_key is required to push to Fragalysis; one can be requested here.

For more information on the entire upload process see this forum post.

Server-specific configuration

Paths to the Folding@home project and data directories are passed on the command line via the --fah-projects-dir and --fah-data-dir options. See the usage examples above.

Development setup

Conda

This project uses conda to manage the environment. To set up a conda environment named fah-xchem with the required dependencies, create the conda environment as described above. To install fah-xchem in development (editable) mode, run:

pip install -e .

Running tests locally

pytest

Formatting

Code formatting with black is enforced via a CI check. To install black with conda, use

conda install black

Building documentation

cd docs
make html

Copyright

Copyright (c) 2020, Chodera Lab

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.3.