Skip to content

Snakemake workflow to dump TCGA reads, map to target genomes, and summarize alignments

Notifications You must be signed in to change notification settings

louiejtaylor/fang-tcga

Repository files navigation

fang-tcga

Snakemake workflow to map TCGA reads to target genomes and summarize alignments

Warning: this project is under active development and may break in unexpected ways.

Install

git clone https://github.com/louiejtaylor/fang-tcga

Edit the config file to point to your desired output dir and the list of files to process (and/or download). Making a new/copy of your config file for each experiment is recommended for reproducibility.

Requirements

Requires Snakemake and Conda to run. All other dependencies are handled at runtime by Conda.

Run

Note: if the data you are trying to access are controlled (e.g., not public), you will need an access token from the Genomic Data Commons. There are detailed instructions for obtaining one here.

snakemake --configfile config.yml --use-conda

Before your first run, map your case IDs to file IDs using the generate_sample_list rule:

snakemake --configfile config.yml --use-conda generate_sample_list

If you'd like only certain outputs, specify a target rule like so:

snakemake --configfile config.yml --use-conda all_preprocess

Available target rules include: all_download, all_preprocess, all_align, all_summary

About

fang-tcga is a backronym which stands for finding additional non-human genomes in TCGA. It also uses code directly, as a submodule, or modified from, hisss, a collab with ArwaAbbas.

About

Snakemake workflow to dump TCGA reads, map to target genomes, and summarize alignments

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages