GitHub - XPRESSyourself/XPRESSpipe: An alignment and analysis pipeline for Ribosome Profiling and RNA-seq data

An alignment and analysis pipeline for RNAseq data

Please refer to the documentation for more in-depth details.

Citation:

Berg JA, et al. (2020). XPRESSyourself: Enhancing, standardizing, and
automating ribosome profiling computational analyses yields improved insight
into data. PLoS Comp Biol. doi: https://doi.org/10.1371/journal.pcbi.1007625

Installation:

Installing from source

The following is a short tutorial showing you how to install XPRESSpipe:

NOTE: Previous versions were installed with the pip install . command. Users of v0.6.3 or later should instead use bash install.sh

Make sure you let Anaconda set up the PATH info for you.
If the help menu is not displayed when testing, try adding the path where you installed XPRESSpipe to the system PATH:

$ echo 'export PATH=$PATH:/path/to/xpresspipe' >> ~/.bash_profile

If you do not have a file named ~/.bash_profile, try looking for one called ~/.profile
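The PATH step above can be sketched as a pair of small shell helpers. This is only an illustration: pick_profile and add_to_path are hypothetical names, not part of XPRESSpipe, and /path/to/xpresspipe remains a placeholder for your own install location.

```shell
# Hypothetical helpers (not part of XPRESSpipe).

# Print the startup file to edit: ~/.bash_profile if it exists, else ~/.profile.
pick_profile() {
    home_dir="${1:-$HOME}"
    if [ -f "$home_dir/.bash_profile" ]; then
        printf '%s\n' "$home_dir/.bash_profile"
    else
        printf '%s\n' "$home_dir/.profile"
    fi
}

# Append the PATH export line to a profile file unless it is already there.
add_to_path() {
    install_dir="$1"
    profile="$2"
    line="export PATH=\$PATH:$install_dir"
    grep -qxF -- "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}

# Usage (placeholder path, as in the command above):
# add_to_path /path/to/xpresspipe "$(pick_profile)"
```

Checking for the line first keeps the profile from accumulating duplicate entries if the step is repeated.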
The installation commands are summarized here:

$ curl -L -o XPRESSpipe-v0.6.3.zip https://github.com/XPRESSyourself/XPRESSpipe/archive/refs/tags/v0.6.3.zip
$ unzip XPRESSpipe-v0.6.3.zip
$ cd XPRESSpipe-0.6.3/
$ conda install -c conda-forge mamba
$ mamba env create -f requirements.yml
$ conda activate xpresspipe
$ bash install.sh
$ xpresspipe -h
$ xpresspipe test

Be sure to specify the correct release version in the first URL and in the file and directory names that follow.
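One way to keep the release version consistent is to set it once in a variable and build the URL and file name from it. The sketch below only prints the download command for review. The variable names are illustrative, and 0.6.3 is simply the version used in the commands above.

```shell
# Assumption: VERSION is the release tag without the leading "v"; 0.6.3 is the
# version used in the commands above.
VERSION="0.6.3"
REPO_URL="https://github.com/XPRESSyourself/XPRESSpipe"
ARCHIVE_URL="${REPO_URL}/archive/refs/tags/v${VERSION}.zip"
ARCHIVE_FILE="XPRESSpipe-v${VERSION}.zip"

# Print the download command for review instead of running it.
echo "curl -L -o ${ARCHIVE_FILE} ${ARCHIVE_URL}"
```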

Using XPRESSpipe on a supercomputer

The conda environment, xpresspipe, will need to be activated.
For example, if using a SLURM job scheduler, include the following after the #SBATCH lines and before any calls to XPRESSpipe in the SLURM script:

#!/bin/bash
#SBATCH --time=72:00:00
#SBATCH --nodes=1
#SBATCH ...

source $(conda info --base)/etc/profile.d/conda.sh
conda activate xpresspipe

... rest of the script
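As an optional safeguard, a job script could check that the activation succeeded before any XPRESSpipe call. The sketch below relies on the CONDA_DEFAULT_ENV variable that conda sets on activation. The function name require_conda_env is hypothetical and not part of XPRESSpipe.

```shell
# Hypothetical guard (not part of XPRESSpipe): stop early if the expected
# conda environment is not active. conda sets CONDA_DEFAULT_ENV on activation.
require_conda_env() {
    expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "conda environment '$expected' is not active" >&2
        return 1
    fi
}

# In a job script, after `conda activate xpresspipe`:
# require_conda_env xpresspipe || exit 1
```

Failing early this way avoids spending queue time on a job that cannot find the xpresspipe command.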

QuickStart:

Reference building
Running XPRESSpipe on sequence data
You can also use the XPRESSpipe command builder and executor for reference curation or running the pipeline by executing the following:

$ xpresspipe build

Important Notes:

Basic Starting Input

An input directory with raw sequence data
- Sequence data files should be in FASTQ format, end in .fastq or .fq, and can be .zip or .gz compressed
An empty output directory
A reference directory (see documentation for curateReference for more details)
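The input requirements above could be checked before a run with a few shell helpers. This is a sketch under assumptions: the function names are hypothetical, and the accepted suffixes (.fastq or .fq, optionally followed by .gz or .zip) are one reading of the rule above rather than XPRESSpipe's own validation logic.

```shell
# Hypothetical pre-flight checks (not part of XPRESSpipe).

# Succeed if a file name ends in .fastq or .fq, optionally .gz or .zip compressed.
is_fastq_name() {
    case "$1" in
        *.fastq|*.fq|*.fastq.gz|*.fq.gz|*.fastq.zip|*.fq.zip) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if the directory exists and contains no entries.
is_empty_dir() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Report every non-FASTQ name in the input directory; fail if any is found.
# Note: hidden files are not matched by the glob and are not checked.
check_input_dir() {
    rc=0
    for path in "$1"/*; do
        [ -e "$path" ] || continue
        name="$(basename "$path")"
        if ! is_fastq_name "$name"; then
            echo "not a FASTQ file name: $name" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage (placeholder paths):
# check_input_dir /path/to/input && is_empty_dir /path/to/output
```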

Naming Conventions

To get ordered output after alignment (except for generation of a raw counts table), follow the recommended file naming conventions.

Download your raw sequence data and place in a folder -- this folder should contain all the sequence data and nothing else.
Make sure files follow a consistent naming pattern. For example, if you had 3 genetic backgrounds of ribosome profiling data, the naming scheme would be as follows:

ExperimentName_BackgroundA_FP.fastq(.gz)
ExperimentName_BackgroundA_RNA.fastq(.gz)
ExperimentName_BackgroundB_FP.fastq(.gz)
ExperimentName_BackgroundB_RNA.fastq(.gz)
ExperimentName_BackgroundC_FP.fastq(.gz)
ExperimentName_BackgroundC_RNA.fastq(.gz)

If the samples are replicates, the replicate number needs to be indicated in the file name.
If you want the final count table in a particular order that is not alphabetical, insert a letter before the sample name to force that ordering.

ExperimentName_a_WT.fastq(.gz)
ExperimentName_a_WT.fastq(.gz)
ExperimentName_b_exType.fastq(.gz)
ExperimentName_b_exType.fastq(.gz)

If you have replicates:

ExperimentName_a_WT_1.fastq(.gz)
ExperimentName_a_WT_1.fastq(.gz)
ExperimentName_a_WT_2.fastq(.gz)
ExperimentName_a_WT_2.fastq(.gz)
ExperimentName_b_exType_1.fastq(.gz)
ExperimentName_b_exType_1.fastq(.gz)
ExperimentName_b_exType_2.fastq(.gz)
ExperimentName_b_exType_2.fastq(.gz)
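To preview how a set of file names will sort before running the pipeline, a short helper such as the one below may be useful. It is a sketch: preview_sample_order is a hypothetical name, and the assumption that the count table follows the byte-wise sorted order of file names is inferred from the ordering note above, so compare it against your own output.

```shell
# Hypothetical helper (not part of XPRESSpipe): list file names in a directory
# in byte-wise sorted order.
# Assumption: the count table columns follow the sorted order of file names,
# as the ordering note above implies; check against your own output.
preview_sample_order() {
    for path in "$1"/*; do
        [ -e "$path" ] || continue
        basename "$path"
    done | LC_ALL=C sort
}

# Usage (placeholder path):
# preview_sample_order /path/to/input
```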

Running a test dataset:

We can run a test dataset as in the associated manuscript by downloading the FASTQ files from GSE65778 using the SRAtoolkit.
We can curate the reference like so:

$ xpresspipe curateReference -o /path/to/reference -f /path/to/reference/genome_fastas -g /path/to/reference/transcripts.gtf -p -t --sjdbOverhang 49

And we can process the dataset like so:

$ xpresspipe riboseq -i /path/to/input -o /path/to/output -r /path/to/reference/ --gtf /path/to/reference/transcripts_CT.gtf -e isrib_test_study -a CTGTAGGCACCATCAAT --sjdbOverhang 49

The above steps are very computationally intensive, so we recommend running them on a supercomputing cluster.
Scripts used to analyze this data can be found here and here and here
Alternatively, smaller test datasets can be found within the XPRESSpipe tests folder and an outline of commands to run can be found here
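The curateReference and riboseq commands above share the reference directory and the --sjdbOverhang value. One way to keep them in sync, for example inside a cluster job script, is to define the shared settings once and build both commands from them. The sketch below uses illustrative variable names and the placeholder paths from the commands above, and only prints the commands for review.

```shell
# Shared settings, copied from the two commands above (paths are placeholders).
REFERENCE=/path/to/reference
INPUT=/path/to/input
OUTPUT=/path/to/output
OVERHANG=49
ADAPTER=CTGTAGGCACCATCAAT
EXPERIMENT=isrib_test_study

CURATE_CMD="xpresspipe curateReference -o $REFERENCE -f $REFERENCE/genome_fastas -g $REFERENCE/transcripts.gtf -p -t --sjdbOverhang $OVERHANG"
RIBOSEQ_CMD="xpresspipe riboseq -i $INPUT -o $OUTPUT -r $REFERENCE/ --gtf $REFERENCE/transcripts_CT.gtf -e $EXPERIMENT -a $ADAPTER --sjdbOverhang $OVERHANG"

# Print both commands for review before running them.
echo "$CURATE_CMD"
echo "$RIBOSEQ_CMD"
```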

Updates

Information on updates to the software can be found here.
