JASEN

Json producing Assembly driven microbial Sequence analysis pipeline to support Epitypification and Normalize classification decisions

Setup

git clone --recurse-submodules --single-branch --branch master https://github.com/genomic-medicine-sweden/JASEN.git
Edit JASEN/nextflow.config
Optionally run: bash JASEN/container/safety_exports.sh USER PREFIX

Singularity implementation

Image creation

Install Singularity (for example, through conda)
cd JASEN/container && bash build_container.sh

Image execution

singularity exec -B JASEN_INSTALL_DIR:/external -B WORKDIR:/out IMAGE nextflow -C /external/nextflow.config run /JASEN/main.nf -profile local,singularity

Conda implementation

Install Conda ( https://www.anaconda.com/distribution )
Install Nextflow ( curl -s https://get.nextflow.io | bash )
bash JASEN/setup.sh
nextflow run JASEN/main.nf -profile local,conda
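
Before launching the Conda variant it can help to confirm that the tools installed in the steps above are on the PATH. The following is a minimal sketch, not part of JASEN; the function name require_tools is hypothetical.

```shell
# Hypothetical helper (not shipped with JASEN): report every tool that is
# not found on PATH and return non-zero if at least one is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, calling require_tools conda nextflow before starting the pipeline fails early with a list of what still needs to be installed.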

Nextflow pipeline for typing and marker detection of bacteria

Purpose

The pipeline aims to produce data useful for epidemiological and surveillance purposes. In v1 the pipeline has only been tested with MRSA, but it should work well with any bacterium that has a good cgMLST scheme.

Installation

Clone the pipeline repository together with the nextflow-modules submodule.

git clone --recursive git@github.com:Clinical-Genomics-Lund/nextflow-modules.git

Install the database components required by the pipeline.

How to use

Input files are defined in a CSV file with the following format. All samples must be of the same "type", meaning that they can be analyzed with the same analysis profile, as defined in the Nextflow config.

id,read1,read2
p1,ALL504A259_122-78386_S1_R1_001.fastq.gz,ALL504A259_122-78386_S1_R2_001.fastq.gz
p2,ALL504A260_122-78386_S2_R1_001.fastq.gz,ALL504A260_122-78386_S2_R2_001.fastq.gz
p3,ALL504A261_122-78386_S3_R1_001.fastq.gz,ALL504A261_122-78386_S3_R2_001.fastq.gz
p4,ALL504A262_122-78386_S4_R1_001.fastq.gz,ALL504A262_122-78386_S4_R2_001.fastq.gz
p5,ALL504A263_122-78386_S5_R1_001.fastq.gz,ALL504A263_122-78386_S5_R2_001.fastq.gz
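
A malformed samplesheet is easier to catch before the pipeline starts. The sketch below checks only the structure shown above (the id,read1,read2 header, three non-empty fields per row, unique ids); it is not part of the pipeline, the function name check_samplesheet is hypothetical, and it does not verify that the FASTQ files exist.

```shell
# Hypothetical helper (not part of the pipeline): structural check of a
# samplesheet. Prints one message per problem and returns non-zero on error.
check_samplesheet() {
    awk -F',' '
        NR == 1 {
            if ($0 != "id,read1,read2") { print "bad header: " $0; bad = 1 }
            next
        }
        $0 == "" { next }                      # ignore blank lines
        NF != 3 || $1 == "" || $2 == "" || $3 == "" {
            print "line " NR ": expected 3 non-empty fields"; bad = 1; next
        }
        seen[$1]++ { print "line " NR ": duplicate id " $1; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```

Running check_samplesheet test.csv returns 0 for a sheet in the format above and prints the offending line numbers otherwise.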

Start a new analysis with the samples defined in test.csv using the staphylococcus_aureus profile.

nextflow run -entry bacterial_default -profile staphylococcus_aureus -config configs/nextflow.trannel.config --csv=test.csv

Components

QC

Species detection is performed using Kraken2 together with Bracken. The database used is a standard Kraken database built with:

kraken2-build --standard --db $DBNAME

Low levels of intra-species contamination or erroneous mapping are removed by mapping with bwa and filtering out heterozygous mapped bases.

Genome coverage is estimated by mapping with bwa mem and using a bed file containing the cgMLST loci.

The evenness of coverage is quantified as an interquartile range.
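
To make the interquartile range concrete, here is a minimal sketch that computes it from a stream of depth values, one per line. The text does not say which depth values or quartile definition the pipeline uses, so two assumptions are made here: the input is per-position depth, and quartiles follow the simple nearest-rank rule. The function name coverage_iqr is hypothetical.

```shell
# Hypothetical helper: IQR of depth values read from stdin, one per line.
# Assumption: nearest-rank quartiles (Q1 = value at rank ceil(0.25 * n),
# Q3 = value at rank ceil(0.75 * n)); the pipeline may define them differently.
# Possible input, assuming samtools is available and the file names exist:
#   samtools depth -a -b cgmlst_loci.bed sample.bam | cut -f3 | coverage_iqr
coverage_iqr() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) { print "NA"; exit }
            q1 = int(NR * 0.25); if (q1 < NR * 0.25) q1++
            q3 = int(NR * 0.75); if (q3 < NR * 0.75) q3++
            print v[q3] - v[q1]
        }'
}
```

A smaller value means the middle half of the positions have similar depth; a large value relative to the median depth suggests uneven coverage.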

Epidemiological typing

SPAdes is used for de novo assembly. QUAST is used to extract QC data from the assembly.

The cgMLST reference scheme used is branched off cgmlst.net. At the moment this fork is not synced back with new allele numbers. chewBBACA is used for extracting alleles. The number of missing loci is calculated and used as a QC parameter.
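
The missing-loci count can be illustrated with a short sketch over a tab-separated allele profile: a header row, then one row per sample with the sample name first and one cell per locus. How the pipeline itself counts missing loci is not described here, so the rule below is an assumption: a cell counts as called if it is a plain allele number or an inferred allele written as INF-<number>, and as missing otherwise (for example LNF or NIPH). The function name count_missing_loci is hypothetical.

```shell
# Hypothetical helper: for each sample row of a tab-separated allele profile,
# print "<sample> TAB <missing loci> TAB <total loci>".
# Assumption: only numeric cells and INF-<number> cells are valid calls.
count_missing_loci() {
    awk -F'\t' '
        NR == 1 { next }                       # skip the header row
        {
            missing = 0
            for (i = 2; i <= NF; i++) {
                cell = $i
                sub(/^INF-/, "", cell)         # inferred alleles count as called
                if (cell !~ /^[0-9]+$/) missing++
            }
            print $1 "\t" missing "\t" (NF - 1)
        }' "$1"
}
```

The per-sample count can then be compared against whatever threshold is used for QC.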

Traditional 7-locus MLST is calculated using mlst.

Virulence and resistance markers

ARIBA is used to detect genetic markers. The database for virulence markers is VFDB.

Report and visualisation

The QC data is aggregated in the web service CDM (repo coming), and the cgMLST is visualized using the web service cgviz, which is combined with GrapeTree for manipulating trees (repo coming).

