UPHL-BioNGS/Grandeur

Grandeur

Named after the beautiful Grandeur Peak

Image Credit: ryancornia

Location: 40.707, -111.76, 8,299 ft (2,530 m) summit

More information about the trail leading up to this landmark can be found at https://utah.com/hiking/grandeur-peak

Grandeur is a Nextflow workflow developed by @erinyoung at the Utah Public Health Laboratory. "Grandeur" is intended to be a species-agnostic sequencing analysis workflow for quality control and assurance (QC) and serotyping of paired-end Illumina sequencing data in a local public health laboratory.

"Grandeur" is meant to augment CDC's PHOENIX Nextflow workflow; this is the officially recommended usage. In principle, the contigs generated by PHOENIX undergo additional quality metric and serotyping steps in Grandeur, with a heavy emphasis on fastANI and AMRFinderPlus.

"Grandeur" can also be run as a standalone workflow that takes paired-end Illumina reads, removes adaptors with fastp and PhiX with bbduk, and creates contigs through de novo assembly of the reads with spades.

"Grandeur" is also a workflow of the staphb-toolkit.

Dependencies

Usage

The default workflow takes fastq files, runs them through QC, serotyping, etc., and creates contig files:

# using singularity
nextflow run UPHL-BioNGS/Grandeur -profile singularity --reads <path to reads>
# using docker
nextflow run UPHL-BioNGS/Grandeur -profile docker --fastas <path to fastas>
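
The two commands above differ only in the container profile. As a minimal sketch, a small wrapper could choose the profile based on which runtime is installed. The function name `pick_profile` is hypothetical and not part of Grandeur, and preferring singularity over docker is an arbitrary choice made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: pick_profile is a hypothetical helper, not part of Grandeur.
# It prints the name of a profile whose container runtime is found on PATH.

pick_profile() {
  if command -v singularity >/dev/null 2>&1; then
    echo singularity
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  else
    echo "error: neither singularity nor docker found on PATH" >&2
    return 1
  fi
}

# Example usage (the reads path is a placeholder):
#   nextflow run UPHL-BioNGS/Grandeur -profile "$(pick_profile)" --reads <path to reads>
```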

Commonly adjusted parameters

  • params.sample_sheet / --sample_sheet : specify sample sheet with sample id, forward reads in fastq.gz format, and reverse reads in fastq.gz format
  • params.outdir / --outdir : specify directory where results are saved (basic result patterns are grandeur/analysis/sample*)
  • params.reads / --reads : specify directory with paired-end files
  • params.fastas / --fastas : specify directory with fasta files
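
As a rough illustration of the sample sheet option, the sketch below builds a comma-separated sheet from a directory of paired-end reads. Two things here are assumptions rather than facts from this README: the header names `sample,fastq_1,fastq_2` and the `*_R1.fastq.gz` / `*_R2.fastq.gz` file naming. Please check the wiki for the exact format before relying on it. The function name `make_sample_sheet` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: make_sample_sheet is a hypothetical helper, not part of Grandeur.
# ASSUMPTIONS: header "sample,fastq_1,fastq_2" and *_R1/_R2.fastq.gz file names.

make_sample_sheet() {
  local reads_dir="$1"
  local r1 r2 sample
  echo "sample,fastq_1,fastq_2"
  for r1 in "$reads_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"      # expected reverse read
    if [ ! -e "$r2" ]; then
      echo "warning: no reverse read found for $r1" >&2
      continue
    fi
    sample="$(basename "$r1" _R1.fastq.gz)"  # sample id from file name
    echo "${sample},${r1},${r2}"
  done
}

# Example usage (paths are placeholders):
#   make_sample_sheet /path/to/reads > sample_sheet.csv
#   nextflow run UPHL-BioNGS/Grandeur -profile singularity \
#     --sample_sheet sample_sheet.csv --outdir results
```

Samples that lack a reverse read are skipped with a warning on stderr, so the sheet only contains complete pairs.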

Not-as-commonly adjusted parameters

  • params.kraken2_db / --kraken2_db : specify directory of kraken2 database
  • params.blast_db / --blast_db : specify directory of blast database (must accompany value for params.blast_db_type)
  • params.mash_db / --mash_db : specify reference file for mash
  • params.current_datasets / --current_datasets : set to false to avoid downloading genomes from NCBI
  • params.iqtree2_outgroup / --iqtree2_outgroup : set outgroup for iqtree2
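
Rather than passing each of these on the command line, the values can be collected in a Nextflow config file and supplied with `-c`. The sketch below writes such a file from the shell. Every path, the `'nt'` value for `params.blast_db_type`, and the function name `write_grandeur_config` are placeholders chosen for illustration and are not defaults taken from Grandeur.

```shell
#!/usr/bin/env bash
# Sketch only: write_grandeur_config is a hypothetical helper.
# All paths and the blast_db_type value below are placeholders.

write_grandeur_config() {
  local out="$1"       # config file to create
  local db_root="$2"   # directory that holds the local databases
  cat > "$out" <<EOF
// Hypothetical values; adjust to the local database locations.
params.kraken2_db       = '${db_root}/kraken2'
params.blast_db         = '${db_root}/blast'
params.blast_db_type    = 'nt'
params.mash_db          = '${db_root}/mash/reference.msh'
params.current_datasets = false
EOF
}

# Example usage (paths are placeholders):
#   write_grandeur_config grandeur.config /path/to/databases
#   nextflow run UPHL-BioNGS/Grandeur -profile docker -c grandeur.config --reads <path to reads>
```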

Wiki sections

The README got too long, so it's been moved to a wiki. There are several covered topics including:

Problems

Please submit any problems to the repository's issues page (or find us on Slack).

Acknowledgements

Grandeur wouldn't be possible without the following tools:

  • amrfinderplus - identification of genes associated with antimicrobial resistance
  • bbduk - removal of PhiX
  • blastn - read identification with blobtools
  • blobtools - contamination
  • circulocov - coverage determination
  • datasets - downloads genomes from NCBI
  • drprg - TB AMR predictions
  • elgato - Legionella pneumophila Sequence Based Typing (SBT)
  • emmtyper - Group A Strep "emm" typing
  • fastani - species evaluator
  • fastp - cleaning reads
  • fastqc - fastq file QC
  • heatcluster - visualizes SNP matrix from SNP dists
  • iqtree2 - phylogenetic tree creation - used after core genome alignment
  • kleborate - Klebsiella serotyping
  • kraken2 - contamination
  • mash - species identifier
  • mashtree - tree based on mash distances (not impacted by size of core genome)
  • mlst - identification of MLST subtype
  • multiqc - summarizes QC efforts
  • mykrobe - Mycobacterium subtyping
  • panaroo - core genome alignment - optional (set with params.msa = true)
  • pbptyper - Penicillin Binding Protein (PBP) typer for Streptococcus pneumoniae assemblies
  • phytreeviz - basic tree visualization
  • plasmidfinder - MLST typing for plasmids
  • prokka - gene annotation - used for core genome alignment
    • will be replaced with bakta in a future release
  • quast - contig QC
  • seqsero2 - Salmonella serotyping
  • serotypefinder - E. coli serotyping
  • shigatyper - Shigella serotyping
  • snp-dists - SNP matrix - used after core genome alignment
  • spades - de novo assembly

The expected tools are split into multiple processes. Each process has its own wiki page that we encourage users to view.