BatchBlaster

BatchBlaster is a bioinformatics pipeline that uses BLAST (Basic Local Alignment Search Tool), a widely used algorithm for comparing primary biological sequences, to run high-throughput taxonomic identification searches.

BatchBlaster is built using the Nextflow workflow management system, ensuring portability and reproducibility across multiple platforms. The pipeline is primarily designed for use on High Performance Computing (HPC) clusters, including the capability to submit tasks to the SLURM job scheduling system.

The name 'BatchBlaster' reflects its ability to submit and process BLAST tasks in batches, which speeds up large-scale sequence analysis.

Features

  • High throughput BLAST search
  • Scalable and reproducible analysis with Nextflow
  • Multi-platform compatibility (Linux, macOS, Windows)

Quick Start

  1. Install Nextflow

    curl -s https://get.nextflow.io | bash

  2. Run BatchBlaster

    nextflow run vmikk/BatchBlaster -r main --input 'path/to/your/input' ...
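For cluster use, one possible approach is to pass a small custom Nextflow configuration with the `-c` option. The sketch below is an assumption, not the pipeline's documented setup: BatchBlaster may already ship its own SLURM settings, and the file name `slurm.config` and the partition name `main` are hypothetical placeholders.

```shell
# Write a minimal Nextflow config that sends each process to SLURM.
# File name and queue (partition) name are hypothetical; adjust to your cluster.
CONFIG_FILE="${CONFIG_FILE:-slurm.config}"

cat > "$CONFIG_FILE" <<'EOF'
// Assumed settings, not taken from the BatchBlaster repository
process {
    executor = 'slurm'
    queue    = 'main'
}
EOF

# Then add the config to the run command from step 2, for example:
#   nextflow run vmikk/BatchBlaster -r main -c slurm.config --input 'path/to/your/input' ...
```

Nextflow merges a file passed with `-c` into the pipeline's own configuration, so only the settings you want to override need to be listed.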

Parameters

  • --input : Path to the input file containing the sequences (Required)
  • --outdir : Path to the output directory (Default: ./results)
  • --blast_taxdb : Path to the BLAST database
  • ...
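As an illustration, the three parameters listed above could be combined as in the sketch below. The paths are hypothetical placeholders, and the small `run` helper is not part of BatchBlaster; it only prints the command by default so that it can be checked before anything is launched.

```shell
# Hypothetical values; replace with your own paths.
INPUT="path/to/your/input"
OUTDIR="./results"              # documented default
BLAST_TAXDB="path/to/blast/db"  # hypothetical location

# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

run nextflow run vmikk/BatchBlaster -r main \
    --input "$INPUT" \
    --outdir "$OUTDIR" \
    --blast_taxdb "$BLAST_TAXDB"
```

Setting `DRY_RUN=0` in the environment makes the helper execute the command instead of printing it.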

Output

The results will be saved in the specified output directory (./results, by default). Output includes:

  • BLAST search results in tabular format (m8 a.k.a. -outfmt 6)
  • A table with best BLAST hits reshaped into wide format
  • Summary report
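The tabular results can be post-processed with standard command-line tools. The sketch below assumes the default 12-column `-outfmt 6` layout (query ID in column 1, bit score in column 12); if the pipeline writes a custom set of columns, the column numbers need adjusting. The function name `best_hits` is hypothetical and is not one of the pipeline's own scripts.

```shell
# best_hits FILE: print the row with the highest bit score for each query
# in a 12-column BLAST tabular file (column 1 = qseqid, column 12 = bitscore).
# On ties, the first row encountered is kept. Output is sorted by line.
best_hits() {
    awk -F'\t' '
        !($1 in score) || ($12 + 0) > score[$1] {
            score[$1] = $12 + 0
            row[$1] = $0
        }
        END { for (q in row) print row[q] }
    ' "$1" | sort
}

# Usage (hypothetical file name):
#   best_hits path/to/blast_results.txt
```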

Dependencies

Future Plans

  • Integration of additional sequence analysis methods (e.g., MMSeqs2, SINTAX, etc.)
  • Inclusion of Lowest Common Ancestor (LCA) estimation
  • Implementation of domain-specific threshold filtering for taxonomic annotation (e.g., for fungal sequences)
  • Adding advanced machine learning algorithms for more accurate taxonomic classification (e.g., deep learning models that have been trained on the UNITE database)
  • Implementation of a hybrid annotation approach (e.g., integration of classification results from various methods to enhance accuracy and reliability of taxonomic identification)

We are excited to share these enhancements in our forthcoming updates, so stay tuned!

License

This project is licensed under the terms of the Apache-2.0 license.


Please feel free to submit issues and pull requests; your contributions are welcome!
