Tiara

Deep-learning-based approach, powered by PyTorch, for identifying eukaryotic sequences in metagenomic data.

The sequences are classified in two stages:

In the first stage, the sequences are classified into the following classes: archaea, bacteria, prokarya, eukarya, organelle and unknown.
In the second stage, the sequences labeled as organelle in the first stage are further classified as mitochondria, plastid or unknown.
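The two-stage flow above can be sketched as follows. This is only an illustration of the routing logic, not tiara's implementation: `classify_two_stage` and `argmax` are hypothetical helpers, and the probability dictionaries stand in for the outputs of the two models.

```python
from typing import Dict, Optional, Tuple


def argmax(probs: Dict[str, float]) -> str:
    """Return the class with the highest probability."""
    return max(probs, key=probs.get)


def classify_two_stage(
    first: Dict[str, float], second: Optional[Dict[str, float]] = None
) -> Tuple[str, Optional[str]]:
    """Return (first-stage label, second-stage label or None).

    Hypothetical helper: `first` and `second` stand in for the class
    probabilities produced by the first- and second-stage models.
    """
    label1 = argmax(first)
    if label1 != "organelle" or second is None:
        # Only sequences labeled "organelle" in stage one go to stage two.
        return label1, None
    return label1, argmax(second)


print(classify_two_stage({"eukarya": 0.9, "organelle": 0.1}))
print(classify_two_stage({"organelle": 0.8, "bacteria": 0.2},
                         {"mitochondria": 0.3, "plastid": 0.7}))
```

Sequences that are not labeled as organelle never reach the second stage, so their second-stage result is left empty here.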

For more information, please refer to our paper: Tiara: Deep learning-based classification system for eukaryotic sequences.

Supplementary data

Supplementary sequences are available in the data/Supplementary_sequences directory of the repository.

Requirements

Python >= 3.7, <=3.9
numpy, biopython, torch, skorch, tqdm, joblib, numba
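Before installing, you may want to confirm that your interpreter falls inside the supported range. A minimal sketch; `python_supported` is a hypothetical helper, not part of tiara:

```python
import sys
from typing import Tuple


def python_supported(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """True if (major, minor) lies within the 3.7 to 3.9 range listed above."""
    return (3, 7) <= tuple(version[:2]) <= (3, 9)


if not python_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} "
          "is outside the supported 3.7-3.9 range")
```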

Installation

More detailed installation instructions can be found here.

Using `pip`

Run pip install tiara, preferably in a fresh environment.

Using `conda`

Run conda install -c conda-forge tiara, preferably in a fresh environment.

We recommend using mamba instead of conda, as it is faster.

Unfortunately, the conda package currently works only with Python 3.7 and 3.8.

Using setup.py

Latest stable release

Download the latest release from https://github.com/ibe-uw/tiara/releases.
Unzip/untar the archive.
Go to the extracted directory.
Run python setup.py install.

Latest developer version

git clone https://github.com/ibe-uw/tiara.git
cd tiara
python setup.py install

Testing the installation

After the installation, run tiara-test to see if the installation was successful.

Usage

Basic usage:

tiara -i sample_input.fasta -o out.txt

By default, the sequences in the FASTA file should be at least 3000 bases long. We do not recommend classifying sequences shorter than 1000 base pairs.
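To see in advance how your input compares with these length limits, you can scan the FASTA file yourself. The sketch below uses a small hand-written FASTA reader and a hypothetical `length_report` helper; neither is part of tiara, and the status labels are illustrative only.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs from FASTA-formatted text lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Use the first whitespace-separated token of the header as the id.
            seq_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def length_report(records: Iterable[Tuple[str, str]],
                  min_len: int = 3000, floor: int = 1000) -> Dict[str, str]:
    """Label each sequence relative to the default and recommended minimums."""
    report = {}
    for seq_id, seq in records:
        if len(seq) >= min_len:
            report[seq_id] = "ok"
        elif len(seq) >= floor:
            report[seq_id] = "below default minimum"
        else:
            report[seq_id] = f"below {floor} bp, not recommended"
    return report


fasta = ">long\n" + "A" * 3000 + "\n>mid\n" + "C" * 1500 + "\n>short\n" + "G" * 500 + "\n"
print(length_report(read_fasta(fasta.splitlines())))
```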

It creates two files:

out.txt, a tab-separated file with a header and three columns: sequence id, first-stage classification result and second-stage classification result.
log_out.txt, containing model parameters and classification summary.
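For downstream analysis, out.txt can be read with Python's `csv` module. A minimal sketch, assuming only the column order described above; the header names and the `n/a` placeholder in the example are hypothetical, and `read_tiara_output` is not part of tiara.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO


def read_tiara_output(handle: TextIO) -> List[Dict[str, str]]:
    """Parse the tab-separated result table, relying on column order only."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    rows = []
    for fields in reader:
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        rows.append({"id": fields[0], "first_stage": fields[1], "second_stage": fields[2]})
    return rows


# Hypothetical example content; real header names and placeholder values may differ.
example = io.StringIO(
    "sequence_id\tfirst_stage\tsecond_stage\n"
    "contig_1\teukarya\tn/a\n"
    "contig_2\torganelle\tplastid\n"
    "contig_3\tbacteria\tn/a\n"
)
rows = read_tiara_output(example)
print(Counter(row["first_stage"] for row in rows))
```

Because only the first three columns are read, extra columns (for example, probabilities) are simply ignored.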

Advanced:

tiara -i sample_input.fasta -o out.txt --tf mit pla pro -t 4 -p 0.65 0.60 --probabilities

In addition to the files above, it creates three files in the folder where tiara is run, containing the sequences from sample_input.fasta classified as mitochondria, plastid and prokarya, respectively (--tf mit pla pro option).
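Conceptually, this amounts to grouping the input sequences by their assigned class and writing one FASTA file per requested class. The sketch below shows that idea only; `split_by_class` is hypothetical, it returns FASTA text instead of writing files, and it assumes a single id-to-class mapping rather than tiara's separate first- and second-stage results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def split_by_class(records: Iterable[Tuple[str, str]],
                   labels: Dict[str, str],
                   wanted: Set[str]) -> Dict[str, str]:
    """Group sequences whose label is in `wanted` into one FASTA string per class."""
    groups = defaultdict(list)
    for seq_id, seq in records:
        label = labels.get(seq_id)
        if label in wanted:
            groups[label].append(f">{seq_id}\n{seq}\n")
    return {label: "".join(entries) for label, entries in groups.items()}


records = [("c1", "ACGT"), ("c2", "GGCC"), ("c3", "TTAA"), ("c4", "CCAT")]
labels = {"c1": "mitochondria", "c2": "plastid", "c3": "eukarya", "c4": "prokarya"}
files = split_by_class(records, labels, {"mitochondria", "plastid", "prokarya"})
for label, text in sorted(files.items()):
    print(label, repr(text))
```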

The number of threads is set to 4 (-t 4) and probability cutoffs in the first and second stage of classification are set to 0.65 and 0.6, respectively.
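One way to picture a probability cutoff: if the most probable class does not reach the cutoff, the sequence is reported as unknown. This is an assumption about the behavior, sketched with a hypothetical `apply_cutoff` helper; in particular, whether the comparison is `>=` or `>` is not stated above.

```python
from typing import Dict


def apply_cutoff(probs: Dict[str, float], cutoff: float) -> str:
    """Return the most probable class, or "unknown" if it falls below the cutoff."""
    best = max(probs, key=probs.get)
    # Assumption: a class probability equal to the cutoff still counts as confident.
    return best if probs[best] >= cutoff else "unknown"


print(apply_cutoff({"eukarya": 0.70, "bacteria": 0.30}, 0.65))        # first stage
print(apply_cutoff({"mitochondria": 0.55, "plastid": 0.45}, 0.60))    # second stage
```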

Thanks to the --probabilities option, the probabilities of belonging to each class are also written to out.txt.

For more usage examples, go here.

Citation

Michał Karlicki, Stanisław Antonowicz, Anna Karnkowska, Tiara: deep learning-based classification system for eukaryotic sequences, Bioinformatics, Volume 38, Issue 2, 15 January 2022, Pages 344–350, https://doi.org/10.1093/bioinformatics/btab672

License

Tiara is released under the open-source MIT license.

Version history:

1.0.3 – added pyproject.toml, updated dependencies to python<3.10 – unfortunately tiara doesn't work right now with python newer than 3.9 due to torch 1.7.0 compatibility issues. Added option to use gzipped fasta file as input (automatically identified by .gz suffix).
1.0.2 – added Python 3.9 compatibility, added an option to gzip the results. Added this README section.
1.0.0, 1.0.1 – initial releases.
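The gzip handling added in 1.0.3 (input identified by the .gz suffix) follows a common pattern, sketched below with the standard library. `open_fasta` is a hypothetical helper, not tiara's code.

```python
import gzip
import os
import tempfile
from typing import TextIO


def open_fasta(path: str) -> TextIO:
    """Open a FASTA file as text, decompressing it if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


# Usage: write the same record plain and gzipped, then read both back.
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "sample.fasta")
    packed = plain + ".gz"
    with open(plain, "w") as fh:
        fh.write(">s1\nACGT\n")
    with gzip.open(packed, "wt") as fh:
        fh.write(">s1\nACGT\n")
    for path in (plain, packed):
        with open_fasta(path) as fh:
            print(os.path.basename(path), fh.read().splitlines())
```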
