
Pipeline for getting taxonomy for clusters #815

Open
alopgar opened this issue Mar 1, 2024 · 0 comments

alopgar commented Mar 1, 2024

Hi, I have been using MMseqs2 to cluster the sequences from multiple FASTA files and then assign a taxonomy to each sequence. I followed this pipeline:

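# Cluster all input FASTA files; writes clusterRes_cluster.tsv (representative<TAB>member)
mmseqs easy-cluster "${rawfas[@]}" clusterRes tmp --min-seq-id 0.3 -c 0.5 --cov-mode 1 --cluster-mode 2 -e 0.001 -s 6
# Build a sequence DB from the same files for the taxonomy search
mmseqs createdb "${rawfas[@]}" queryDB
# Assign a taxonomy to every sequence ($TXDB is a prepared MMseqs2 taxonomy DB)
mmseqs taxonomy queryDB "$TXDB" clusterTax tmp --lca-mode 4 --split-memory-limit 60G \
     --lca-ranks superkingdom,phylum,class,order,family,genus
# Export the per-sequence taxonomy as a TSV file
mmseqs createtsv queryDB clusterTax ../clusterTax.tsv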

These commands produce a clusterRes_cluster.tsv file listing each representative sequence together with its cluster members, and a clusterTax database (exported as clusterTax.tsv) with the taxonomy assigned to each individual sequence.
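The two outputs can be linked through the sequence identifiers they share. Below is a minimal Python sketch of that join, assuming clusterRes_cluster.tsv has exactly two tab-separated columns (representative, member) and that each row of clusterTax.tsv starts with the sequence identifier, taxonomic ID, rank and taxon name. The function name `members_with_taxonomy` is hypothetical, and the column layout should be checked against your own files.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def members_with_taxonomy(
    cluster_lines: Iterable[str], tax_lines: Iterable[str]
) -> Dict[str, List[Tuple[str, Optional[str]]]]:
    """Group (member, taxon name) pairs under each cluster representative.

    Assumed layouts (hypothetical, verify against your files):
      cluster_lines: representative<TAB>member                 (clusterRes_cluster.tsv)
      tax_lines:     seq_id<TAB>taxid<TAB>rank<TAB>name[...]    (clusterTax.tsv)
    Members with no row in tax_lines get the name None.
    """
    names: Dict[str, str] = {}
    for row in csv.reader(tax_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) > 3:
            names[row[0]] = row[3]  # column 4: taxon name

    clusters: Dict[str, List[Tuple[str, Optional[str]]]] = defaultdict(list)
    for row in csv.reader(cluster_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2:  # skip blank or malformed lines
            clusters[row[0]].append((row[1], names.get(row[1])))
    return dict(clusters)
```

Open file handles can be passed directly, e.g. `members_with_taxonomy(open("clusterRes_cluster.tsv"), open("../clusterTax.tsv"))`, which gives a quick view of how consistent the taxonomy is within each cluster.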

My question is: does MMseqs2 provide a way to obtain a consensus taxonomy for each cluster, for example an LCA computed over all the sequences that belong to that cluster? If not, is there other software that can do this?
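Independently of any MMseqs2 feature, a strict per-cluster LCA can also be computed afterwards from the two TSV files: treat each member's lineage as an ordered list of names (superkingdom down to genus) and keep the longest prefix shared by all classified members. This is a minimal Python sketch under several assumptions: the `--lca-ranks` names are taken to be in the fifth column of clusterTax.tsv, separated by `;`; the placeholder `unknown` for an unassigned rank is a guess; and the helper names (`truncate`, `lineage_lca`, `cluster_lca`) are hypothetical. Members with no lineage are skipped rather than pulling the cluster to the root.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Values treated as "no assignment at this rank".
# Assumption: adjust to whatever your clusterTax.tsv actually contains.
MISSING = {"", "unknown"}


def truncate(lineage: List[str]) -> List[str]:
    """Cut a lineage at its first unassigned rank."""
    out: List[str] = []
    for name in lineage:
        if name in MISSING:
            break
        out.append(name)
    return out


def lineage_lca(lineages: Iterable[List[str]]) -> List[str]:
    """Longest rank-by-rank prefix shared by all non-empty lineages."""
    common: Optional[List[str]] = None
    for lin in map(truncate, lineages):
        if not lin:  # unclassified member: ignore it
            continue
        if common is None:
            common = lin
            continue
        k = 0
        while k < min(len(common), len(lin)) and common[k] == lin[k]:
            k += 1
        common = common[:k]
    return common or []


def cluster_lca(
    cluster_lines: Iterable[str], tax_lines: Iterable[str], col: int = 4
) -> Dict[str, List[str]]:
    """Map each cluster representative to the strict LCA of its members.

    cluster_lines: representative<TAB>member rows (clusterRes_cluster.tsv)
    tax_lines: rows whose column `col` (0-based) holds ';'-separated names
               for the ranks given to --lca-ranks (assumed layout).
    """
    lineage: Dict[str, List[str]] = {}
    for row in csv.reader(tax_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) > col:
            lineage[row[0]] = row[col].split(";")

    members: Dict[str, List[List[str]]] = defaultdict(list)
    for row in csv.reader(cluster_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2:
            members[row[0]].append(lineage.get(row[1], []))
    return {rep: lineage_lca(lins) for rep, lins in members.items()}
```

Calling `cluster_lca(open("clusterRes_cluster.tsv"), open("../clusterTax.tsv"))` returns one lineage per representative; an empty list means no member was classified. A strict LCA lets a single divergent member push the whole cluster up to a high rank, so a majority-vote variant (keeping a rank only when enough members agree) may be preferable for noisy clusters; that variant is not shown here.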

Thanks in advance
