Does input FASTA file have to be aligned? #841

laurien-0 · 2024-05-02T13:26:48Z

I ran only the following commands:

mmseqs createdb x_protseqs.fasta x_db
mmseqs cluster x_db x_clust tmp --min-seq-id 0.9
mmseqs createtsv x_db x_db x_clust x_clust.tsv

My input x_protseqs.fasta is not aligned, and I got some slightly odd results from it.
Namely, when I aligned all the cluster representatives with an online MSA tool and plotted the percentage identity matrix (PIM), some pairs came out at 99%.

Is this just a quirk of the different alignment algorithms, or should I be pre-aligning my data?

Thank you

milot-mirdita · 2024-05-02T14:27:59Z

The clustering does NOT take aligned input. Gaps would be turned into X characters and result in very odd alignments.
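
If a FASTA file has already been aligned, the gap characters can be stripped before running `mmseqs createdb`. The following is only a minimal sketch, not part of MMseqs2: the helper name `strip_gaps` and the example file names are made up for illustration, and it assumes gaps are written as `-` or `.` and that header lines start with `>`.

```shell
# strip_gaps: hypothetical helper that removes gap characters ('-' and '.')
# from sequence lines only; header lines (starting with '>') are left untouched.
strip_gaps() {
  sed '/^>/!s/[-.]//g' "$1"
}

# Tiny made-up aligned input, just to demonstrate the helper.
printf '>seq1\nMK-T--AY\n>seq2\nM.KTAY-\n' > aligned_example.fasta

# Write the ungapped sequences to a new file that could then go to createdb.
strip_gaps aligned_example.fasta > unaligned_example.fasta
```

The resulting `unaligned_example.fasta` would then take the place of the input file in the `createdb` call.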

I am not sure I understand your issue with the weird alignments.

laurien-0 · 2024-05-02T14:30:11Z

Thank you, that is useful to know.
To explain: I clustered at 70% and at 90%. In both cases I downloaded the representative sequence of each cluster and ran these through an MSA tool. You would expect the pairwise identities to be at most roughly 70% and 90% respectively, right? (The PIM is the percentage identity matrix.) Instead I got values of up to 99% in both.
