Trivial case of clustering one sequence is too slow #835

korneel-emweb · 2024-04-23T09:42:40Z

Expected Behavior

I am repeatedly running mmseqs to cluster databases with an unknown number of sequences. If there is only one sequence, there is of course exactly one cluster. I expected this case to be fast, but it actually takes about 7 seconds to compute. From what I can tell, most of the time is spent in prefiltering.

The setup is that I have a large database and want to cluster the sequences per species (subsets of the database). I tried a workaround: prefiltering and aligning against the entire database, then running the clustering on a subdatabase, but mmseqs doesn't like it. It complains that the database is not the same size as the alignment database. I also tried createsubdb on the alignment database, but no luck ...

Is there a workflow that can help me here?
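
As a stopgap, the trivial case could be short-circuited before mmseqs is ever started. The following is only a minimal sketch under several assumptions: each species subset is available as a FASTA file, `mmseqs easy-cluster` is the workflow in use, and the helper names (`read_ids`, `cluster_fasta`) are hypothetical. The hand-written TSV takes the first word of the header as the identifier, which may differ from how mmseqs itself parses some header styles.

```python
import subprocess
from pathlib import Path


def read_ids(fasta_path, limit=2):
    """Return up to `limit` sequence identifiers (first word of each header)."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.append(fields[0] if fields else "")
                if len(ids) >= limit:
                    break  # we only need to know whether there is more than one
    return ids


def cluster_fasta(fasta_path, out_prefix, tmp_dir):
    """Cluster one FASTA file, skipping mmseqs when it holds a single sequence.

    Returns the path of the <out_prefix>_cluster.tsv file
    (representative<TAB>member, one pair per line).
    """
    ids = read_ids(fasta_path)
    if not ids:
        raise ValueError(f"no sequences found in {fasta_path}")

    tsv_path = Path(f"{out_prefix}_cluster.tsv")
    if len(ids) == 1:
        # Trivial case: the single sequence is its own cluster representative.
        # Assumption: only the TSV is needed downstream; the FASTA outputs
        # that easy-cluster would also write are not produced here.
        tsv_path.write_text(f"{ids[0]}\t{ids[0]}\n")
        return str(tsv_path)

    # General case: hand over to mmseqs (must be on PATH).
    subprocess.run(
        ["mmseqs", "easy-cluster", str(fasta_path), str(out_prefix), str(tmp_dir)],
        check=True,
    )
    return str(tsv_path)
```

This only avoids the startup cost for single-sequence subsets; it does not address reusing a prefilter or alignment result computed on the full database.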

Current Behavior

Steps to Reproduce (for bugs)

MMseqs Output (for bugs)

Context

Your Environment
