Expected Behavior
Searching proteins against a database with similar and exact proteins (from bacterial refseq proteome) should return hits with similar and exact matches.
Current Behavior
Running mmseqs search returns few to no hits. However, easy-search outputs far more hits (the expected amount).
Steps to Reproduce (for bugs)
For mmseqs search:
create query and target databases with query_fasta and target_fasta
mmseqs search at 0.95 min-seq-id and coverage with coverage mode 0
mmseqs convertalis
For mmseqs easy-search:
Ran easy-search directly with query and target fastas, same search parameters
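The two procedures above can be sketched as the commands below. This is a minimal sketch, not the exact invocation from the report: the database names (queryDB, targetDB, resultDB), the tmp directory, and the output file names are assumptions made for illustration. The FASTA file names are the ones given for the attachment in this issue.

```shell
#!/bin/sh
# Sketch of the reproduction; DB, tmp, and output names are hypothetical.
QUERY_FASTA=query_subset.faa
TARGET_FASTA=406_subset.faa
# 0.95 minimum sequence identity and 0.95 coverage, coverage mode 0
SEARCH_OPTS="--min-seq-id 0.95 -c 0.95 --cov-mode 0"

if command -v mmseqs >/dev/null 2>&1 && [ -f "$QUERY_FASTA" ] && [ -f "$TARGET_FASTA" ]; then
    # Workflow 1: explicit databases, then search, then convertalis
    mmseqs createdb "$QUERY_FASTA" queryDB
    mmseqs createdb "$TARGET_FASTA" targetDB
    mmseqs search queryDB targetDB resultDB tmp $SEARCH_OPTS
    mmseqs convertalis queryDB targetDB resultDB search_result.m8

    # Workflow 2: easy-search directly on the FASTA files, same parameters
    mmseqs easy-search "$QUERY_FASTA" "$TARGET_FASTA" easy_result.m8 tmp $SEARCH_OPTS

    # Compare how many alignment lines each workflow reported
    wc -l search_result.m8 easy_result.m8
else
    echo "mmseqs or the input FASTA files were not found; nothing was run" >&2
fi
```

If the behavior described here reproduces, search_result.m8 should contain far fewer lines than easy_result.m8.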
Context
I am searching a fasta of known bacterial proteins against the bacterial RefSeq WP proteome. I noticed that only half of my original virulence proteins (out of ~8000) had hits against RefSeq. The RefSeq proteome is large, so I made a minimal example in which there is an exact match between the query and target databases (as well as similar matches, according to easy-search) that mmseqs search doesn't seem to find, but easy-search does.
I can provide the larger fastas if more examples to replicate are necessary.
There are 2 fastas in the attached .zip file, each containing 4 proteins. One of those is an exact match (same WP_number), and 2 proteins (WP_000633131.1 and WP_000633136.1) are very similar to the protein with the exact match.
fastas_to_search.zip
query_fasta = query_subset.faa
target_fasta = 406_subset.faa
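To confirm which accession is shared between the two files before searching, a small helper along these lines could be used. It is only a sketch: the function names fasta_ids and shared_ids are made up for this example, and it compares header IDs (the first token after ">"), not the sequences themselves.

```shell
#!/bin/sh
# Hypothetical helpers: list accession IDs present in both FASTA files.

fasta_ids() {
    # Print the first whitespace-delimited token of each header, without '>'
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

shared_ids() {
    ids_a=$(mktemp)
    ids_b=$(mktemp)
    fasta_ids "$1" > "$ids_a"
    fasta_ids "$2" > "$ids_b"
    # comm -12 keeps only the lines common to both sorted inputs
    comm -12 "$ids_a" "$ids_b"
    rm -f "$ids_a" "$ids_b"
}

# Run on the attached files only if they are present
if [ -f query_subset.faa ] && [ -f 406_subset.faa ]; then
    shared_ids query_subset.faa 406_subset.faa
fi
```

Any ID printed here is a protein that a search at 0.95 identity and coverage would be expected to report as a hit against itself.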
Your Environment
Include as many relevant details about the environment you experienced the bug in.
Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): 15.6f452
Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): conda
Adding -a or --alignment-mode 3 fixes the issue. easy-search is better at detecting when the exact sequence identity is required; search estimates the sequence identity by default and tries to detect when the exact value is needed.
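As a sketch of the suggested fix, the search step can be rerun with --alignment-mode 3 (or -a) added. The database names, tmp directory, and output file name below are assumptions for illustration; the thresholds and FASTA file names are the ones from the report.

```shell
#!/bin/sh
# Sketch of the fixed search; DB, tmp, and output names are hypothetical.
QUERY_FASTA=query_subset.faa
TARGET_FASTA=406_subset.faa
SEARCH_OPTS="--min-seq-id 0.95 -c 0.95 --cov-mode 0"
# Suggested fix; -a could be used here instead
FIX_OPTS="--alignment-mode 3"

if command -v mmseqs >/dev/null 2>&1 && [ -f "$QUERY_FASTA" ] && [ -f "$TARGET_FASTA" ]; then
    mmseqs createdb "$QUERY_FASTA" queryDB
    mmseqs createdb "$TARGET_FASTA" targetDB
    # Same search as in the report, plus the alignment-mode option
    mmseqs search queryDB targetDB resultDB_fixed tmp $SEARCH_OPTS $FIX_OPTS
    mmseqs convertalis queryDB targetDB resultDB_fixed search_fixed.m8
    # Number of reported alignments with the fix applied
    wc -l search_fixed.m8
else
    echo "mmseqs or the input FASTA files were not found; nothing was run" >&2
fi
```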
MMseqs Output (for bugs)
MMseqs search output: https://gist.github.com/mcn3159/9a5ed05852e2e83b8656d25f0333a8f3