.dbtype already exists error when clustering using profiles #844

Open
schmittel opened this issue May 6, 2024 · 1 comment

schmittel commented May 6, 2024

Hi,

I'm having difficulty clustering using profiles when following the instructions in the wiki. Specifically I'm referring to this section:

# extract consensus sequences from profiles
mmseqs profile2consensus profileDB1 profileDB1_consensus
# search with profiles against consensus sequences of seqDB1
mmseqs search profileDB1 profileDB1_consensus resultDB2 tmp --add-self-matches -a # Add your cluster criteria here
# cluster the results 
mmseqs clust profileDB1 resultDB2 profileDB1_clu

I can run mmseqs search without issue but when I run mmseqs clust I get the following error:

Create directory /final/db_cluster/low_1/Genus02938/Genus02938_DB
cluster /final/db_profile/low_1/Genus02938/Genus02938_DB /final/db_profile_vs_consensus/low_1/Genus02938/Genus02938_DB /final/db_cluster/low_1/Genus02938/Genus02938_DB

MMseqs Version:                         15.6f452
Substitution matrix                     aa:blosum62.out,nucl:nucleotide.out
Seed substitution matrix                aa:VTML80.out,nucl:nucleotide.out
Sensitivity                             4
k-mer length                            0
Target search mode                      0
k-score                                 seq:2147483647,prof:2147483647
Alphabet size                           aa:21,nucl:5
Max sequence length                     65535
Max results per query                   20
Split database                          0
Split mode                              2
Split memory limit                      0
Coverage threshold                      0.8
Coverage mode                           0
Compositional bias                      1
Compositional bias                      1
Diagonal scoring                        true
Exact k-mer matching                    0
Mask residues                           1
Mask residues probability               0.9
Mask lower case residues                0
Minimum diagonal score                  15
Selected taxa
Include identical seq. id.              false
Spaced k-mers                           1
Preload mode                            0
Pseudo count a                          substitution:1.100,context:1.400
Pseudo count b                          substitution:4.100,context:5.800
Spaced k-mer pattern
Local temporary path
Threads                                 144
Compressed                              0
Verbosity                               3
Add backtrace                           false
Alignment mode                          3
Alignment mode                          0
Allow wrapped scoring                   false
E-value threshold                       0.001
Seq. id. threshold                      0
Min alignment length                    0
Seq. id. mode                           0
Alternative alignments                  0
Max reject                              2147483647
Max accept                              2147483647
Score bias                              0
Realign hits                            false
Realign score bias                      -0.2
Realign max seqs                        2147483647
Correlation score weight                0
Gap open cost                           aa:11,nucl:5
Gap extension cost                      aa:1,nucl:2
Zdrop                                   40
Rescore mode                            0
Remove hits by seq. id. and coverage    false
Sort results                            0
Cluster mode                            0
Max connected component depth           1000
Similarity type                         2
Weight file name
Cluster Weight threshold                0.9
Single step clustering                  false
Cascaded clustering steps               3
Cluster reassign                        false
Remove temporary files                  false
Force restart with latest tmp           false
MPI runner
k-mers per sequence                     21
Scale k-mers per sequence               aa:0.000,nucl:0.200
Adjust k-mer length                     false
Shift hash                              67
Include only extendable                 false
Skip repeating k-mers                   false

Set cluster sensitivity to -s 6.000000
Set cluster mode SET COVER
Set cluster iterations to 3
/final/db_profile_vs_consensus/low_1/Genus02938/Genus02938_DB.dbtype exists already!

Yes, /final/db_profile_vs_consensus/low_1/Genus02938/Genus02938_DB.dbtype already exists; it was created by mmseqs search. I'm not sure why mmseqs clust cares. Do you have any ideas? I can't figure this out. Many thanks!!

schmittel changed the title from "'could not copy file' error when clustering using profiles" to ".dbtype already exists error when clustering using profiles" on May 6, 2024
schmittel (Author) commented

I just learned that mmseqs cluster and mmseqs clust are different commands, which solved the issue. Apologies for the confusion.
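
For anyone who lands here with the same message: the log above begins with `cluster ...`, so the `cluster` workflow was the command that ran. That workflow takes `<sequenceDB> <clusterDB> <tmpDir>`, so the search result in the second position was treated as the output database, and the run stopped because its .dbtype file was already there. The `clust` module instead takes `<sequenceDB> <resultDB> <clusterDB>`. Below is a minimal sketch of a guard around `clust`. The function name `run_clust` and the `MMSEQS` variable are made up for illustration and are not part of MMseqs2. It assumes that a database is considered present when its `.dbtype` file exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of MMseqs2): check the arguments before
# calling the `clust` module, so a mix-up fails early with a clear message.
# MMSEQS may be overridden; it defaults to the `mmseqs` binary on PATH.
MMSEQS="${MMSEQS:-mmseqs}"

run_clust() {
    seq_db="$1"
    result_db="$2"
    out_db="$3"

    # The result DB is an input to clust, so it must already exist.
    if [ ! -e "${result_db}.dbtype" ]; then
        echo "result DB ${result_db} not found; run mmseqs search first" >&2
        return 1
    fi

    # The cluster DB is the output, so it must not exist yet.
    if [ -e "${out_db}.dbtype" ]; then
        echo "output DB ${out_db} already exists; remove it or pick a new name" >&2
        return 1
    fi

    # clust argument order: <sequenceDB> <resultDB> <clusterDB>
    "$MMSEQS" clust "$seq_db" "$result_db" "$out_db"
}
```

With the names from the wiki snippet, the call would be `run_clust profileDB1 resultDB2 profileDB1_clu`.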
