Error: Prefilter died #826

Open
goldenmole1 opened this issue Mar 19, 2024 · 1 comment

goldenmole1 commented Mar 19, 2024

Expected Behavior

I ran the script below (only the MMseqs2 part is shown in full) and got a "Prefilter died" error. What should I do?
#!/bin/bash
## specify allocation - we want normal since we don't want to use the whole node for nothing
SBATCH -A grp-org-sc
SBATCH -q normal
## specify number of nodes
SBATCH -N 2
## specify number of procs/CPUS
SBATCH -c 8
## specify runtime
SBATCH -t 72:00:00
## specify job name
SBATCH -J seqdetect
##Memory per cpu
SBATCH --mem-per-cpu=512G

export PATH=$PATH:/groups/science/homes/username/anaconda3/bin/mmseqs
[Initial part of the script for pre-processing abbreviated here]
### MMseqs2

#conda activate /groups/science/homes/username/.micromamba/envs/mmseqs2
export PATH=$PATH:/groups/science/homes/username/anaconda3/bin/mmseqs
mkdir mmseqs_target_seq/
mkdir mmseqs_target_seq/${sample}
mkdir phrog_output/
cp previousstep_output/${sample}/${sample}_summary/${sample}_targetofinterest_proteins.faa mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.faa
mmseqs createdb mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.faa mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.target_seq

### MMseqs2/Phrogs
mmseqs search phrogs_mmseqs_db/phrogs_profile_db \
  mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.target_seq \
  mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs \
  mmseqs_target_seq/${sample}/tmp -s 7

mmseqs createtsv phrogs_mmseqs_db/phrogs_profile_db \
  mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.target_seq \
  mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs \
  mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs.tsv --full-header

cp mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs.tsv mmseqs_target_seq
echo "file: mmseqs_target_seq/${sample}_targetofinterest_proteins_mmseqs.tsv"

Current Behavior

[Previous output omitted here]
Create directory mmseqs_target_seq/[bacteria_of_interest]/tmp
search phrogs_mmseqs_db/phrogs_profile_db mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins.target_seq mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs mmseqs_target_seq/[bacteria_of_interest]/tmp -s 7

MMseqs Version: 14.7e284
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 64
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 7
k-mer length 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Gap pseudo count 10
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false

prefilter phrogs_mmseqs_db/phrogs_profile_db mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins.target_seq mmseqs_target_seq/[bacteria_of_interest]/tmp/15822818178659183495/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 64 --compressed 0 -v 3 -s 7.0

Query database size: 38880 type: Profile
Estimated memory consumption: 488M
Target database size: 125 type: Aminoacid
Index table k-mer threshold: 0 at k-mer size 6
Index table: counting k-mers
[=================================================================] 125 0s 5ms
Index table: Masked residues: 124
Index table: fill
[=================================================================] 125 0s 6ms
Index statistics
Entries: 25103
DB size: 488 MB
Avg k-mer size: 0.000392
Top 10 k-mers
ALGLAA 2
TTGTAA 2
AAARKA 2
KASRKA 2
TEEALA 2
EDLLRA 2
INGNED 2
ASARED 2
GKHHRD 2
AELKAE 2
Time for index table init: 0h 0m 0s 511ms
Process prefiltering step 1 of 1

k-mer similarity threshold: 91
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 38880
Target db start 1 to 125
[=mmseqs_target_seq/[bacteria_of_interest]/tmp/15822818178659183495/blastp.sh: line 99: 1649148 Killed $RUNNER "$MMSEQS" prefilter "$INPUT" "$TARGET" "$TMP_PATH/pref_$STEP" $PREFILTER_PAR -s "$SENS"
Error: Prefilter died
createtsv phrogs_mmseqs_db/phrogs_profile_db mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins.target_seq mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs.tsv --full-header

MMseqs Version: 14.7e284
First sequence as representative false
Target column 1
Add full header true
Sequence source 0
Database output false
Threads 64
Compressed 0
Verbosity 3

No datafile could be found for mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs!
cp: cannot stat 'mmseqs_target_seq/[bacteria_of_interest]/[bacteria_of_interest]_targetofinterest_proteins_mmseqs.tsv': No such file or directory
file: mmseqs_target_seq/[bacteria_of_interest]_targetofinterest_proteins_mmseqs.tsv
sample: [bacteria_of_interest]
[bacteria_of_interest]
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=4226926.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

MMseqs Output (for bugs)

Please make sure to also post the complete output of MMseqs. You can use gist.github.com for large output.

Context

Providing context helps us come up with a solution and improve our documentation for the future.

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
  • Operating system and version:
    MMseqs version: 13.45111
    CPU: 2x AMD 7543 (64 cores total)
    RAM: 512 GB
    Local Disk: 7 TB SSD
    Network: 100 GBit Infiniband
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 32
    On-line CPU(s) list: 0-31
    Thread(s) per core: 1
    Core(s) per socket: 32
    Socket(s): 1
    NUMA node(s): 1
    Vendor ID: AuthenticAMD
    CPU family: 23
    Model: 49
    Model name: AMD EPYC 7502P 32-Core Processor
    Stepping: 0
    CPU MHz: 2500.000
    CPU max MHz: 2500.0000
    CPU min MHz: 1500.0000
    BogoMIPS: 5000.22
    Virtualization: AMD-V
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 512K
    L3 cache: 16384K
    NUMA node0 CPU(s): 0-31
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es

milot-mirdita (Member) commented

Please try to reverse the search direction (sequences vs profiles, not profiles vs sequences).

It looks like the small number of queries is causing some weird issue.
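
A minimal sketch of what the reversed direction could look like, assuming the same paths as the script above: the sample's protein database becomes the query and the PHROGs profile database becomes the target. The function name `run_reversed_search` is hypothetical and only there to keep the two commands together; the `mmseqs search` and `mmseqs createtsv` arguments are the ones from the original script with the first two databases swapped. The sketch also stops if the search fails, since in the log above `createtsv` and `cp` still ran after the prefilter was killed.

```shell
# Sketch only: search the sample's proteins (query) against the PHROGs
# profiles (target), i.e. the reverse of the order used in the original script.
# The function name is hypothetical; paths follow the script in this issue.
run_reversed_search() (
    sample="$1"
    profile_db="phrogs_mmseqs_db/phrogs_profile_db"
    seq_db="mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins.target_seq"
    result_db="mmseqs_target_seq/${sample}/${sample}_targetofinterest_proteins_mmseqs"
    tmp_dir="mmseqs_target_seq/${sample}/tmp"

    # Query = sequences, target = profiles. Stop here if the search fails so
    # that createtsv is not run against a missing result database.
    mmseqs search "$seq_db" "$profile_db" "$result_db" "$tmp_dir" -s 7 || exit 1

    # createtsv takes the databases in the same query/target order as the search.
    mmseqs createtsv "$seq_db" "$profile_db" "$result_db" "${result_db}.tsv" --full-header
)
```

It would be called as `run_reversed_search "${sample}"` in place of the two original commands. Starting from an empty `tmp` directory is probably safest, because the existing one holds files from the failed run. Note that the query and target columns in the resulting TSV swap along with the databases.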
