unable to custom DB #255

juejun14 · 2023-02-08T09:53:56Z

Hello,
Thank you for creating this amazing tool!
I'm trying to build a custom UniRef90 database from the uniref90.fsa file. I have formatted the file like this:

1_1572043
MEEITQIKKRLSQTVRLEGKEDLLSKKDSITNLKTEEHVSVKKMVISEPKPEKKEDIQLK

Then I run these commands on my protein FASTA file:

kaiju-mkbwt -n 32 -a ACDEFGHIKLMNPQRSTVWY -o uniref90 uniref90.fsa
kaiju-mkfmi proteins

But each time the process is killed and I can't find the reason. Here is the terminal output:

# infilename= databases/uniref90/uniref90.fsa
# outfilename= databases/uniref90/uniref90
# Alphabet= ACDEFGHIKLMNPQRSTVWY
# nThreads= 32
# length= 0.000000
# checkpoint= 5
# caseSens=OFF
# revComp=OFF
# term= *
# revsort=OFF
# help=OFF
Sequences read time = 397.070000s
SLEN 39118962811
NSEQ 113461890
ALPH *ACDEFGHIKLMNPQRSTVWY
/var/spool/slurm/slurmd/job31805075/slurm_script: line 24: 20773 Killed                  kaiju-mkbwt -n 32 -a ACDEFGHIKLMNPQRSTVWY -o databases/uniref90/uniref90 databases/uniref90/uniref90.fsa

I have tested it with a smaller file containing only 50000 lines of my faa file, and that run completes successfully.
I cannot figure out what the problem is. Can anyone help me, please?

Thanks,
juejun

pmenzel · 2023-02-08T09:59:26Z

This is typically due to too little available memory (RAM). You probably need to ask Slurm for more memory.

See the table here for typical memory usage of kaiju-makedb for various database sizes: https://github.com/bioinformatics-centre/kaiju#creating-the-reference-database-and-index
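
As a rough illustration of requesting more memory, the sketch below writes a Slurm batch script for the kaiju-mkbwt command from the log above. The memory value `MEM_GB=300` and the file name `mkbwt_job.sh` are placeholders chosen for this example, not recommendations; the right value depends on the table linked above and on what the cluster allows.

```shell
#!/bin/bash
# Sketch only: generate a Slurm job script with an explicit memory request.
# MEM_GB is a hypothetical placeholder; pick a value based on the memory
# table in the Kaiju README and the limits of your cluster.
MEM_GB=300
THREADS=32

cat > mkbwt_job.sh <<EOF
#!/bin/bash
#SBATCH --cpus-per-task=${THREADS}
#SBATCH --mem=${MEM_GB}G
kaiju-mkbwt -n ${THREADS} -a ACDEFGHIKLMNPQRSTVWY -o databases/uniref90/uniref90 databases/uniref90/uniref90.fsa
EOF

echo "Job script written; submit it with: sbatch mkbwt_job.sh"
```

If job accounting is enabled on the cluster, `sacct -j 31805075 --format=JobID,State,ReqMem,MaxRSS` should show how much memory the killed job had requested and how much it used before it was stopped.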

juejun14 · 2023-02-08T15:30:16Z

OK, I will try it, thank you very much!

Is there any way to estimate the necessary RAM from the size of the faa file? For example, uniref90.fsa is 40 GB.

Have a nice day.

pmenzel · 2023-02-08T19:08:38Z

Hm, that's hard to say. You could take the ratio of the number of sequences in your FASTA file to the number of sequences in the nr database from the table, and scale the nr memory figure by that ratio to get a rough estimate.
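
A minimal sketch of that proportional estimate follows. The function names `count_seqs` and `estimate_mem_gb` are made up for this example, the linear scaling is only a rough assumption, and the reference sequence count and memory figure have to be read from the table in the README (they are not filled in here). The sequence count assumes a standard FASTA file in which every header line starts with `>`.

```shell
#!/bin/bash
# Sketch only: scale a reference memory figure by the ratio of sequence counts.

# Count sequences in a FASTA file as the number of header lines (starting with ">").
count_seqs() {
  grep -c '^>' "$1"
}

# Arguments: own sequence count, reference sequence count, reference memory in GB.
# Prints the linearly scaled memory estimate in GB, rounded to a whole number.
estimate_mem_gb() {
  awk -v n="$1" -v ref_n="$2" -v ref_mem="$3" \
    'BEGIN { printf "%.0f\n", ref_mem * n / ref_n }'
}

# Usage (reference values must come from the README table):
#   n=$(count_seqs databases/uniref90/uniref90.fsa)
#   estimate_mem_gb "$n" REF_NSEQ REF_MEM_GB
```

The kaiju-mkbwt log earlier in this thread already reports the sequence count for uniref90.fsa (NSEQ 113461890), so that number can be passed directly as the first argument instead of recounting.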

juejun14 · 2023-02-09T13:11:22Z

OK, thank you!
