unable to custom DB #255

Open
juejun14 opened this issue Feb 8, 2023 · 4 comments

Comments


juejun14 commented Feb 8, 2023

Hello,
thank you for creating this amazing tool!
I'm trying to build a custom UniRef90 database from the uniref90.fsa file. I have formatted the file like this:

1_1572043
MEEITQIKKRLSQTVRLEGKEDLLSKKDSITNLKTEEHVSVKKMVISEPKPEKKEDIQLK

and then I run these commands on my protein FASTA file:

kaiju-mkbwt -n 32 -a ACDEFGHIKLMNPQRSTVWY -o uniref90 uniref90.fsa
kaiju-mkfmi proteins

But the process is killed every time and I can't find the reason. Here is the terminal output:

# infilename= databases/uniref90/uniref90.fsa
# outfilename= databases/uniref90/uniref90
# Alphabet= ACDEFGHIKLMNPQRSTVWY
# nThreads= 32
# length= 0.000000
# checkpoint= 5
# caseSens=OFF
# revComp=OFF
# term= *
# revsort=OFF
# help=OFF
Sequences read time = 397.070000s
SLEN 39118962811
NSEQ 113461890
ALPH *ACDEFGHIKLMNPQRSTVWY
/var/spool/slurm/slurmd/job31805075/slurm_script: line 24: 20773 Killed                  kaiju-mkbwt -n 32 -a ACDEFGHIKLMNPQRSTVWY -o databases/uniref90/uniref90 databases/uniref90/uniref90.fsa

I have tested it with a smaller file containing only 50,000 lines of my .faa file, and that run completes successfully.
I can't work out where the problem is. Can anyone help me, please?

Thanks,
juejun

Member

pmenzel commented Feb 8, 2023

This is typically caused by too little available memory (RAM). You probably need to ask Slurm for more memory.

See the table here for typical memory usage of kaiju-makedb for various database sizes: https://github.com/bioinformatics-centre/kaiju#creating-the-reference-database-and-index
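
As a rough illustration of what asking Slurm for more memory could look like, here is a minimal batch-script sketch. It reuses the command from the log above; the `--mem` value is a placeholder assumption, not a measured requirement for this input, and the right figure depends on the cluster and on the memory table linked above.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=32   # matches the -n 32 threads passed to kaiju-mkbwt
#SBATCH --mem=256G           # PLACEHOLDER: replace with a value derived from the memory table

# Same command as in the original job script.
kaiju-mkbwt -n 32 -a ACDEFGHIKLMNPQRSTVWY \
  -o databases/uniref90/uniref90 databases/uniref90/uniref90.fsa
```

To check whether the earlier job really ran out of memory, the Slurm accounting record can help, for example `sacct -j 31805075 --format=JobID,State,ReqMem,MaxRSS`, assuming job accounting is enabled on the cluster.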

Author

juejun14 commented Feb 8, 2023

OK, I will try it. Thank you very much!

Is there any way to estimate the necessary RAM from the size of the .faa file? For example, uniref90.fsa is 40 GB.

Have a nice day.

Member

pmenzel commented Feb 8, 2023

Hm, that's hard to say. Maybe compare the number of sequences in your FASTA file with the number of sequences in the nr database from the table, and scale the memory figure for nr by that ratio to get an estimate.
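
The proportional estimate suggested here could be sketched in shell as follows. This is only a rough heuristic that assumes memory grows linearly with the number of sequences. The helper names `count_seqs` and `estimate_mem_gb` are hypothetical, `count_seqs` assumes standard FASTA headers starting with `>`, and the reference sequence count and memory figure for nr must be taken from the table in the README; they are deliberately not filled in here.

```shell
#!/bin/bash

# Count FASTA records, assuming every header line starts with '>'.
count_seqs() {
  grep -c '^>' "$1"
}

# Scale a known memory figure by the ratio of sequence counts.
#   $1 = number of sequences in your file
#   $2 = number of sequences in the reference database (from the README table)
#   $3 = memory in GB reported for that reference database (from the README table)
estimate_mem_gb() {
  local my_nseq="$1" ref_nseq="$2" ref_mem_gb="$3"
  awk -v m="$my_nseq" -v r="$ref_nseq" -v g="$ref_mem_gb" \
    'BEGIN { printf "%.1f\n", g * m / r }'
}
```

The log above already reports `NSEQ 113461890`, so a call would look like `estimate_mem_gb 113461890 REF_NSEQ REF_MEM_GB`, with the two reference values replaced by the numbers from the table. Treat the result as a starting point for the Slurm request rather than a guarantee.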

Author

juejun14 commented Feb 9, 2023

OK, thank you!
