
Is there a way to reduce memory requirement? #258

Open
aelbehery opened this issue Mar 20, 2023 · 4 comments

@aelbehery

Thank you so much for this wonderful tool; I really like it. However, I would appreciate a way to reduce the memory requirement. I used UniRef100 as a custom database. A brief test on a system with 256 GB of RAM ran successfully but required roughly 230 GB, so when I submitted the job to our compute nodes, which have 128 GB each, it of course failed.

MMseqs2 reduces its memory requirement by splitting the database, but it is much slower, and I would prefer to keep using kaiju with only 128 GB. Do you have any tips or suggestions?

Many thanks!


pmenzel commented Mar 20, 2023

How about splitting the database into two parts and running kaiju once for each part? Then merge both output files using kaiju-mergeOutputs with option -s, so that for each read the database match with the best score is kept. See the README, in particular the note about sorting the files before merging them.
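The two-step workaround can be sketched in Python. This is only an illustration, not kaiju's actual code: `split_fasta` halves a reference FASTA by record count (each half would then be indexed and searched separately), and `merge_best_score` mimics the best-score-wins idea behind `kaiju-mergeOutputs -s`, assuming a tab-separated output layout with the classification flag (C/U), read name, taxon ID, and a match score in the fourth column of classified lines:

```python
def split_fasta(lines, n_parts=2):
    """Deal FASTA records round-robin into n_parts lists of lines,
    so each part can be indexed and searched on its own."""
    records, current = [], []
    for line in lines:
        if line.startswith(">") and current:
            records.append(current)
            current = []
        current.append(line)
    if current:
        records.append(current)
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].extend(record)
    return parts


def merge_best_score(lines_a, lines_b):
    """For each read, keep the classification line with the higher score.
    Unclassified lines (flag 'U') count as score 0."""
    best = {}
    for line in list(lines_a) + list(lines_b):
        fields = line.rstrip("\n").split("\t")
        flag, read = fields[0], fields[1]
        score = float(fields[3]) if flag == "C" and len(fields) > 3 else 0.0
        if read not in best or score > best[read][0]:
            best[read] = (score, line)
    return [best[read][1] for read in sorted(best)]
```

In the actual workflow one would build an index for each half, run kaiju against each, sort both output files by read name (column 2), and merge them with kaiju-mergeOutputs -s as described in the README.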


aelbehery commented Mar 20, 2023 via email

@aelbehery aelbehery changed the title Is there a way to reduc memory requirement? Is there a way to reduce memory requirement? Mar 22, 2023
@aelbehery

I tried it and it worked! Thank you so much! There are minor differences between the results of the two methods, but I can live with that.

I suggest including this strategy as a new feature of kaiju: kaiju would detect the available memory and split the database into a number of chunks suitable for that memory, so that everything runs automatically behind the scenes. This would not only allow running kaiju on systems with less memory, but also help users who may not be able to perform these steps manually.
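The chunk-count calculation such a feature might perform can be sketched as follows; this is purely an illustration of the suggestion, not anything kaiju implements, and the 20% safety margin is an arbitrary assumption:

```python
import math


def num_chunks(db_bytes, available_bytes, safety_margin=1.2):
    """Estimate how many database chunks are needed so that one chunk,
    inflated by a safety margin, fits into the available memory."""
    return max(1, math.ceil(db_bytes * safety_margin / available_bytes))
```

For the numbers in this thread, a roughly 230 GB index on 128 GB nodes would give `num_chunks(230 * 2**30, 128 * 2**30) == 3`.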


emyr666 commented Jun 27, 2023

I am not sure what the performance impact would be, but using LMDB instead of holding the whole database in RAM may not cost that much, especially with a fast scratch disk system: http://www.lmdb.tech/doc/
Any chance you could consider this as an option? What you lose in having to do some disk access may be paid back by not having to use the BWT.
