
Is there a way to reduce memory requirement? #258

Open
aelbehery opened this issue Mar 20, 2023 · 4 comments

@aelbehery

Thank you so much for this wonderful tool; I really like it. However, I would appreciate a way to reduce the memory requirement. I used UniRef100 as a custom database. A brief test on a system with 256 GB of RAM ran successfully but required roughly 230 GB, so when I submitted the job to our compute nodes, which have 128 GB each, it of course failed.

MMseqs2 reduces its memory requirement by splitting the database, but it is much slower, and I would prefer to keep using kaiju with only 128 GB. Do you have any tips or suggestions?

Many thanks!


pmenzel commented Mar 20, 2023

How about splitting the database into two parts and running kaiju once for each part? Then merge both output files using kaiju-mergeOutputs with option -s, so that for each read the database match with the best score is kept. See the README, in particular the note about sorting the files before merging them.
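The two-step workaround can be sketched in Python. This is only an illustration, not kaiju's actual code: `split_fasta` halves a reference FASTA by record count (each half would then be indexed and searched separately), and `merge_best_score` mimics the best-score-wins idea behind `kaiju-mergeOutputs -s`, assuming a tab-separated output layout with the classification flag (C/U), read name, taxon ID, and a match score in the fourth column of classified lines:

```python
def split_fasta(lines, n_parts=2):
    """Deal FASTA records round-robin into n_parts lists of lines,
    so each part can be indexed and searched on its own."""
    records, current = [], []
    for line in lines:
        if line.startswith(">") and current:
            records.append(current)
            current = []
        current.append(line)
    if current:
        records.append(current)
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].extend(record)
    return parts


def merge_best_score(lines_a, lines_b):
    """For each read, keep the classification line with the higher score.
    Unclassified lines (flag 'U') count as score 0."""
    best = {}
    for line in list(lines_a) + list(lines_b):
        fields = line.rstrip("\n").split("\t")
        flag, read = fields[0], fields[1]
        score = float(fields[3]) if flag == "C" and len(fields) > 3 else 0.0
        if read not in best or score > best[read][0]:
            best[read] = (score, line)
    return [best[read][1] for read in sorted(best)]
```

In the actual workflow one would build an index for each half, run kaiju against each, sort both output files by read name (column 2), and merge them with kaiju-mergeOutputs -s as described in the README.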


aelbehery commented Mar 20, 2023 via email

@aelbehery aelbehery changed the title Is there a way to reduc memory requirement? Is there a way to reduce memory requirement? Mar 22, 2023
@aelbehery

I tried it and it worked! Thank you so much! There are minor differences between the results of the two methods, but I can live with that.

I suggest including this strategy as a new feature of kaiju: kaiju would detect the available memory and split the database into a number of chunks suitable for that memory, so that everything runs automatically behind the scenes. This would not only allow running kaiju on systems with less memory, but also help users who may not be able to perform these steps manually.
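The chunk-count calculation such a feature might perform can be sketched as follows; this is purely an illustration of the suggestion, not anything kaiju implements, and the 20% safety margin is an arbitrary assumption:

```python
import math


def num_chunks(db_bytes, available_bytes, safety_margin=1.2):
    """Estimate how many database chunks are needed so that one chunk,
    inflated by a safety margin, fits into the available memory."""
    return max(1, math.ceil(db_bytes * safety_margin / available_bytes))
```

For the numbers in this thread, a roughly 230 GB index on 128 GB nodes would give `num_chunks(230 * 2**30, 128 * 2**30) == 3`.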


emyr666 commented Jun 27, 2023

I am not sure what the performance impact would be, but using LMDB instead of holding the whole database in RAM may not cost that much, especially with a fast scratch disk system: http://www.lmdb.tech/doc/
Any chance you could consider this as an option? What you lose in having to do some disk access may be paid back by not having to use the BWT.
