
A very long running time for krakenuniq #158

Open
Sheerlik opened this issue Dec 11, 2023 · 2 comments

Comments

@Sheerlik

Hello,

We want to run krakenuniq on datasets containing 100-200 samples of about 25-30M reads each.
So far we have not been able to complete a single run successfully, even when using only one of the paired-end read files, or even when taking only 100,000 reads from a sample.
It seems as if it hardly uses any CPUs.

An example run:
nohup ./krakenuniq --db DBDIR-microbialDB --threads 40 --report-file REPORT_FILE_smallfastq_100000 ../kneaddata_output/kneaddata_output_3/SRR233_1_kneaddata.trimmed.1.100000.fastq --preload > smallfastqfile-100000.out 2>&1 &

We are using krakenuniq version 1.0.4.
The database is NCBI nt.

Thank you!
Sheerli

@salzberg
Collaborator

You don't have enough RAM. I don't know how you built a KrakenUniq database for NCBI nt, but that database is truly enormous. (The Kraken2 DB is smaller but still very large.) If you downloaded the DB from our index page, then it's not nt, but it still needs over 400 GB of RAM.
Try running without "--preload", which tells KrakenUniq to load the entire DB into memory before processing any reads (no matter how small the input file). That sometimes fixes it quickly. Alternatively, use "--preload-size N", where N is less than half of the RAM you have available. That will work too.
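For readers following along, the two alternatives described above can be sketched as a small shell snippet. This is only an illustration, not from the thread itself: the RAM figure, DB path, and file names are placeholders, and the commands are echoed (dry run) rather than executed.

```shell
#!/bin/sh
# Rule of thumb from the comment above: set --preload-size to less than
# half of available RAM. All paths and values here are placeholders.
ram_gb=190                    # total RAM on the machine, in GB
preload_gb=$((ram_gb / 2))    # stay at or under half of available RAM
echo "suggested --preload-size: ${preload_gb}G"

# Variant 1: drop --preload entirely, so the DB is not fully
# loaded into memory up front.
echo krakenuniq --db DBDIR-microbialDB --threads 40 \
    --report-file REPORT_FILE sample.fastq

# Variant 2: cap how much of the DB is preloaded at a time.
echo krakenuniq --db DBDIR-microbialDB --threads 40 \
    --preload-size "${preload_gb}G" \
    --report-file REPORT_FILE sample.fastq
```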

@Sheerlik
Author

Sheerlik commented Dec 12, 2023

Thank you for your quick response Steven!

Our machine has 190 GB of RAM (Amazon EC2 c5d.24xlarge).
We are using the indexed DB that you have published.
We tried a preload size of 90 GB as you suggested, with 60 threads. The running time has come down substantially, to about 1 hour and 20 minutes for one sample!

We would like to know whether there is another way to reduce the running time per metagenomic sample, as we have about 200 samples.

Additionally, the "--preload-size N" option you suggested isn't listed in the help output.
Is there another manual besides what is on GitHub?

Thank you!
Sheerli
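On the 200-samples question left open above: a common way to amortize the cost is a simple batch loop, since after the first run the database pages often remain in the OS page cache, so subsequent runs spend less time re-reading the DB from disk. A minimal sketch, assuming sample files share a naming pattern; the DB path, thread count, and preload size are placeholders, and the commands are echoed rather than executed:

```shell
#!/bin/sh
# Hypothetical batch loop over per-sample FASTQ files. Each sample gets
# its own report file, named after the input. Dry run: commands are
# printed with echo, not executed.
for fq in samples/*.fastq; do
    name=$(basename "$fq" .fastq)   # e.g. samples/SRR233.fastq -> SRR233
    echo krakenuniq --db DBDIR-microbialDB \
        --threads 60 \
        --preload-size 90G \
        --report-file "report_${name}.txt" "$fq"
done
```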
