Write reads to disk immediately instead of caching them in RAM #210

sebschmi · 2021-08-03T10:51:30Z

First of all, thank you for making ISS. I find it very fast and easy to use, especially because it ships with error models.

When trying it out, I noticed that it uses a lot of RAM, which seemed odd for a read simulator, especially since memory usage keeps growing over the course of a run. However, I think I found the reason and a fix.

When generating reads, ISS first stores all of them in a Python list in RAM. It writes them to disk only after every read has been generated.

It would be much more memory-efficient to write each read to disk immediately after it is generated, so this is what I did: I moved the read generation code into a generator function, reads_generator, which I pass to to_fastq.

As a result, memory usage is now low and stays constant during generation.
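For readers who want to see the pattern in isolation, here is a minimal, self-contained sketch of the idea described above. It is not the code from this PR: the function names reads_generator and to_fastq are borrowed from the description, but their signatures, the Read tuple, the random bases, and the constant quality string are all assumptions made for illustration and do not reflect ISS's real error models or API.

```python
import io
import random
from typing import Iterator, TextIO, Tuple

# Hypothetical read representation: (read id, bases, quality string).
Read = Tuple[str, str, str]


def reads_generator(n_reads: int, read_length: int, seed: int = 0) -> Iterator[Read]:
    """Yield reads one at a time instead of collecting them in a list.

    The random bases and constant qualities are placeholders, not a
    real error model.
    """
    rng = random.Random(seed)
    for i in range(n_reads):
        bases = "".join(rng.choice("ACGT") for _ in range(read_length))
        quals = "I" * read_length
        yield (f"read_{i}", bases, quals)


def to_fastq(reads: Iterator[Read], handle: TextIO) -> int:
    """Write each read as a four-line FASTQ record as soon as it is produced.

    Only one read is held in memory at a time, so memory use does not
    grow with the number of reads. Returns the number of records written.
    """
    written = 0
    for read_id, bases, quals in reads:
        handle.write(f"@{read_id}\n{bases}\n+\n{quals}\n")
        written += 1
    return written


# Usage: an in-memory buffer stands in for an open output file here.
buffer = io.StringIO()
count = to_fastq(reads_generator(n_reads=3, read_length=8), buffer)
```

In real use the handle would be a file opened for writing. The key design point is that to_fastq consumes the generator lazily, so generation and writing are interleaved rather than run as two separate phases.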

Tj-Idowu

This worked great. Thank you.

Write reads to disk immediately instead of caching them in RAM.

9c1a9a8

Tj-Idowu reviewed Jul 3, 2022

HadrienG mentioned this pull request Oct 2, 2023

Additional Amplicon sequencing features #242

Merged
