Write reads to disk immediately instead of caching them in RAM #210
First of all, thank you for making ISS. I find it very fast and easy to use, especially because it ships with error models.
When trying it out, I noticed that it uses a lot of RAM, which seemed odd for a read simulator, especially since memory usage keeps growing over time. I think I found the reason and a fix for that.
When generating reads, ISS first stores all reads in a Python list in RAM. It writes them to disk only after all reads have been generated.
However, it would be much more memory-efficient to write them to disk immediately after generation, so that is what I did. I moved the read generation code into a generator function, `reads_generator`, which I pass to `to_fastq`. As a result, memory usage is now small and stays constant during generation.
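For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the idea. It is not the actual ISS code: only the names `reads_generator` and `to_fastq` come from this PR, while their signatures, the `Read` tuple, the random sequences, and the constant quality string are assumptions made purely for illustration.

```python
import io
import random
from typing import Iterable, Iterator, TextIO, Tuple

# Hypothetical read representation: (name, sequence, quality string).
Read = Tuple[str, str, str]


def reads_generator(n_reads: int, read_length: int = 8, seed: int = 0) -> Iterator[Read]:
    """Yield reads one at a time instead of collecting them in a list.

    The random sequence and constant quality are stand-ins for a real
    error model; only the streaming structure matters here.
    """
    rng = random.Random(seed)
    for i in range(n_reads):
        sequence = "".join(rng.choice("ACGT") for _ in range(read_length))
        quality = "I" * read_length  # placeholder quality values
        yield (f"read_{i}", sequence, quality)


def to_fastq(reads: Iterable[Read], handle: TextIO) -> int:
    """Write each read as a four-line FASTQ record as soon as it is produced.

    Because `reads` is consumed lazily, only one read is held in memory
    at a time. Returns the number of records written.
    """
    count = 0
    for name, sequence, quality in reads:
        handle.write(f"@{name}\n{sequence}\n+\n{quality}\n")
        count += 1
    return count


if __name__ == "__main__":
    # An in-memory buffer stands in for an output file opened for writing.
    buffer = io.StringIO()
    written = to_fastq(reads_generator(3), buffer)
    print(f"wrote {written} records")
    print(buffer.getvalue(), end="")
```

Since the writer pulls one read at a time from the generator, peak memory in this sketch depends on the size of a single read rather than on the total number of reads.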