
Samtools + libdeflate outperforms sambamba on a single thread #485

Open
mebbert opened this issue Feb 26, 2022 · 3 comments


mebbert commented Feb 26, 2022

Hello,
I recently heard about sambamba and its performance gains over samtools, and was excited to compare it against samtools + zlib and samtools + libdeflate (I had also heard that libdeflate really improves samtools performance).

I compared all three configurations and you can see my full post here: Samtools sort: most efficient memory and thread settings for many samples on a cluster

In short, I compared overall performance (measured by wall time) at different CPU and memory settings. I was impressed that sambamba outperforms the other two in pretty much every configuration. There were two things I wanted to share directly that may be of interest:

  1. Using only one thread, samtools + libdeflate outperforms sambamba, which suggests sambamba could be optimized even more at the compression steps (Fig. 1). You can compare sambamba (red) and samtools + libdeflate (purple) at 1 CPU on the far left of Fig. 1. I'm not sure what sambamba uses for compression, though. I'm guessing it doesn't use libdeflate; otherwise I suspect it would have suffered from the same poor CPU utilization that samtools + libdeflate suffered from with additional threads. If sambamba is using zlib, however, I suspect you could really push the limits for manipulating .bam files.
  2. I also wanted to see how well each tool could utilize the CPUs allotted to it. sambamba does the best at utilizing allotted CPUs, but it also eventually flattens out. This is obviously a classic computer science problem, but thought you might like to see where sambamba flattens out. TBH, I doubt there's much incentive to optimize CPU usage any higher than 9 CPUs, anyway, but who knows? samtools + libdeflate flattens out very quickly and is unable to fully utilize allotted CPUs as well as the other two configurations (Fig. 2). I assume this boils down to libdeflate, but maybe it's more complicated than that. I reported this on the libdeflate GitHub page so they can look into it.
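In case anyone wants to reproduce the utilization numbers, here's a rough Python sketch of how I think about "CPU utilization": child CPU time (user + sys) divided by wall time. The samtools invocation in the comment is just a placeholder, not my exact command.

```python
import resource
import subprocess
import time

def cpu_utilization(cmd):
    """Run cmd to completion and return (wall_seconds, cpu_fraction),
    where cpu_fraction is (user + sys CPU time of the child) / wall time.
    A value near the requested thread count means the CPUs were fully used."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # run() waits, so child rusage is recorded
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu / wall

# Placeholder command -- swap in the real sort invocation, e.g.
# cpu_utilization(["samtools", "sort", "-@", "8", "input.bam", "-o", "out.bam"])
```

(Unix-only, since it uses the `resource` module; `/usr/bin/time -v` gives the same numbers from the shell.)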

And thank you for your work. We need more efficient tools like sambamba!

Figure 1: Realtime vs CPU and Mem Per Thread for samtools + zlib, samtools + libdeflate (Lsamtools), and sambamba

Figure 2: Requested CPUs vs. CPU utilization for samtools + zlib, samtools + libdeflate (Lsamtools), and sambamba


pjotrp commented Mar 1, 2022

Thanks. It is worth trying and should not be hard to test with guix.


mebbert commented Mar 1, 2022

I heard back on my post to libdeflate. I don't know much about different compression methods, but here are the key takeaways:

  1. The author of libdeflate doesn't think there's any reason libdeflate itself would limit CPU usage.
  2. He also said the "LZ4 compression format results in faster compression and decompression, but a worse compression ratio than DEFLATE." So it sounds like libdeflate won't be any faster. Maybe there's some other explanation for why samtools + libdeflate performed better with a single thread. It may have been something I did (e.g., tens of different threads reading from the same files?). Might be worth looking over my code and testing it yourself to verify.
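To get a feel for the ratio/speed tradeoff he described, here's a toy sketch using Python's zlib (a DEFLATE implementation). It's only an illustration of the level-vs-speed tradeoff within DEFLATE, not a benchmark of libdeflate or LZ4, and the sample data is made up.

```python
import time
import zlib

def deflate_tradeoff(data, levels=(1, 6, 9)):
    """Compress data with zlib (DEFLATE) at several levels and return
    {level: (seconds, compression_ratio)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        results[level] = (time.perf_counter() - start, len(data) / len(compressed))
    return results

sample = b"ACGTACGTTTGACCA" * 100_000  # stand-in for repetitive BAM-like data
for level, (secs, ratio) in sorted(deflate_tradeoff(sample).items()):
    print(f"level {level}: {secs:.4f} s, {ratio:.1f}x smaller")
```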

Anyway, just wanted to report what I found in case it could be useful.


mebbert commented Mar 1, 2022

Oh, meant to include a link to the post at libdeflate: ebiggers/libdeflate#170
