Reproducibility of primary alignments with bwa-mem2 #228

Open
gh-jphan opened this issue Apr 13, 2023 · 13 comments

@gh-jphan
Contributor

gh-jphan commented Apr 13, 2023

I've encountered an issue with reproducibility of bwa-mem2 that is likely due to a multi-threading bug. This may be related to the following issue: #215.

Observations while documenting this issue:

  1. When bwa-mem2 is run multiple times on the same input, there is a small, random chance that the alignment output differs. E.g., out of 1000 runs on the same dataset, most outputs are identical, but a small single-digit percentage of outputs differ.
  2. These differences are always related to alignments that have multiple alternate alignment positions (i.e., alignments that produce XA tags). I understand that when there are multiple equally good alignment positions, bwa-mem2 will randomly choose a position. However, that "random" choice should be the same from run to run, given that the input data is identical (including read order).
  3. A smaller -K parameter (batch size) increases the probability that a different output occurs. A -K value large enough to load the entire dataset in a single batch results in 0 different outputs (fixes the problem).
  4. Disabling multi-threaded IO (-1 parameter) results in 0 different outputs (fixes the problem).
  5. The AVX512-optimized binary has a higher probability of different outputs than the SSE, AVX, AVX2, etc. binaries. This is probably because the speed/timing of certain code blocks or steps changes.
  6. The original bwa mem does not have this issue; its outputs are always identical.
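
One way to check the observations above is to hash the alignment records of each run and group runs by digest. The following is only a minimal sketch, not the procedure used in this issue: the function names `sam_digest` and `group_runs` are hypothetical, and header lines are skipped on the assumption that they may differ harmlessly between runs (for example, a @PG line that records the command line).

```python
import hashlib
from collections import defaultdict


def sam_digest(lines):
    """Hash the alignment records of one SAM output, ignoring header lines."""
    h = hashlib.sha256()
    for line in lines:
        if line.startswith("@"):
            continue  # header line; alignment records never start with '@'
        h.update(line.rstrip("\n").encode())
        h.update(b"\n")
    return h.hexdigest()


def group_runs(outputs):
    """Map each digest to the names of the runs that produced that output.

    `outputs` is a dict of run name -> iterable of SAM lines (for example,
    an open file handle for each saved output).
    """
    groups = defaultdict(list)
    for name, lines in outputs.items():
        groups[sam_digest(lines)].append(name)
    return dict(groups)
```

If every run is reproducible, `group_runs` returns a single group. More than one group means at least one run produced different alignment records, and the smaller groups point at the runs worth diffing.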

I believe I have a simple fix, but wanted to document it as an issue. I will open a PR shortly.

Thanks!
-John

@robertzeibich

Hi John, do you think your finding could also fix my problem (#233)? Should I turn off multi-threading? I think my problem also aligns with what was posted here: #227. Any recommendation would be much appreciated.

@gh-jphan
Contributor Author

Hi Robert, I'm not sure if it's a related problem, but it could be. My comparisons were between runs of bwa-mem2, and I didn't compare output between bwa-mem2 and bwa. You mentioned that there is an input difference for the fasta files. Ideally they should be the same if you're expecting the same output SAM/BAM files (unless I'm misunderstanding). Also, it's not clear to me what the differences between the outputs are. Could you post some example SAM outputs showing the differences?
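
To collect example differences like those requested above, two SAM outputs can be compared read by read. This is a rough sketch under stated assumptions: `primary_alignments` and `diff_primaries` are hypothetical helper names, only primary records are compared (FLAG bits 0x100 and 0x800 unset, per the SAM specification), and only the reference name and position are checked.

```python
def primary_alignments(lines):
    """Return {(qname, mate): (rname, pos, has_xa)} for primary SAM records."""
    result = {}
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        if flag & 0x900:  # 0x100 = secondary, 0x800 = supplementary
            continue
        # 0x40 = first in pair, 0x80 = last in pair, neither = unpaired
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        has_xa = any(tag.startswith("XA:Z:") for tag in fields[11:])
        result[(fields[0], mate)] = (fields[2], int(fields[3]), has_xa)
    return result


def diff_primaries(run_a, run_b):
    """List reads whose primary reference name or position differs."""
    a = primary_alignments(run_a)
    b = primary_alignments(run_b)
    diffs = []
    for key in sorted(a.keys() & b.keys()):
        if a[key][:2] != b[key][:2]:
            diffs.append((key, a[key], b[key]))
    return diffs
```

Each entry in the result carries a `has_xa` flag for both runs, which makes it easy to see whether the differing reads are the ones with alternate alignment positions.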

@chappj1

chappj1 commented Jun 2, 2023

@gh-jphan Your point 2 above:

"These differences are always related to alignments that have multiple alternate alignment positions (i.e., alignments that produce XA tags). I understand that when there are multiple equally good alignment positions, bwa-mem2 will randomly choose a position. However, that "random" choice should be the same from run to run, given that the input data is identical (including read order)."

How do you know "bwa-mem2 will randomly choose a position"? I do not see that information in the documentation anywhere. Also, is there any way to set this option to something different, e.g. create an alignment record for all positions, rather than randomly selecting one?

Thanks,
James

@gh-jphan
Contributor Author

gh-jphan commented Jun 2, 2023

@chappj1, I think that question has been asked before and, unfortunately, the docs don't clearly describe the behavior (at least the docs I could find). It's supposed to mimic bwa behavior, and some of the older bwa docs describe the behavior for samse, etc., but not for mem. For example: https://www.biostars.org/p/304614/. A reply to this post describes the expected behavior that I've observed: https://davetang.org/muse/2011/10/11/bwa-and-multi-mapping-reads/.

There is an option "-a" to output all alignments, but there may be a very large number of alternative alignments for some reads.
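
For reads that carry an XA tag, the alternate positions can also be read directly from the tag without using "-a". A minimal sketch, assuming the XA layout described in the bwa manual (`chr,pos,CIGAR,NM;` repeated, with a signed position giving the strand); the function name `parse_xa` is hypothetical:

```python
def parse_xa(tag):
    """Parse 'XA:Z:chr,+pos,CIGAR,NM;...' into a list of alternate hits."""
    prefix = "XA:Z:"
    if not tag.startswith(prefix):
        raise ValueError("not an XA tag: " + tag)
    hits = []
    for entry in tag[len(prefix):].split(";"):
        if not entry:
            continue  # the tag ends with ';', which leaves an empty item
        chrom, pos, cigar, nm = entry.rsplit(",", 3)
        strand = "-" if pos.startswith("-") else "+"
        hits.append((chrom, strand, abs(int(pos)), cigar, int(nm)))
    return hits
```

Each hit is returned as `(chrom, strand, pos, cigar, nm)`, so the primary position reported in one run can be checked against the alternate hits reported in another.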

@k1sauce

k1sauce commented Dec 6, 2023

@yuk12 Is there any plan for #229? Also, is bwa-mem2 still being maintained?

@vasimuddin

@k1sauce yes, it is being maintained. We are in the middle of fixing the issue, and will do a release this month.

@shanebrubaker

Hi, is there any update on this? We would really like to use this fix and have it merged in. Thanks!

@gh-jphan
Contributor Author

gh-jphan commented May 8, 2024

I second that, and just resolved a conflict since the PR has been open for a while.

Unfortunately I do not have write access, I think it is up to: @yuk12

@yuk12
Member

yuk12 commented May 8, 2024

Merged. Sorry for the delay. Appreciate the fix.

@yuk12
Member

yuk12 commented May 8, 2024

Will make a release after a few tests in a day or two.

@serge2016

Thank you!!!

@shanebrubaker

shanebrubaker commented May 9, 2024 via email
