Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difference in mapping quality and alignment for bwa mem2 compared to bwa mem #246

Open
janrehker opened this issue Sep 29, 2023 · 5 comments

Comments

@janrehker
Copy link

Similar to #5 when aligning my dataset with bwa 0.7.17-r1188 and compare it with bwa mem2 (avx2 version), I get slightly different results.

(base) [patho-pipe@l-patho-ngs bwa-mem2_orig]$ cat diff_sorted.txt | grep 'NB500890:382:HHTNKAFX5:4:21405:25751:6735'
< NB500890:382:HHTNKAFX5:4:21405:25751:6735     99      chr9    115406079       60      4S96M   =       115406380       397     TATGTTAGACTGAAAGCTGCATCCGTCTTGTCAACTGCTGCATCCTCAGAACTTAAACATTCAGCTTATTCTGAAGTCTGCTAGTTCGTAGGGCAGGAA  AAA/AE6EEEEE6E6EE/E//EEA///////EA//E//EE/E//E/6EEA///EEEE<E/E<AE<//<EEAEEEE//6AAEEE//EA/EEEA<</E/<A<    NM:i:3  MD:Z:15G27G3T48 MC:Z:96M4S      AS:i:81 XS:i:21
< NB500890:382:HHTNKAFX5:4:21405:25751:6735     147     chr9    115406380       48      96M4S   =       115406079       -397    ATTAATGTAGATGGTTTGTCCTCAGTCTTGTATGTTAGGTTCTCTAACTATGGATGTTCATTTTAAGCAGTAGGCTTCTCTCAGTGATGTTGAGAAGGG  6/<A//////6/A///<EA/A/E/E//<////E<E/E/E/A////EE<///<E<E<///AE//EE//E/A//////E/AE/<///////6AEEEAA//AA    NM:i:11 MD:Z:1A9G16A1A4A8A3A1A19A4A13A6 MC:Z:4S96M      AS:i:44 XS:i:20
> NB500890:382:HHTNKAFX5:4:21405:25751:6735     97      chr9    115406079       60      4S96M   =       115406380       397     TATGTTAGACTGAAAGCTGCATCCGTCTTGTCAACTGCTGCATCCTCAGAACTTAAACATTCAGCTTATTCTGAAGTCTGCTAGTTCGTAGGGCAGGAA  AAA/AE6EEEEE6E6EE/E//EEA///////EA//E//EE/E//E/6EEA///EEEE<E/E<AE<//<EEAEEEE//6AAEEE//EA/EEEA<</E/<A<    NM:i:3  MD:Z:15G27G3T48 MC:Z:96M4S      AS:i:81 XS:i:21
> NB500890:382:HHTNKAFX5:4:21405:25751:6735     145     chr9    115406380       39      96M4S   =       115406079       -397    ATTAATGTAGATGGTTTGTCCTCAGTCTTGTATGTTAGGTTCTCTAACTATGGATGTTCATTTTAAGCAGTAGGCTTCTCTCAGTGATGTTGAGAAGGG  6/<A//////6/A///<EA/A/E/E//<////E<E/E/E/A////EE<///<E<E<///AE//EE//E/A//////E/AE/<///////6AEEEAA//AA    NM:i:11 MD:Z:1A9G16A1A4A8A3A1A19A4A13A6 MC:Z:4S96M      AS:i:44 XS:i:20

While I get the exact same alignment, I get a mapping quality score of 48 for bwa mem2 instead of 39 for bwa mem in this example. In other reads the switch happens in the opposite direction.

In another example...

(base) [patho-pipe@l-patho-ngs bwa-mem2_orig]$ cat diff_sorted.txt | grep 'NB500890:382:HHTNKAFX5:4:21612:26812:12621'
< NB500890:382:HHTNKAFX5:4:21612:26812:12621    83      chr8    93999844        60      97M3S   =       93999543        -398    CATACATCAATGAAAGAAAAAACGCCCTCTCCTTCCAGCTTAAGTTGGGGTGTTGTCATGATAAGACAAATACATTTGTGAAGTGTAGGGCTATAATGTG ////E6///</</////6<//E//EE//EAEE/EEAEEAA<<EA//EE/E/E///AEEEEEE/E/EEE/EEEEEAEEE//EEEEEEE/EEEEEEEAAAAA    NM:i:7  MD:Z:0A6A6T7T3T26G14C28 MC:Z:6S44M50S   AS:i:66 XS:i:20
< NB500890:382:HHTNKAFX5:4:21612:26812:12621    163     chr8    93999543        60      6S44M50S        =       93999844        398     GTTGTCGAATGCACTTGAAGGAACCAACTCTGGACACNGAATGAGAGGGTTTCCCAGGGGTGAGTGTCGTAGCACTCGGGAGGGNNNNNNNNNNNNNNNN A/////AA///<AEE//E/EEE////EE///E/////#E/6A//EEA/EA/E/////EAE/E/A/A//AE/E/////////A<E################    NM:i:2  MD:Z:9G21A12    MC:Z:97M3S      AS:i:37 XS:i:22
> NB500890:382:HHTNKAFX5:4:21612:26812:12621    83      chr8    93999844        60      97M3S   =       93999546        -395    CATACATCAATGAAAGAAAAAACGCCCTCTCCTTCCAGCTTAAGTTGGGGTGTTGTCATGATAAGACAAATACATTTGTGAAGTGTAGGGCTATAATGTG ////E6///</</////6<//E//EE//EAEE/EEAEEAA<<EA//EE/E/E///AEEEEEE/E/EEE/EEEEEAEEE//EEEEEEE/EEEEEEEAAAAA    NM:i:7  MD:Z:0A6A6T7T3T26G14C28 MC:Z:9S41M50S   AS:i:66 XS:i:20
> NB500890:382:HHTNKAFX5:4:21612:26812:12621    163     chr8    93999546        60      9S41M50S        =       93999844        395     GTTGTCGAATGCACTTGAAGGAACCAACTCTGGACACNGAATGAGAGGGTTTCCCAGGGGTGAGTGTCGTAGCACTCGGGAGGGNNNNNNNNNNNNNNNN A/////AA///<AEE//E/EEE////EE///E/////#E/6A//EEA/EA/E/////EAE/E/A/A//AE/E/////////A<E################    NM:i:2  MD:Z:6G21A12    MC:Z:97M3S      AS:i:34 XS:i:22

I get different CIGARS of 9S41M50S (bwa mem2) and 6S44M50S (bwa mem). In my opinion they are both wrong and at least the start should be 5SxxMyyS. Nevertheless I still see a difference between both programs.

Alltogether I do not extract many different alignments / mapping qualities:
11562 out of ~27 million reads or ~0.04%.

However, all of those changes seem to be caused by a difference between the two alignment tools. When I run bwa mem twice on my dataset, I do not get any difference between both aligners, so undercomplex regions with probabilistic mapping do not seem to be the issue here.

While bwa mem2 appeared to be ~2x as fast in my tests and results appear mostly comparable to bwa mem, I cannot confirm the claim, that it is producing alignment identical to bwa at the moment.

Best regards,
Jan Rehker

@fredjarlier
Copy link

Hi @janrehker what is the environment and commands lines? Does it happen with only 1 thread? thanks

@janrehker
Copy link
Author

janrehker commented Oct 6, 2023

Hi fredjarlier,

Thank you for your reply!

CentOS8,
./bwa-mem2.avx2 version
2.2.1

Reducing the number of threads from 20 to 1, I now get zero difference:

/mnt/NFS/s-msp01/pathodata/apps/bwa-0.7.17/bin/bwa mem -t 1 /mnt/NFS/s-msp01/pathodata/DB/GRCh38.p13/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna 270190-RunA048-2864-2023-IGVhg38-F11-66_S2_R1_001.fastq.gz 270190-RunA048-2864-2023-IGVhg38-F11-66_S2_R2_001.fastq.gz > test_bwa.sam

./bwa-mem2.avx2 mem -t 1 GRCh38_full_plus_hs38d1/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fa 270190-RunA048-2864-2023-IGVhg38-F11-66_S2_R1_001.fastq.gz 270190-RunA048-2864-2023-IGVhg38-F11-66_S2_R2_001.fastq.gz > test_bwa-mem2-avx2.sam

samtools sort -n -m 30G -@1 -o test_bwa-mem2-avx2_sorted.sam -O SAM test_bwa-mem2-avx2.sam; samtools sort -n -m 30G -@1 -o test_bwa_sorted.sam -O SAM test_bwa.sam; diff test_bwa-mem2-avx2_sorted.sam test_bwa_sorted.sam > diff_unsorted1thread.txt

(... with diff just spitting out the different used command lines)

I wonder why this happens, as I would not expect to see a difference between multithreaded alignment and single threaded ones in regard to the alignment itself, just by the way the reads are sorted, the latter being corrected by sorting by readname.

Do you have some hint for me with regard to that issue?

Best regards,
Jan

@teepean
Copy link
Contributor

teepean commented Oct 6, 2023

With bwa you can use option -K to get fixed chunk sizes and reproducible results.

"Use -K 100000000 to achieve deterministic alignment results (Note: this is a hidden option)"

lh3/bwa#121

@fredjarlier
Copy link

fredjarlier commented Oct 6, 2023

thks for the info

The default chunk size depends on the number of the threads so results may vary according to degree of parallelization...

But in the first test you had the same number threads for bwa2 and bwa1, right?

if so default chunks have the same size and results shoud be identical.

I think It worth trying with the -K option to rule out parallisation effect.

Otherwise it could be a matter of thread safety, what is the compiler and CPU?

best

@fredjarlier
Copy link

or an alignment array issue with AVX2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants