Difference in mapping quality and alignment for bwa mem2 compared to bwa mem #246

janrehker · 2023-09-29T13:19:52Z

Similar to #5 when aligning my dataset with bwa 0.7.17-r1188 and compare it with bwa mem2 (avx2 version), I get slightly different results.

(base) [patho-pipe@l-patho-ngs bwa-mem2_orig]$ cat diff_sorted.txt | grep 'NB500890:382:HHTNKAFX5:4:21405:25751:6735'
< NB500890:382:HHTNKAFX5:4:21405:25751:6735     99      chr9    115406079       60      4S96M   =       115406380       397     TATGTTAGACTGAAAGCTGCATCCGTCTTGTCAACTGCTGCATCCTCAGAACTTAAACATTCAGCTTATTCTGAAGTCTGCTAGTTCGTAGGGCAGGAA  AAA/AE6EEEEE6E6EE/E//EEA///////EA//E//EE/E//E/6EEA///EEEE<E/E<AE<//<EEAEEEE//6AAEEE//EA/EEEA<</E/<A<    NM:i:3  MD:Z:15G27G3T48 MC:Z:96M4S      AS:i:81 XS:i:21
< NB500890:382:HHTNKAFX5:4:21405:25751:6735     147     chr9    115406380       48      96M4S   =       115406079       -397    ATTAATGTAGATGGTTTGTCCTCAGTCTTGTATGTTAGGTTCTCTAACTATGGATGTTCATTTTAAGCAGTAGGCTTCTCTCAGTGATGTTGAGAAGGG  6/<A//////6/A///<EA/A/E/E//<////E<E/E/E/A////EE<///<E<E<///AE//EE//E/A//////E/AE/<///////6AEEEAA//AA    NM:i:11 MD:Z:1A9G16A1A4A8A3A1A19A4A13A6 MC:Z:4S96M      AS:i:44 XS:i:20
> NB500890:382:HHTNKAFX5:4:21405:25751:6735     97      chr9    115406079       60      4S96M   =       115406380       397     TATGTTAGACTGAAAGCTGCATCCGTCTTGTCAACTGCTGCATCCTCAGAACTTAAACATTCAGCTTATTCTGAAGTCTGCTAGTTCGTAGGGCAGGAA  AAA/AE6EEEEE6E6EE/E//EEA///////EA//E//EE/E//E/6EEA///EEEE<E/E<AE<//<EEAEEEE//6AAEEE//EA/EEEA<</E/<A<    NM:i:3  MD:Z:15G27G3T48 MC:Z:96M4S      AS:i:81 XS:i:21
> NB500890:382:HHTNKAFX5:4:21405:25751:6735     145     chr9    115406380       39      96M4S   =       115406079       -397    ATTAATGTAGATGGTTTGTCCTCAGTCTTGTATGTTAGGTTCTCTAACTATGGATGTTCATTTTAAGCAGTAGGCTTCTCTCAGTGATGTTGAGAAGGG  6/<A//////6/A///<EA/A/E/E//<////E<E/E/E/A////EE<///<E<E<///AE//EE//E/A//////E/AE/<///////6AEEEAA//AA    NM:i:11 MD:Z:1A9G16A1A4A8A3A1A19A4A13A6 MC:Z:4S96M      AS:i:44 XS:i:20

While I get the exact same alignment, I get a mapping quality score of 48 for bwa mem2 instead of 39 for bwa mem in this example. In other reads the switch happens in the opposite direction.

In another example...

(base) [patho-pipe@l-patho-ngs bwa-mem2_orig]$ cat diff_sorted.txt | grep 'NB500890:382:HHTNKAFX5:4:21612:26812:12621'
< NB500890:382:HHTNKAFX5:4:21612:26812:12621    83      chr8    93999844        60      97M3S   =       93999543        -398    CATACATCAATGAAAGAAAAAACGCCCTCTCCTTCCAGCTTAAGTTGGGGTGTTGTCATGATAAGACAAATACATTTGTGAAGTGTAGGGCTATAATGTG ////E6///</</////6<//E//EE//EAEE/EEAEEAA<<EA//EE/E/E///AEEEEEE/E/EEE/EEEEEAEEE//EEEEEEE/EEEEEEEAAAAA    NM:i:7  MD:Z:0A6A6T7T3T26G14C28 MC:Z:6S44M50S   AS:i:66 XS:i:20
< NB500890:382:HHTNKAFX5:4:21612:26812:12621    163     chr8    93999543        60      6S44M50S        =       93999844        398     GTTGTCGAATGCACTTGAAGGAACCAACTCTGGACACNGAATGAGAGGGTTTCCCAGGGGTGAGTGTCGTAGCACTCGGGAGGGNNNNNNNNNNNNNNNN A/////AA///<AEE//E/EEE////EE///E/////#E/6A//EEA/EA/E/////EAE/E/A/A//AE/E/////////A<E################    NM:i:2  MD:Z:9G21A12    MC:Z:97M3S      AS:i:37 XS:i:22
> NB500890:382:HHTNKAFX5:4:21612:26812:12621    83      chr8    93999844        60      97M3S   =       93999546        -395    CATACATCAATGAAAGAAAAAACGCCCTCTCCTTCCAGCTTAAGTTGGGGTGTTGTCATGATAAGACAAATACATTTGTGAAGTGTAGGGCTATAATGTG ////E6///</</////6<//E//EE//EAEE/EEAEEAA<<EA//EE/E/E///AEEEEEE/E/EEE/EEEEEAEEE//EEEEEEE/EEEEEEEAAAAA    NM:i:7  MD:Z:0A6A6T7T3T26G14C28 MC:Z:9S41M50S   AS:i:66 XS:i:20
> NB500890:382:HHTNKAFX5:4:21612:26812:12621    163     chr8    93999546        60      9S41M50S        =       93999844        395     GTTGTCGAATGCACTTGAAGGAACCAACTCTGGACACNGAATGAGAGGGTTTCCCAGGGGTGAGTGTCGTAGCACTCGGGAGGGNNNNNNNNNNNNNNNN A/////AA///<AEE//E/EEE////EE///E/////#E/6A//EEA/EA/E/////EAE/E/A/A//AE/E/////////A<E################    NM:i:2  MD:Z:6G21A12    MC:Z:97M3S      AS:i:34 XS:i:22

I get different CIGARS of 9S41M50S (bwa mem2) and 6S44M50S (bwa mem). In my opinion they are both wrong and at least the start should be 5SxxMyyS. Nevertheless I still see a difference between both programs.

Alltogether I do not extract many different alignments / mapping qualities:
11562 out of ~27 million reads or ~0.04%.

However, all of those changes seem to be caused by a difference between the two alignment tools. When I run bwa mem twice on my dataset, I do not get any difference between both aligners, so undercomplex regions with probabilistic mapping do not seem to be the issue here.

While bwa mem2 appeared to be ~2x as fast in my tests and results appear mostly comparable to bwa mem, I cannot confirm the claim, that it is producing alignment identical to bwa at the moment.

Best regards,
Jan Rehker

The text was updated successfully, but these errors were encountered:

fredjarlier · 2023-10-02T08:59:55Z

Hi @janrehker what is the environment and commands lines? Does it happen with only 1 thread? thanks

janrehker · 2023-10-06T11:02:04Z

Hi fredjarlier,

Thank you for your reply!

CentOS8,
./bwa-mem2.avx2 version
2.2.1

Reducing the number of threads from 20 to 1, I now get zero difference:

/mnt/NFS/s-msp01/pathodata/apps/bwa-0.7.17/bin/bwa mem -t 1 /mnt/NFS/s-msp01/pathodata/DB/GRCh38.p13/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna 270190-RunA048-2864-2023-IGVhg38-F11-66_S2_R1_001.fastq.gz 270190-RunA048-2864-2023-IGVhg38-F11-66_S2_R2_001.fastq.gz > test_bwa.sam

./bwa-mem2.avx2 mem -t 1 GRCh38_full_plus_hs38d1/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fa 270190-RunA048-2864-2023-IGVhg38-F11-66_S2_R1_001.fastq.gz 270190-RunA048-2864-2023-IGVhg38-F11-66_S2_R2_001.fastq.gz > test_bwa-mem2-avx2.sam

samtools sort -n -m 30G -@1 -o test_bwa-mem2-avx2_sorted.sam -O SAM test_bwa-mem2-avx2.sam; samtools sort -n -m 30G -@1 -o test_bwa_sorted.sam -O SAM test_bwa.sam; diff test_bwa-mem2-avx2_sorted.sam test_bwa_sorted.sam > diff_unsorted1thread.txt

(... with diff just spitting out the different used command lines)

I wonder why this happens, as I would not expect to see a difference between multithreaded alignment and single threaded ones in regard to the alignment itself, just by the way the reads are sorted, the latter being corrected by sorting by readname.

Do you have some hint for me with regard to that issue?

Best regards,
Jan

teepean · 2023-10-06T14:33:31Z

With bwa you can use option -K to get fixed chunk sizes and reproducible results.

"Use -K 100000000 to achieve deterministic alignment results (Note: this is a hidden option)"

lh3/bwa#121

fredjarlier · 2023-10-06T14:57:28Z

thks for the info

The default chunk size depends on the number of the threads so results may vary according to degree of parallelization...

But in the first test you had the same number threads for bwa2 and bwa1, right?

if so default chunks have the same size and results shoud be identical.

I think It worth trying with the -K option to rule out parallisation effect.

Otherwise it could be a matter of thread safety, what is the compiler and CPU?

best

fredjarlier · 2023-10-06T15:15:35Z

or an alignment array issue with AVX2

quito418 mentioned this issue Jan 31, 2024

Why are bwa and BWA-MEME results inconsistent? kaist-ina/BWA-MEME#27

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Difference in mapping quality and alignment for bwa mem2 compared to bwa mem #246

Difference in mapping quality and alignment for bwa mem2 compared to bwa mem #246

janrehker commented Sep 29, 2023

fredjarlier commented Oct 2, 2023

janrehker commented Oct 6, 2023 •

edited

teepean commented Oct 6, 2023

fredjarlier commented Oct 6, 2023 •

edited

fredjarlier commented Oct 6, 2023

Difference in mapping quality and alignment for bwa mem2 compared to bwa mem #246

Difference in mapping quality and alignment for bwa mem2 compared to bwa mem #246

Comments

janrehker commented Sep 29, 2023

fredjarlier commented Oct 2, 2023

janrehker commented Oct 6, 2023 • edited

teepean commented Oct 6, 2023

fredjarlier commented Oct 6, 2023 • edited

fredjarlier commented Oct 6, 2023

janrehker commented Oct 6, 2023 •

edited

fredjarlier commented Oct 6, 2023 •

edited