Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mappy segfault with long sequence and MD=True #1183

Open
davidnewman02 opened this issue Mar 25, 2024 · 2 comments
Open

Mappy segfault with long sequence and MD=True #1183

davidnewman02 opened this issue Mar 25, 2024 · 2 comments
Labels

Comments

@davidnewman02
Copy link

davidnewman02 commented Mar 25, 2024

I have a very long read (~500kb) which I try to align using mappy in python using the MD=True flag and this gives a Segfault. This read does not align well, with a large amount of soft-clipping.

This does not appear to happen with the minimap2 standalone executable or when MD=False

read.fastq.txt
NB The full reference file is too large for a gitlab issue, but is T2T-v1.0 with a small amount of local cleanup: https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.fasta

from pathlib import Path

import mappy
print(f"Mappy version: {mappy.__version__}")

from mappy import Aligner, ThreadBuffer

ref = "normal_chr.v1.0.fasta.gz"
print("Loading reference")
aligner = Aligner(ref, preset='map-ont')

print("Loading sequence")
fastq = Path("read.fastq.txt").read_text()
lines = fastq.split('\n')
read_id = lines[0][1:-1]
seq = lines[1][1:-1]

print("Running alignment")
res = next(aligner.map(seq, MD=True))
print(res)

output:

Mappy version: 2.27
Loading reference
Loading sequence
Running alignment
Segmentation fault (core dumped)

The equivalent with standalone minimap2 runs successfully:

$ minimap2 normal_chr.v1.0.fasta.gz read.fastq.txt -ax map-ont -t 40 --MD > out.sam

[M::mm_idx_gen::113.331*1.92] collected minimizers
[M::mm_idx_gen::120.691*3.24] sorted minimizers
[M::main::120.691*3.24] loaded/built the index for 48 target sequence(s)
[M::mm_mapopt_update::122.395*3.21] mid_occ = 1671
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 48
[M::mm_idx_stat::123.104*3.19] distinct minimizers: 101142495 (3.49% are singletons); average occurrences: 11.113; average spacing: 5.338; total length: 5999703092
[M::worker_pipeline::178.461*2.51] mapped 1 sequences
[M::main] Version: 2.27-r1207-dirty
[M::main] CMD: ./minimap2 -ax map-ont -t 40 --MD normal_chr.v1.0.fasta.gz read.fastq.txt
[M::main] Real time: 178.982 sec; CPU: 449.128 sec; Peak RSS: 22.010 GB
@lh3
Copy link
Owner

lh3 commented Mar 25, 2024

This is probably caused by the same bug #1177 and/or #1181. I will cut a new release later this week.

@lh3 lh3 added the bug label Mar 25, 2024
@davidnewman02
Copy link
Author

davidnewman02 commented Mar 27, 2024

I am able to reproduce this bug in earlier mappy versions (tested back to 2.18), so I think this might be a different root cause as those issues appear to be due to more recent code additions.

EDIT: I have just downloaded the latest mappy=2.28 and can confirm the Segfault is still present for this read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants