Potential duplicate generation of RG tag on inputs with RG information #1110

Open
SHuang-Broad opened this issue Aug 30, 2023 · 0 comments

Hi Heng,

This isn't necessarily a bug, but I was a bit surprised.
Also, this definitely is not a high-impact issue.

So, this is arguably an edge case.
When every read in the input already carries its RG (read group) tag, that tag can end up duplicated in the output.

Here's an example.

Say the input is an unaligned BAM that has the RG tag on all of its reads (along with other tags, like 5mC calls). One would run a command like the following:

samtools fastq -t -T MM,ML <input_ubam> \
| minimap2 -ayYL -x <preset> -R "@RG\tID:matching_readgroup_id..." <ref> - \
| samtools sort -o output.bam

This will create two RG tags for each read: one copied from the FASTQ comment (via -t and -y) and one added from -R.
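
For anyone who wants to confirm this on their own output, here is a rough sketch of a check. It assumes plain SAM text on stdin, with optional tags starting at column 12. The function name count_dup_rg is just something I made up for illustration, not part of samtools or minimap2.

```shell
# Count alignment records that carry more than one RG:Z: tag.
# Reads SAM text on stdin; header lines (starting with @) are skipped.
count_dup_rg() {
  awk -F'\t' '
    /^@/ { next }                      # skip SAM header lines
    {
      n = 0
      for (i = 12; i <= NF; i++)       # optional tags start at column 12
        if ($i ~ /^RG:Z:/) n++
      if (n > 1) dup++
    }
    END { print dup + 0 }              # print 0 when nothing was found
  '
}

# Usage on real output (assumes samtools is on PATH):
#   samtools view output.bam | count_dup_rg
```

A non-zero count means at least one read has the RG tag twice.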
Of course, this can be avoided by dropping the -t flag from samtools fastq.
But the samtools fastq documentation says -t copies not only RG but also the BC and QT tags, so one could still want to keep that flag.
Alternatively, one can skip specifying the read group info for minimap2 and add it later with samtools reheader, but this is extra work.

So, a convenient feature would be for minimap2 to check whether the "comments" copied from the input FASTQ already contain an RG tag and, if so, not write it again from the information provided via -R "@RG\tID:matching_readgroup_id...".
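
In the meantime, the same effect could probably be had with a small filter between the two tools. This is only a sketch: it assumes 4-line FASTQ records and tab-separated tags in the header comment, and the function name strip_rg_from_fastq is hypothetical. It drops RG:Z: from each header line and leaves the other tags (BC, QT, MM, ML) alone, so the RG comes only from -R.

```shell
# Remove RG:Z: tags from FASTQ header comments, keeping all other tags.
# Assumes 4-line FASTQ records and tab-separated tags on the header line.
strip_rg_from_fastq() {
  awk -F'\t' -v OFS='\t' '
    NR % 4 == 1 {                      # header line of each record
      out = $1                         # "@readname"
      for (i = 2; i <= NF; i++)
        if ($i !~ /^RG:Z:/) out = out OFS $i
      print out
      next
    }
    { print }                          # sequence, "+", and quality unchanged
  '
}

# Usage (placeholders as in the command above):
#   samtools fastq -t -T MM,ML <input_ubam> \
#   | strip_rg_from_fastq \
#   | minimap2 -ayYL -x <preset> -R "@RG\tID:matching_readgroup_id..." <ref> - \
#   | samtools sort -o output.bam
```

This only helps if the ID given to -R matches the read group already on the reads, since the original per-read value is discarded.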

Thanks,
Steve
