misleading output in summary.txt file #198

Open
nservant opened this issue Mar 24, 2023 · 3 comments

nservant commented Mar 24, 2023

Hi,

We had an issue which might be interesting for you (and other users).
We have a WES sample and, after a bug on a storage system, we only get reads on chr1 after running intersectBed on the BAM with the BED target file. Nothing related to mosdepth so far.

However, when we compute the coverage of this BAM file with:

mosdepth -t 4 -n --quantize 0:1:10:50:100: --by Twist_Exome_Core_Covered_Targets_hg38.bed PREFIX onTarget.bam

it reports a very high total mean coverage (181.34X) in the summary file (see the total_region row):

chrom	length	bases	mean	min	max
chr1	248956422	937505185	3.77	0	2887
chr1_region	3358574	609028999	181.34	0	2887
total	248956422	937505185	3.77	0	2887
total_region	3358574	609028999	181.34	0	2887

To me this is misleading: total_region does not seem to take the full BED into account, only the chromosomes with non-zero coverage.
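
As a quick check of the numbers above, the reported mean is just bases divided by length, and the length on the total_region row equals the chr1_region length. The sketch below uses only the values pasted from the summary; the helper name `mean_depth` is hypothetical and not part of mosdepth.

```python
# Values copied from the summary output above: name -> (length, bases).
# Both rows carry the same length, i.e. only chr1 targets are in the denominator.
SUMMARY_ROWS = {
    "chr1_region": (3358574, 609028999),
    "total_region": (3358574, 609028999),
}


def mean_depth(length: int, bases: int) -> float:
    """Mean depth as reported in the summary: total bases / total length."""
    return bases / length


if __name__ == "__main__":
    for name, (length, bases) in SUMMARY_ROWS.items():
        # Both rows print 181.34
        print(f"{name}\t{mean_depth(length, bases):.2f}")
```

If the denominator covered every target in the BED file, including those on chromosomes with no reads, the same number of bases would be spread over a much larger length and the mean would drop accordingly.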

Of note, this is confirmed using the regions.bed.gz file: restricting to chr1 reproduces 181.34X, but the 'real' mean coverage over all regions should be 18.36X.

>>zcat regions.bed.gz | awk '$1=="chr1"{ z=($3-$2); s=s+z*$4; c=c+z}END{print s/c}'
181.336
>>zcat test.regions.bed.gz | awk '{ z=($3-$2); s=s+z*$4; c=c+z}END{print s/c}'
18.3644
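
For anyone who prefers to script this check, here is a minimal Python sketch of the same length-weighted mean that the awk one-liners compute. It assumes, as the awk commands do, a tab-separated regions file with chrom, start, end and the mean depth in column 4; the function name `weighted_mean_depth` and the `depth_col` parameter are hypothetical and not part of mosdepth.

```python
import gzip
from collections import defaultdict


def weighted_mean_depth(path, depth_col=3):
    """Length-weighted mean depth from a gzipped regions BED file.

    Assumes columns: chrom, start, end, ..., with the per-region mean
    depth at index `depth_col` (0-based; 3 matches awk's $4).
    Returns (overall_mean, {chrom: mean}).
    """
    total_len, total_sum = 0, 0.0
    per_chrom = defaultdict(lambda: [0, 0.0])  # chrom -> [length, depth*length]
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            length = end - start
            weighted = length * float(fields[depth_col])
            total_len += length
            total_sum += weighted
            per_chrom[chrom][0] += length
            per_chrom[chrom][1] += weighted
    overall = total_sum / total_len if total_len else 0.0
    by_chrom = {c: s / n for c, (n, s) in per_chrom.items() if n}
    return overall, by_chrom


if __name__ == "__main__":
    import sys

    overall, by_chrom = weighted_mean_depth(sys.argv[1])
    print(f"all regions\t{overall:.4f}")
    for chrom, mean in by_chrom.items():
        print(f"{chrom}\t{mean:.4f}")
```

Regions with zero depth still add their length to the denominator here, which is what separates the all-regions mean from the chr1-only mean.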

nservant (Author) commented

The same issue also affects the mosdepth.region.dist.txt file.

brentp (Owner) commented Mar 24, 2023

Thanks for the clear report. I see what you mean. Due to the way I wrote that part, it's not an easy fix. I'll have a closer look next week but it might be that this will remain a (now) known issue.

nmflack commented Jan 22, 2024

Thank you @nservant for documenting this issue, I've run into the same unfortunately.

Any progress on a fix? If not, can you note this behavior more prominently in the README as a known issue? It's not a "known" issue otherwise.

I'm using mosdepth to evaluate regional enrichment from Oxford Nanopore adaptive sampling. The difference between total_region and manual calculation from region.bed.gz changed our wet bench plans for a re-run of valuable samples. I sent this issue to colleagues who have been relying on tailing summary.txt as well.

We're running version 0.3.6 from bioconda. Thank you, really appreciate the speed and ease of use of the package for this application.
