Systematic bias at low coverage (under 20%) #45

Open
tylerbarnum opened this issue Aug 28, 2020 · 0 comments

@tylerbarnum

(For others who come across this: this issue concerns an edge case of Nonpareil. I'm otherwise very happy with the program and trust it for higher-coverage samples.)

I designed an experiment to see how Nonpareil's output changes when a FASTQ file is repeatedly halved in size. Above a redundancy of 20%, the subsampled FASTQ files follow the Nonpareil curve of the larger FASTQ file. Below 20%, however, the data show a systematic bias toward low redundancy (the plot below shows an example from the affected range). This bias affects both the diversity estimates and the estimate of how much additional sequencing effort is needed. Using the language of the original paper, I suspect the issue lies in the assumptions about how the total number of reads affects the probability of observing matches between reads. With fewer total reads, matches between reads become less and less likely; is the binomial distribution still appropriate in that regime?

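[Plot: example of the subsampling data within the affected (under 20% redundancy) range; image not preserved]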
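For anyone wanting to reproduce this kind of experiment, the halving procedure can be sketched as follows. This is a minimal illustration, not the script behind the plot. It assumes that each level is a random half of the previous level (nested subsamples drawn without replacement), that records are unwrapped 4-line FASTQ entries, and that the read set fits in memory. The function names are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical helpers; assumes nested subsampling and unwrapped 4-line FASTQ records.
# A FASTQ record: (header, sequence, separator, quality).
Record = Tuple[str, str, str, str]


def parse_fastq(lines: List[str]) -> List[Record]:
    """Group FASTQ lines into 4-line records (no line-wrapped sequences assumed)."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, sep, qual = lines[i:i + 4]
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        records.append((header, seq, sep, qual))
    return records


def halve(records: List[Record], rng: random.Random) -> List[Record]:
    """Keep a random half of the reads (without replacement), preserving input order."""
    keep = sorted(rng.sample(range(len(records)), len(records) // 2))
    return [records[i] for i in keep]


def halving_series(records: List[Record], min_reads: int, seed: int = 0) -> List[List[Record]]:
    """Return nested subsamples (full, 1/2, 1/4, ...) until the next half would drop below min_reads."""
    rng = random.Random(seed)
    series = [records]
    while len(series[-1]) // 2 >= min_reads:
        series.append(halve(series[-1], rng))
    return series


def to_fastq(records: List[Record]) -> str:
    """Serialize records back to FASTQ text."""
    return "".join(f"{h}\n{s}\n{p}\n{q}\n" for h, s, p, q in records)
```

Each level could then be written to its own file with `to_fastq` and run through Nonpareil separately. Nesting keeps every smaller set a subset of the larger one, which makes the levels directly comparable; drawing each level independently from the full file would be an alternative design.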

lmrodriguezr self-assigned this on Aug 28, 2020