(For others who come across this: this is an issue with an edge use case of Nonpareil; I’m otherwise very happy with the program and trust it for higher coverage samples).
I designed an experiment to see how the output of Nonpareil changes when a FASTQ is repeatedly halved in size. Above a redundancy value of 20%, the subsampled FASTQ files follow the Nonpareil curve of the full-size FASTQ file. Below 20%, however, the data show a systematic bias towards low redundancy (the plot below shows an example of the data within the affected range). This bias affects estimates of diversity and of how much additional sequencing effort is needed. I suspect the issue may lie, in the language of the original paper, in the assumptions about how the total number of reads affects the probability of observing matches between reads. With very few total reads, matches between reads become increasingly unlikely; is the binomial distribution still appropriate in that regime?
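For reference, the halving series can be generated with a short script like the one below. This is a minimal sketch, not part of Nonpareil itself: it assumes an uncompressed, strictly 4-line-per-record FASTQ, and the function name, file paths, and seed are illustrative.

```python
import random

def halve_fastq(in_path, out_path, seed=0):
    """Randomly keep half of the reads in a FASTQ file.

    Assumes a plain (uncompressed) FASTQ with exactly 4 lines per
    record. Returns the number of reads written. The seed makes the
    subsample reproducible across runs.
    """
    with open(in_path) as fh:
        lines = fh.read().splitlines()
    # Group the flat line list into 4-line records.
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    rng = random.Random(seed)
    kept = rng.sample(records, len(records) // 2)
    with open(out_path, "w") as out:
        for rec in kept:
            out.write("\n".join(rec) + "\n")
    return len(kept)
```

Applying this repeatedly (feeding each output back in as the next input) produces the series of half-size FASTQs, each of which is then run through Nonpareil separately.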