ERROR: max count before zero is less than min required count (4) duplicates removed #71

Open
colin893 opened this issue Mar 27, 2024 · 3 comments


colin893 commented Mar 27, 2024

Hi,

Although I have seen this error reported several times, I can't figure out what's wrong. I have this BAM file, mapped with the RNASeq pipeline from Nextflow, and I want to get a complexity curve (the data is from SMARTSeq, in fact). However, I get this error every time, whatever I try.

$preseq lc_extrap -v -o out.nomerge.curve Exp16-12x-FS10-1-A02.sorted.bam

BED_INPUT
TOTAL READS     = 137084
DISTINCT READS  = 137083
DISTINCT COUNTS = 2
MAX COUNT       = 2
COUNTS OF 1     = 137082
MAX TERMS       = 2
OBSERVED COUNTS (3)
1	137082
2	1

ERROR:	max count before zero is less than min required count (4) duplicates removed
$preseq lc_extrap -v -o out.nomerge.curve -P -B Exp16-12x-FS10-1-A02.sorted.bam

PAIRED_END_BED_INPUT
ERROR:	problem opening file: -B
root@zddm2-NV-PC:/home/zddm2/Documents# preseq lc_extrap -v -o out.nomerge.curve -B Exp16-12x-FS10-1-A02.sorted.bam
BED_INPUT
ERROR:	problem opening file: -B

Here are my last two attempts. It seems there is a problem with the -B parameter? Whether I run with or without it, I get one of the two error messages.
For what it's worth, I installed preseq with ./configure --enable-hts.

Would you have any idea? Thanks for your time!

@andrewdavidsmith
Contributor

The first error tells you that your data exhibits near-perfect diversity. In practice this usually means either that the data has been de-duplicated or that it will never reach saturation. Possibly the experiment has somehow ensured that nothing is sampled twice, apart from one read. This isn't necessarily an error; it just tells you that we can't learn about sample complexity from your data set.
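
To make the numbers concrete, here is a minimal Python sketch that recomputes the summary from the OBSERVED COUNTS in the verbose output above. It is not preseq's code: the helper `max_count_before_zero` is a hypothetical name, and its logic (the largest count j for which every count from 1 to j is observed at least once) is an assumption based on the wording of the error message.

```python
def max_count_before_zero(hist):
    """Largest j such that every count 1..j has at least one distinct read.

    Assumed interpretation of "max count before zero" in the error message.
    """
    j = 0
    while hist.get(j + 1, 0) > 0:
        j += 1
    return j


# Histogram copied from the verbose output: count -> number of distinct reads
observed = {1: 137082, 2: 1}
MIN_REQUIRED_COUNT = 4  # threshold quoted in the error message

total_reads = sum(count * n for count, n in observed.items())
distinct_reads = sum(observed.values())
limit = max_count_before_zero(observed)

print(total_reads, distinct_reads, limit)  # 137084 137083 2
print("enough information:", limit >= MIN_REQUIRED_COUNT)  # False
```

With only one read seen twice, the histogram stops at a count of 2, which is below the required 4, so there is too little duplication to extrapolate from.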

The second error is, I think, because a file name is expected after the -P argument.


bounlu commented Apr 3, 2024

If the first error isn't necessarily an error, could we fail gracefully in those cases, possibly with a warning instead of an error? As it stands, it breaks an entire pipeline, as reported here.
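
Until something like that exists in preseq itself, a pipeline-side workaround could look roughly like the Python sketch below. It is only an illustration under assumptions: `run_tolerant` is a hypothetical helper, and it assumes preseq exits with a non-zero status and prints the message shown above to stderr or stdout.

```python
import subprocess
import sys

# Substring of the error message quoted in this issue
LOW_COMPLEXITY_MSG = "max count before zero is less than min required count"


def run_tolerant(cmd):
    """Run cmd; return 0 on success or on the known low-information error.

    Any other failure is passed through with its original exit status.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return 0
    if LOW_COMPLEXITY_MSG in proc.stderr + proc.stdout:
        # Downgrade this specific failure to a warning
        print("WARNING: too little duplication to extrapolate; skipping",
              file=sys.stderr)
        return 0
    sys.stderr.write(proc.stderr)
    return proc.returncode
```

It could be called as `run_tolerant(["preseq", "lc_extrap", "-v", "-o", "out.nomerge.curve", "Exp16-12x-FS10-1-A02.sorted.bam"])`, with the return value used as the exit status of the pipeline step. Note that no output curve is produced in the warning case, so downstream steps would need to treat the file as optional.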

@andrewdavidsmith
Contributor

I'll consider any pull request.
