Solving the "Error: BiocParallel errors" (in almost every sample) #1946

Closed
Yoonseopark02 opened this issue May 3, 2024 · 5 comments

@Yoonseopark02

Hi, I am an undergraduate studying bioinformatics in the microbiome analysis field, and I get a BiocParallel error every time I run my code on datasets from the ENA database.
The same code works well for NCBI datasets. I have kept debugging but couldn't solve the problem.

Following is the code I used:

library(dada2)
library(ggplot2)
library(dplyr)

path <- "/Users/User/Documents/SeniorThesis/zebraENA/reads_fastq/PRJNA806371"
list.files(path)

# Assuming your forward and reverse fastq filenames have the format: SRRXXXXXXX_1.fastq.gz and SRRXXXXXXX_2.fastq.gz
fnFs <- sort(list.files(path, pattern="_1\\.fastq\\.gz$", full.names = TRUE))
fnRs <- sort(list.files(path, pattern="_2\\.fastq\\.gz$", full.names = TRUE))
# Extract sample names, assuming filenames have the format: SRRXXXXXXX_X.fastq.gz
sample.names <- sapply(strsplit(basename(fnFs), "_"), `[`, 1)

#INSPECT READ QUALITY PROFILES
QPF <- plotQualityProfile(fnFs[1:2])

Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:
Error in data.frame(sequence = names(freqtbl$top), count = as.integer(freqtbl$top), : arguments imply differing number of rows: 0, 1

I am getting this same error with almost every public dataset I have used.
Please let me know how to solve this error.
Thanks so much in advance!

@benjjneb
Owner

benjjneb commented May 3, 2024

What is head(fnFs)?

What is the output of head(ShortRead::readFastq(fnFs[[1]]))?

@Yoonseopark02
Author

Thanks so much for the reply @benjjneb !!
It appears like this:

head(fnFs)
[1] "/Users/User/Documents/SeniorThesis/zebraENA/reads_fastq/PRJNA806371/SRR18030333_1.fastq.gz"
[2] "/Users/User/Documents/SeniorThesis/zebraENA/reads_fastq/PRJNA806371/SRR18030334_1.fastq.gz"
[3] "/Users/User/Documents/SeniorThesis/zebraENA/reads_fastq/PRJNA806371/SRR18030335_1.fastq.gz"
[4] "/Users/User/Documents/SeniorThesis/zebraENA/reads_fastq/PRJNA806371/SRR18030336_1.fastq.gz"
[5] "/Users/User/Documents/SeniorThesis/zebraENA/reads_fastq/PRJNA806371/SRR18030337_1.fastq.gz"
head(ShortRead::readFastq(fnFs[[1]]))
class: ShortReadQ
length: 6 reads; width: 151 cycles

[Image: Screenshot 2024-05-04 at 2 53 09 PM]

@benjjneb
Owner

benjjneb commented May 7, 2024

Could there be any input files in your data that are empty (i.e. contain no sequences)? See this comment thread: #1503 (comment)

If there are, removing those before running plotQualityProfile should solve the issue.
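As a rough illustration of the suggestion above, the following base R sketch flags input files that contain no lines so they can be dropped before plotting. The helper names `is_empty_fastq` and `split_empty_fastq` are hypothetical and not part of dada2 or ShortRead; the sketch assumes the files are plain or gzip-compressed text.

```r
# Hypothetical helper: TRUE if a (possibly gzipped) FASTQ file contains no lines.
is_empty_fastq <- function(f) {
  con <- gzfile(f, open = "r")  # gzfile() also reads uncompressed text files
  on.exit(close(con))
  length(readLines(con, n = 1L, warn = FALSE)) == 0L
}

# Hypothetical helper: split a vector of file paths into non-empty and empty files.
split_empty_fastq <- function(files) {
  empty <- vapply(files, is_empty_fastq, logical(1), USE.NAMES = FALSE)
  list(keep = files[!empty], empty = files[empty])
}
```

With the `fnFs` vector from the original post, one might run `checked <- split_empty_fastq(fnFs)`, inspect `checked$empty`, and pass only `checked$keep` to `plotQualityProfile`. For paired-end data, the matching reverse file of each empty forward file would need to be dropped as well.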

@Yoonseopark02
Author

@benjjneb Thanks! I found that the input files were the cause of the error and solved it.

Sorry, but may I ask one more thing?

I get the following errors with one large dataset when running this code:
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen=c(150,150), trimLeft = c(17, 21),
maxN=0, maxEE=c(2,2), truncQ=2, rm.phix=TRUE,
compress=TRUE, multithread=TRUE)

Error in filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen = c(150, 150), :
These are the errors (up to 5) encountered in individual cores...
Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0, :
Mismatched forward and reverse sequence files: 0, 100000.
...

The error spans multiple lines (please see the attached pictures).
I run into this error often as well. I tried following #283, but couldn't find the problem.
[Image: Screenshot 2024-05-14 at 10 27 00 AM]
[Image: Screenshot 2024-05-14 at 11 01 41 AM]

Can you give me some advice on how to solve this?
Thank you so much!

@benjjneb
Owner

The error messages indicate that, for some pairs of forward/reverse fastq files, one of the files has many reads (100k or 83.5k) while the other has zero reads. To troubleshoot, I would check whether this error is caused by a single sample (e.g. fnFs[[1]]), and then look at those individual files. Is one of them empty? If so, it becomes a question of how these files were obtained. Did some pre-processing on your end lead to one of them being empty? Or did this discrepancy already exist in the raw files you downloaded?
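One possible way to locate the offending sample is to count the reads in every forward/reverse pair and flag the pairs whose counts differ. The sketch below uses only base R and assumes standard four-line FASTQ records; `count_fastq_reads` and `compare_pair_counts` are hypothetical helper names, not dada2 functions.

```r
# Hypothetical helper: count records in a (possibly gzipped) FASTQ file,
# assuming four lines per record. Reads in chunks to limit memory use.
count_fastq_reads <- function(f, chunk = 100000L) {
  con <- gzfile(f, open = "r")
  on.exit(close(con))
  n_lines <- 0
  repeat {
    got <- length(readLines(con, n = chunk, warn = FALSE))
    if (got == 0L) break
    n_lines <- n_lines + got
  }
  n_lines / 4
}

# Hypothetical helper: tabulate forward/reverse counts and flag mismatched pairs.
compare_pair_counts <- function(fwd, rev) {
  stopifnot(length(fwd) == length(rev))
  counts <- data.frame(
    forward = basename(fwd),
    reverse = basename(rev),
    n_fwd = vapply(fwd, count_fastq_reads, numeric(1), USE.NAMES = FALSE),
    n_rev = vapply(rev, count_fastq_reads, numeric(1), USE.NAMES = FALSE)
  )
  counts$mismatch <- counts$n_fwd != counts$n_rev
  counts
}
```

Calling `subset(compare_pair_counts(fnFs, fnRs), mismatch)` on the raw download and again after any pre-processing step should show whether the empty file was already present in the downloaded data or was produced later.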
