Segmentation fault core dumped on the derep step with Kinnex full length reads #1959
From some more educated guessing, I suspect it has to do with not enough memory (I was only using 30 cores, which may not be enough). I am trying a smaller dataset with more cores at the moment.
Your issue is probably due to memory. Long-read datasets require more memory per read than short-read datasets. The current dada2 recommended workflow (see dada2 tutorial) does not load all samples into memory at the same time as you are doing in the …
Thank you for your response! I was using your dada2 tutorial for PacBio Kinnex reads (https://benjjneb.github.io/LRASManuscript/LRASms_fecal.html). What is the advantage of using this over the standard Illumina dada2 pipeline?
Hi, just want to follow up on this. Should I be using the current PacBio workflow that you linked, or the PacBio Kinnex tutorial for fecal reads?
The current dada2 tutorial is the best place to start. The reproducible analyses associated with the initial DADA2+PacBio manuscript that you linked above are also very useful. The key difference relevant to your analyses is that current dada2 does not recommend using the …
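A minimal sketch of the sample-at-a-time pattern the current dada2 workflow uses, so that only one sample's reads are dereplicated in memory at once. The file paths here are placeholders, and the PacBio-specific settings (`PacBioErrfun`, `BAND_SIZE=32`) follow the dada2 PacBio documentation rather than anything stated in this thread:

```r
library(dada2)

# Hypothetical path; substitute your own filtered fastq files
filts <- list.files("filtered", pattern="fastq.gz", full.names=TRUE)

# Learn the error model once across samples, with PacBio settings
err <- learnErrors(filts, errorEstimationFunction=PacBioErrfun,
                   BAND_SIZE=32, multithread=TRUE)

# Denoise one sample at a time; only one sample's reads are held in memory
dds <- vector("list", length(filts))
names(dds) <- basename(filts)
for (i in seq_along(filts)) {
  drp <- derepFastq(filts[[i]])
  dds[[i]] <- dada(drp, err=err, BAND_SIZE=32, multithread=TRUE)
  rm(drp); gc()  # release the current sample before loading the next
}
seqtab <- makeSequenceTable(dds)
```

This keeps peak memory proportional to the largest single sample rather than the whole 150-sample dataset.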
That makes sense, thank you! I tried this and it seems to be working so far!
Hi Ben,
I am using PacBio Kinnex full-length 16S reads and am having a hard time running them through the dada2 pipeline. This code ran successfully as a test on two samples and output data as expected; however, now that I am running it on 150 samples I am getting errors I have never seen. The code kept erroring out on one particular sample, either with an out-of-memory error or a segmentation fault, but when I removed that sample from the dataset I still got fatal segfaults on a different sample. Have you seen this before? Do you have any suggestions? I attached a screenshot of the error and the code I am using.
Thanks!
.libPaths(c("/home/sierrab4/Rlibs", .libPaths()))
.libPaths()
#load packages
library(dada2);packageVersion("dada2")
library(Biostrings); packageVersion("Biostrings")
library(ShortRead); packageVersion("ShortRead")
library(ggplot2); packageVersion("ggplot2")
library(reshape2); packageVersion("reshape2")
library(gridExtra); packageVersion("gridExtra")
library(phyloseq); packageVersion("phyloseq")
library(Rcpp); packageVersion("Rcpp")
path <- "/projects/sib/labs/kheath/kinnex_2024/first_150"
list.files(path)
path.out <- "Figures/"
path.rds <- "Rdata_and_RDS/"
fnseqs <- list.files(path, pattern="fastq.gz", full.names=TRUE)
F27 <- "AGRGTTYGATYMTGGCTCAG"
R1492 <- "RGYTACCTTGTTACGACTT"
rc <- dada2:::rc
theme_set(theme_bw())
#commented out all of this 5/17 to try and get it to dereplicate faster without running into errors
nops <- file.path(path, "noprimers", basename(fnseqs))
#prim <- removePrimers(fnseqs, nops, primer.fwd=F27, primer.rev=dada2:::rc(R1492), orient=TRUE)
#note: there is another way to remove primers that Chris Fields sent from someone else
#I am just using Ben Callahan's version
#Inspect length distribution.
#pdf("histogram.pdf")
#lens.fn <- lapply(nops, function(fn) nchar(getSequences(fn)))
#lens <- do.call(c, lens.fn)
#hist(lens, 100)
#dev.off()
#Look for peaks around 1450, this is the length of the 16S sequence
#Filter
filts <- file.path(path, "noprimers", "filtered", basename(fnseqs))
track <- filterAndTrim(nops, filts, minQ=3, minLen=1000, maxLen=1600, maxN=0, rm.phix=FALSE, maxEE=2)
track
#run DADA2
#Dereplicate
drp <- derepFastq(filts, verbose=TRUE)