
Segmentation fault (core dumped) on the derep step with Kinnex full-length reads #1959

Closed
sbedwell27 opened this issue May 18, 2024 · 6 comments


@sbedwell27

Hi Ben,

I am using PacBio Kinnex full-length 16S reads and am having a hard time running them through the dada2 pipeline. I ran this code successfully as a test on two samples and it produced output without problems; however, now that I am processing 150 samples I am getting errors I have never seen. The run kept failing on one particular sample, either with an out-of-memory error or a segmentation fault. However, when I removed that sample from the dataset, I still got fatal segfaults on a different sample. Have you seen this before? Do you have any suggestions? I attached a screenshot of the error and the code I am using.

Thanks!

(Screenshot attached: 2024-05-18 at 16:11:53)

.libPaths(c("/home/sierrab4/Rlibs", .libPaths()))
.libPaths()

#load packages
library(dada2);packageVersion("dada2")
library(Biostrings); packageVersion("Biostrings")
library(ShortRead); packageVersion("ShortRead")
library(ggplot2); packageVersion("ggplot2")
library(reshape2); packageVersion("reshape2")
library(gridExtra); packageVersion("gridExtra")
library(phyloseq); packageVersion("phyloseq")
library(Rcpp); packageVersion("Rcpp")

path <- "/projects/sib/labs/kheath/kinnex_2024/first_150"
list.files(path)
path.out <- "Figures/"
path.rds <- "Rdata_and_RDS/"
fnseqs <- list.files(path, pattern="fastq.gz", full.names=TRUE)
F27 <- "AGRGTTYGATYMTGGCTCAG"
R1492 <- "RGYTACCTTGTTACGACTT"
rc <- dada2:::rc
theme_set(theme_bw())

# Commented out all of this 5/17 to try to get it to dereplicate faster without running into errors

nops <- file.path(path, "noprimers", basename(fnseqs))
#prim <- removePrimers(fnseqs, nops, primer.fwd=F27, primer.rev=dada2:::rc(R1492), orient=TRUE)

#note: there is another way to remove primers that Chris Fields sent from someone else
#I am just using Ben Callahan's version

#Inspect length distribution.

#pdf("histogram.pdf")

#lens.fn <- lapply(nops, function(fn) nchar(getSequences(fn)))
#lens <- do.call(c, lens.fn)
#hist(lens, 100)

#dev.off()

#Look for peaks around 1450, this is the length of the 16S sequence

#Filter

filts <- file.path(path, "noprimers", "filtered", basename(fnseqs))
track <- filterAndTrim(nops, filts, minQ=3, minLen=1000, maxLen=1600, maxN=0, rm.phix=FALSE, maxEE=2)
track

#run DADA2

#Dereplicate

drp <- derepFastq(filts, verbose=TRUE)

@sbedwell27
Author

From some more educated guessing, I suspect it has to do with not enough memory (I was only using 30 cores for this, which may not be enough). I am trying a smaller dataset with more cores at the moment.

@benjjneb
Owner

Your issue is probably due to memory. Long-read datasets require more memory per read than short-read datasets.

The current dada2 recommended workflow (see the dada2 tutorial) does not load all samples into memory at the same time, as your derepFastq command does. If you follow that workflow, your maximum memory requirements will be much lower.

@sbedwell27
Author

Thank you for your response! I was using your dada2 tutorial for PacBio Kinnex reads (https://benjjneb.github.io/LRASManuscript/LRASms_fecal.html). What is the advantage of using this over the standard Illumina dada2 pipeline?

@sbedwell27
Author

Hi, just want to follow up on this. Should I be using the current PacBio workflow that you linked, or the PacBio Kinnex tutorial for fecal reads?

@benjjneb
Owner

The current dada2 tutorial is the best place to start. The reproducible analyses associated with the initial DADA2+PacBio manuscript that you linked above are also very useful. The key difference relevant to your analysis is that current dada2 does not recommend calling derepFastq explicitly; instead, pass the files directly into the learnErrors and dada functions. Those functions now perform dereplication on the fly per sample, avoiding loading all samples into memory at once, which is probably what is causing your seg-fault error.
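
A minimal sketch of that file-based approach, assuming `filts` is the vector of filtered fastq paths produced by the filterAndTrim step in the code above (the `PacBioErrfun` error model and `BAND_SIZE=32` setting are the values recommended in the dada2 PacBio documentation; adjust for your setup):

```r
library(dada2)

# Vector of filtered fastq paths from the filterAndTrim step
filts <- list.files(file.path(path, "noprimers", "filtered"),
                    pattern = "fastq.gz", full.names = TRUE)

# Passing file paths (not derep objects) lets learnErrors and dada
# dereplicate each sample on the fly, one sample in memory at a time.
err <- learnErrors(filts, errorEstimationFunction = PacBioErrfun,
                   BAND_SIZE = 32, multithread = TRUE)

dd <- dada(filts, err = err, BAND_SIZE = 32, multithread = TRUE)

seqtab <- makeSequenceTable(dd)
```

The explicit `drp <- derepFastq(filts)` call is simply dropped; the rest of the downstream workflow (chimera removal, taxonomy) proceeds from `seqtab` as usual.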

@sbedwell27
Author

That makes sense, thank you! I tried this and it seems to be working so far!
