Missing reads when transitioning from Dada2 to phyloseq #1933

GingerMATE · 2024-04-23T12:20:51Z

Hello,

im encountering a similar issue to related to my sample reads. After completing the Dada2 pipeline according to the tutorial, i created a phyloseq object, but in doing so i loose reads in almost all my samples. Some of my samples loose over 50%.

Here is an excerpt from my track_reads file with added entry for when i create my phyloseq object:

sample | input | filtered | dadaF | dadaR | merged | nonchim | finalPercReadsKept % | reads physeq obj
ArcB1 | 82650 | 47038 | 45576 | 46265 | 43855 | 42770 | 51.7 | 42770
AuB1 | 90559 | 56066 | 54249 | 55233 | 52257 | 50278 | 55.5 | 32479
AuB2 | 100718 | 62615 | 59976 | 60933 | 57438 | 55009 | 54.6 | 53549
AuB3 | 137090 | 81540 | 80356 | 80983 | 79213 | 76150 | 55.5 | 27221

There is a large iscrepancy between the seqtab. nochim reads and the phyloseq object.

This is the code i use to create my seqtab.nochim file in the pipeline:

seqtab <- makeSequenceTable(mergers)

output_directory="/Analysis/Pipeline"

saveRDS(mergers, "/Analysis/RData/mergers.rds")

save.image(file.path(output_directory, "mergers.RData"))

#Inspect the merger data.frame from the first sample

head(mergers)

class(mergers)

dim(seqtab)

#Inspect distribution of sequence lengths

table(nchar(getSequences(seqtab)))

seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE) # minFoldParentOverAbundance = 8, multithread= parallel::detectCores(),

table(nchar(getSequences(seqtab.nochim)))

dim(seqtab.nochim)

sum(seqtab.nochim)/sum(seqtab)

saveRDS(seqtab, "/Analysis/RData/seqtab.rds")

saveRDS(seqtab.nochim, "/Analysis/RData/seqtab_nochim.rds")

And here is the code i use to create my phyloseq object:

Read sample Metadata and sample names

sample_metadata <- read.table("dna_metadata.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)
rownames(sample_metadata) <- sample_metadata$sample_id
head(sample_metadata)
Read asv and taxa files

asv_counts_table <- readRDS("seqtab_nochim.rds")
asv_seqs <- colnames(asv_counts_table)
asv_headers <- vector(dim(asv_counts_table)[2], mode="character")
for (i in 1:dim(asv_counts_table)[2]) {
asv_headers[i] <- paste("ASV", i, sep="_")
}
colnames(asv_counts_table) <- asv_headers
head(asv_counts_table)
class(asv_counts_table)
sample_id <- sample_metadata$sample_id
rownames(asv_counts_table) <- sample_id
load taxa file

taxa_with_species <- read.table("taxa_.csv", header=TRUE, row.names=1, sep=",", stringsAsFactors=FALSE)
head(taxa_with_species)
class(taxa_with_species)
taxa_with_species <- as.matrix(taxa_with_species)
Create phyloseq object

physeq_ps137 <- phyloseq(otu_table(asv_counts_table, taxa_are_rows=FALSE),
tax_table(taxa_with_species), sample_data(sample_metadata))

physeq_ps137
readcount(physeq_ps137)

Any idea what could cause the reads to disappear? Or am i using the readcount function wrong? Thanks in advance!

The text was updated successfully, but these errors were encountered:

benjjneb · 2024-04-23T15:09:10Z

I'm not sure what is defining the readcount function, but it is not a phyloseq function.

If you are trying to get the total abundances (numbers of reads) in each sample from a phyloseq object, the appropriate function to use is sample_sums.

GingerMATE · 2024-04-24T09:47:08Z

I tried sample_sums to check the reads, but it gave me the same result.

Create phyloseq object

physeq_ps137 <- phyloseq(otu_table(asv_counts_table, taxa_are_rows=TRUE), 
                         tax_table(taxa_with_species), sample_data(sample_metadata)) 

physeq_ps137
sample_sums(physeq_ps137)
readcount(physeq_ps137)

ArcB1 AuP1 AuB1 AuB2 AuB3
42770 45601 32479 53549 27221

The problematic step is between the seqtab.nochim and the physeq object. I assigend my taxa with the tryRC=TRUE argument in the assignTaxonomy function. Do you think that this could have an affect on the number of reads?

Here is the code i used to assign the taxonomy:

Assign taxonomy

unite.ref <- "DNA_results/Dada2/silva_nr99_v138.1_train_set.fa.gz"  # CHANGE ME to location on your machine

seqtab.nochim <- readRDS("DNA_results/seqtab_nochim.rds")

taxa <- assignTaxonomy(seqtab.nochim, unite.ref, multithread = parallel::detectCores(), tryRC = TRUE)

taxa <- addSpecies(taxa, "DNA_results/Dada2/silva_species_assignment_v138.1.fa.gz")

taxa.print <- taxa  # Removing sequence rownames for display only

rownames(taxa.print) <- NULL

head(taxa.print)

write.table (taxa, file ="DNA_results/Dada2/taxa_PS137_general_def_tryRCTRUE.csv", row.names = T, sep = "\t")

asv_seqs <- rownames(taxa_table)

asv_names <- character(nrow(taxa_table))

for (i in 1:nrow(taxa_table)) {
  asv_names[i] <- paste("ASV", i, sep="_")
}

rownames(taxa_table) <- asv_names

write.table(taxa_table, "DNA_results/Dada2/taxa_ps137_tryRCTRUE", sep="\t", quote=F, col.names=T)

benjjneb · 2024-04-29T16:15:00Z

I would suggest trying to construct the phyloseq object directly from the original sequence table and tax table from DADA2 (i.e. that contain the full DNA sequences as IDs), and checking if that solves the change in read counts issue.

It is simpler and less error prone to then switch to "ASV1" style IDs in the phyloseq object. Code for this is included in the DADA2 tutorial "Handoff to phyloseq" section:

dna <- Biostrings::DNAStringSet(taxa_names(ps))
names(dna) <- taxa_names(ps)
ps <- merge_phyloseq(ps, dna)
taxa_names(ps) <- paste0("ASV", seq(ntaxa(ps)))
ps

benjjneb closed this as completed May 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing reads when transitioning from Dada2 to phyloseq #1933

Missing reads when transitioning from Dada2 to phyloseq #1933

GingerMATE commented Apr 23, 2024 •

edited

benjjneb commented Apr 23, 2024

GingerMATE commented Apr 24, 2024 •

edited

benjjneb commented Apr 29, 2024

Missing reads when transitioning from Dada2 to phyloseq #1933

Missing reads when transitioning from Dada2 to phyloseq #1933

Comments

GingerMATE commented Apr 23, 2024 • edited

benjjneb commented Apr 23, 2024

GingerMATE commented Apr 24, 2024 • edited

Create phyloseq object

Assign taxonomy

benjjneb commented Apr 29, 2024

GingerMATE commented Apr 23, 2024 •

edited

GingerMATE commented Apr 24, 2024 •

edited