Iteratively pools randomly chosen scRNA-seq cells expressing a given gene, using different numbers of cells, and runs DESeq2 with fdrtool correction to count how many times each gene comes out as enriched alongside that gene

dgavr/Single_Cell_Iterative_Pooling


scRNA-seq Iterative Pooling with DESeq2

Single-cell transcriptomics (scRNA-seq) has in recent years become a popular method to assay the heterogeneity of cells in a population and uncover hidden subpopulations. In contrast to bulk RNA-seq, scRNA-seq is thought to suffer from large heterogeneity between samples and "dropout events" (false zeros). Moreover, over-dispersion and technical variation are more evident, as gene expression is inherently stochastic.

Many packages exist for analyzing and clustering scRNA-seq data produced using different methods. DESeq2 is a commonly used R package for identifying differentially expressed genes, based on a model using the negative binomial distribution. scRNA-seq reads are often described as following a Poisson distribution, and therefore using DESeq2 and similar packages designed for bulk RNA-seq analysis is not recommended. To determine the effect of using such methods, an iterative method similar to a "bootstrap" was implemented using DESeq2.

The goal was to determine whether single cells expressing the transcription factor FoxD3 in the zebrafish neural crest at the 5-6 somite stage (5-6ss; dataset generated in the TSS lab) and single cells expressing FoxD3 at 50% epiboly co-expressed similar sets of genes, and to what extent these could be uncovered using DESeq2. The zebrafish 50% epiboly scRNA-seq dataset was kindly provided by R. Satija. The original study describing the 50% epiboly dataset was published in:

Satija, R., Farrell, J.A., Gennert, D., Schier, A.F. and Regev, A., 2015. Spatial reconstruction of single-cell gene expression data. Nature Biotechnology, 33(5), p. 495. doi:10.1038/nbt.3192

A concatenated genome FASTA file containing both the danRer10 sequence and the ERCCs was produced, along with a corresponding GTF file (data/Zv10_ERCCs.gtf). A STAR (v.2.4.2) index was built using default parameters.

Paired-end reads were mapped to the zebrafish genome (danRer10) with STAR using the following parameters:

STAR --genomeDir $GDIR --readFilesIn $A1 $A2 --runThreadN 8 --outFileNamePrefix $AN --outSAMstrandField intronMotif

Read counts were obtained with featureCounts from the Subread package (v.1.4.6) using the following parameters:

featureCounts -T 16 -p -t exon -g gene_id -a Zv10_ERCCs.gtf -o $NAME

This repository contains the count tables obtained using featureCounts for both single-cell datasets in data/.

The raw reads from the neural crest dataset have been uploaded to NCBI's GEO under accession number GSE106676, which will be made public upon peer-reviewed publication of the manuscript.

All of the cells in the neural crest single-cell dataset were FACS-sorted based on FoxD3-Citrine reporter expression. The 50% epiboly dataset, however, contained all cells from this stage, many of which did not express FoxD3. Cells expressing FoxD3 transcript XXX >= 1 FPKM were considered FoxD3-expressing cells. 201 cells out of the total 743 expressed FoxD3. One cell expressed FoxD3 < 1 FPKM.

Raw read counts were converted to FPKM values using R (inspired by this blog):
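The conversion itself is the standard FPKM formula: raw count times 10^9, divided by the gene length in base pairs and by the total counts in that cell. The original conversion was done in R; what follows is only a minimal Python sketch of the same arithmetic, not the repository's code. The function name `counts_to_fpkm` and the dict-based inputs are illustrative assumptions. Gene lengths are assumed to come from the `Length` column of the featureCounts output, and the library size is assumed to be the per-cell sum over all rows of the table (whether ERCC rows were included in the original calculation is not stated).

```python
def counts_to_fpkm(counts, gene_lengths):
    """Convert raw counts to FPKM.

    counts:       dict mapping gene id -> list of raw counts, one per cell
    gene_lengths: dict mapping gene id -> gene length in base pairs
    Returns a dict with the same shape as `counts`, holding FPKM values.
    """
    genes = list(counts)
    n_cells = len(counts[genes[0]])

    # Library size per cell: total counts over all genes in the table (assumption).
    totals = [sum(counts[g][j] for g in genes) for j in range(n_cells)]

    fpkm = {}
    for g in genes:
        fpkm[g] = [
            # FPKM = count * 1e9 / (gene length in bp * library size)
            counts[g][j] * 1e9 / (gene_lengths[g] * totals[j]) if totals[j] > 0 else 0.0
            for j in range(n_cells)
        ]
    return fpkm
```

Cells with a zero library size are given an FPKM of 0 instead of raising a division error; this is a choice made for the sketch.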

Low read counts and a lack of reproducibility between replicates cause DESeq2 to fail to estimate the variance of the null distribution. This can be corrected using the fdrtool R library. For a much better explanation, see this link here.

To run:

single_cell_iterative_pooling2.py -f STAR_mapped_zv10_single_cell_satija_counts -gi ENSDARG00000021032 -o FoxD3 -c 50  -l 1  -i 100 

This will use the featureCounts count table supplied with -f (for example 'our_single_cell_with_ERCCs_zv10.counts.txt') and subselect all cells expressing FoxD3 (Ensembl gene ID ENSDARG00000021032). The threshold for considering a cell as expressing FoxD3 is set by -l, here 1 FPKM. With -c 50 and -i 100, it will randomly pool 50 cells together and repeat this 100 times.
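The selection and pooling steps can be sketched in a few lines of Python. This is an illustrative sketch, not the code of single_cell_iterative_pooling2.py. The function names, the dict-based count table, sampling without replacement within a pool, and the `seed` argument are all assumptions. The DESeq2 and fdrtool steps are not reproduced; `tally_enriched` only shows how the per-iteration lists of enriched genes could be counted afterwards.

```python
import random
from collections import Counter


def expressing_cells(fpkm_of_gene, limit=1.0):
    """Indices of cells whose FPKM for the gene of interest is >= limit (cf. -l)."""
    return [j for j, value in enumerate(fpkm_of_gene) if value >= limit]


def pool_counts(counts, cell_indices):
    """Sum raw counts over the chosen cells, gene by gene, into one pooled sample."""
    return {gene: sum(row[j] for j in cell_indices) for gene, row in counts.items()}


def iterative_pools(counts, candidates, n_cells, n_iter, seed=None):
    """Draw n_cells cells at random from candidates and pool them, n_iter times.

    n_cells plays the role of -c and n_iter the role of -i.
    Cells are drawn without replacement within each pool (assumption).
    """
    rng = random.Random(seed)
    pools = []
    for _ in range(n_iter):
        chosen = rng.sample(candidates, n_cells)
        pools.append(pool_counts(counts, chosen))
    return pools


def tally_enriched(enriched_per_iteration):
    """Count in how many iterations each gene was reported as enriched."""
    tally = Counter()
    for genes in enriched_per_iteration:
        tally.update(set(genes))
    return tally
```

In this sketch, each pooled sample would be passed to DESeq2 for one iteration, and the genes passing the corrected significance threshold in each iteration would be collected and handed to `tally_enriched`.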

Reference manuscript

This work was carried out as part of the FoxD3 project in the laboratory of T. Sauka-Spengler at the Weatherall Institute of Molecular Medicine, University of Oxford, UK.

A preliminary manuscript for this project is available on bioRxiv: https://www.biorxiv.org/content/biorxiv/early/2017/11/22/213611.full.pdf

For additional information on the project, please see the tsslab/foxd3 GitHub repository.
