Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs #1023

Open
gwaybio opened this issue Nov 5, 2020 · 3 comments

Comments

@gwaybio
Contributor

gwaybio commented Nov 5, 2020

A lack of tools to precisely control gene expression has limited our ability to evaluate relationships between expression levels and phenotypes. Here, we describe an approach to titrate expression of human genes using CRISPR interference and series of single-guide RNAs (sgRNAs) with systematically modulated activities. We used large-scale measurements across multiple cell models to characterize activities of sgRNAs containing mismatches to their target sites and derived rules governing mismatched sgRNA activity using deep learning. These rules enabled us to synthesize a compact sgRNA library to titrate expression of ~2,400 genes essential for robust cell growth and to construct an in silico sgRNA library spanning the human genome. Staging cells along a continuum of gene expression levels combined with single-cell RNA-seq readout revealed sharp transitions in cellular behaviors at gene-specific expression thresholds. Our work provides a general tool to control gene expression, with applications ranging from tuning biochemical pathways to identifying suppressors for diseases of dysregulated gene expression.

https://doi.org/10.1038/s41587-019-0387-5

@gwaybio
Contributor Author

gwaybio commented Nov 5, 2020

One section of a larger paper involves training a CNN on an "allelic series" of CRISPRi expression "titrations". That sentence is painful to read... in other words, in the assay, the authors systematically introduce mismatches into sgRNA sequences to tune how strongly CRISPRi knocks down gene expression. This lets the authors directly read out the ground-truth impact of modulating gene expression along a continuum between basal expression and full knockdown.

The inputs to the CNN are sgRNA sequences and their corresponding "relative activity". The relative activity is a single number representing a growth phenotype (essentially cell count). The authors train an ensemble of CNNs and evaluate it on a held-out test set. They also validate the model by showing that it can predict GFP expression, used as the "relative activity", in a CRISPRi allelic series targeting GFP.
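A minimal sketch of what one training example could look like, assuming a plain one-hot encoding over the four bases. The encoding the authors actually used may differ, and the sequence and activity value below are made up for illustration.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as one 4-element row per base (order: A, C, G, T)."""
    rows = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base {base!r}")
        rows.append([1.0 if base == b else 0.0 for b in BASES])
    return rows


# Hypothetical training example: a made-up 20-nt sgRNA and a made-up label.
example_seq = "GACGTTACGGATCCAGTCAA"
example_activity = 0.42  # "relative activity" (assumed to be a single float)

x = one_hot(example_seq)  # shape: 20 positions x 4 channels
```

Each position becomes one row with a single 1.0, so the CNN sees a 20 x 4 matrix paired with one scalar target.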

Model Details

The architecture is two convolutional layers, followed by a max pooling layer, then a fully connected layer that predicts activity. The authors train 20 different models, and inference on new data takes the mean prediction of the 20 models.
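To make the layer stack and the ensemble averaging concrete, here is a rough forward-pass sketch in plain Python. Filter count, kernel width, pool size, ReLU activations, and the random (untrained) weights are all assumptions made for illustration; only the conv, conv, max pool, dense ordering and the mean over 20 models come from the summary above.

```python
import random


def conv1d(x, kernels, bias):
    """Valid 1-D convolution + ReLU. x: [length][in_ch]; kernels: [out_ch][width][in_ch]."""
    width = len(kernels[0])
    out = []
    for start in range(len(x) - width + 1):
        row = []
        for kernel, b in zip(kernels, bias):
            total = b
            for offset in range(width):
                for ch, value in enumerate(x[start + offset]):
                    total += kernel[offset][ch] * value
            row.append(max(0.0, total))
        out.append(row)
    return out


def max_pool(x, size):
    """Non-overlapping max pooling along the sequence axis."""
    pooled = []
    for start in range(0, len(x) - size + 1, size):
        window = x[start:start + size]
        pooled.append([max(column) for column in zip(*window)])
    return pooled


def dense(x, weights, bias):
    """Flatten and apply a single linear output unit."""
    flat = [value for row in x for value in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias


def random_model(rng, seq_len=20, in_ch=4, filters=8, width=3, pool=2):
    """Build one model with random weights (a stand-in for a trained model)."""
    def kernels(n_out, n_in):
        return [[[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                 for _ in range(width)] for _ in range(n_out)]
    len_after_pool = (seq_len - 2 * (width - 1)) // pool
    return {
        "k1": kernels(filters, in_ch), "b1": [0.0] * filters,
        "k2": kernels(filters, filters), "b2": [0.0] * filters,
        "pool": pool,
        "w": [rng.uniform(-0.1, 0.1) for _ in range(len_after_pool * filters)],
        "b": 0.0,
    }


def predict(model, x):
    h = conv1d(x, model["k1"], model["b1"])
    h = conv1d(h, model["k2"], model["b2"])
    h = max_pool(h, model["pool"])
    return dense(h, model["w"], model["b"])


def ensemble_predict(models, x):
    """Mean prediction across all ensemble members."""
    preds = [predict(m, x) for m in models]
    return sum(preds) / len(preds)


rng = random.Random(0)
models = [random_model(rng) for _ in range(20)]
# Toy one-hot input: 20 positions x 4 channels.
x = [[1.0 if i % 4 == j else 0.0 for j in range(4)] for i in range(20)]
y = ensemble_predict(models, x)
```

Averaging independently initialized models is a common way to reduce the variance of a single network's prediction, which is presumably the motivation for the ensemble here.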

Performance

The CNN ensemble outperforms a logistic regression model on the held-out test set (r^2 = 0.65 vs. r^2 = 0.52).
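For reference, a small sketch of how an r^2 value like these could be computed, assuming r^2 here means the squared Pearson correlation between measured and predicted activity. That reading is an assumption, and the numbers below are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up measured vs. predicted relative activities for a few sgRNAs.
measured = [0.10, 0.35, 0.50, 0.80, 0.95]
predicted = [0.20, 0.30, 0.55, 0.70, 0.90]

r_squared = pearson_r(measured, predicted) ** 2
```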

Interpretation

The authors show that mismatch position (along the sgRNA construct) and mismatch type (e.g. A -> T) were the most informative features. GC content was also important, and mismatches at intermediate positions between the end and the PAM also seemed to be informative.
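The features named above are easy to extract by hand, which helps when sanity-checking what a model might be picking up on. A sketch, assuming sequences are written 5' to 3' with the PAM immediately after the last base; the function name, the output layout, and the sequences are hypothetical.

```python
def mismatch_features(perfect, variant):
    """Compare a perfectly matched sgRNA with a mismatched variant.

    Assumes both sequences are 5'->3' and the PAM follows the last base.
    """
    if len(perfect) != len(variant):
        raise ValueError("sequences must have the same length")
    mismatches = []
    for idx, (p, v) in enumerate(zip(perfect, variant)):
        if p != v:
            mismatches.append({
                "position_from_5prime": idx,                 # 0-based
                "distance_from_pam": len(perfect) - 1 - idx,  # 0 = PAM-adjacent
                "type": f"{p}->{v}",
            })
    gc_content = sum(base in "GC" for base in variant) / len(variant)
    return {"mismatches": mismatches, "gc_content": gc_content}


# Made-up example: a single A->T mismatch in the middle of the guide.
perfect = "GACGTTACGGATCCAGTCAA"
variant = "GACGTTACGGTTCCAGTCAA"
features = mismatch_features(perfect, variant)
```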

Interesting Highlight

The authors also used their trained model to predict which sgRNA constructs would most likely produce activity within a given range. This helped with designing a more compact sgRNA library 🤯
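One way this kind of selection could work is sketched below: bin the predicted activities and keep the candidate closest to each bin's center. The binning scheme, the function name, and the predictions are all assumptions for illustration; the summary above does not say how the authors chose constructs.

```python
def pick_by_activity(predictions, bins, per_bin=1):
    """Pick sgRNAs whose predicted activity falls in each [low, high) bin.

    predictions: {sgRNA name: predicted relative activity}
    bins: list of (low, high) tuples
    Within a bin, candidates closest to the bin center are preferred.
    """
    chosen = {}
    for low, high in bins:
        center = (low + high) / 2
        in_bin = [(abs(activity - center), name)
                  for name, activity in predictions.items()
                  if low <= activity < high]
        chosen[(low, high)] = [name for _, name in sorted(in_bin)[:per_bin]]
    return chosen


# Made-up model predictions for six candidate mismatched sgRNAs of one gene.
predictions = {
    "sg_a": 0.05, "sg_b": 0.30, "sg_c": 0.35,
    "sg_d": 0.62, "sg_e": 0.90, "sg_f": 0.95,
}
bins = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
library = pick_by_activity(predictions, bins)
```

Here six candidates collapse to four, one per activity level, which is the sense in which model-guided selection makes a library more compact.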

@gwaybio
Contributor Author

gwaybio commented Nov 5, 2020

This paper is a good example of a trend in which deep learning is becoming more integrated into efforts that are primarily about assay development and molecular biology.

@gwaybio
Contributor Author

gwaybio commented Nov 5, 2020
