GitHub - ilibarra/primer_design_covid19: Python workflow to design COVID-19 primers and assess their specificity and amplicon RNA secondary structure stability.

Screening of primers for COVID-19 with genome specificity and amplicons with stable single-stranded RNA structure

By Anibal Arce and Ignacio Ibarra

Motivation

Sequence-specific isothermal amplification of targets from viral samples is essential for implementing low-cost diagnostic tools based on nucleic acid detection.
Design and selection of primer combinations for this propuse can be automated considering the optimal parameters for a particular methodology, such as NASBA isothermal amplification of RNA.
Analysis of the secondary structure of the amplicon generated, aids in the shortlisting of candidates for downstreams detection tools such as Toeholds RNA Sensors.

Solution

This Python workflow:

Selects desirable primers for experiments targeting in the COVID-19 genome. Unwanted specificity with other viral genomes is checked and reported.
Amplicon products obtained from primer pairs are checked for RNA secondary structure stability (reported as minimum free energy and Z-scores).

Workflow steps

Primers are scanned against COVID-19's GenBank entry (--fastaid to modify with custom fasta), and filtered by desirable %GC content rules, Tm and local alignment between primers (duplex formation).
Primers are scanned against 7 viral genomes selected to reduce amplification of untargeted viral genomes:
- RSV, Ketapneumovirus, Parainfluenza 4a, Adenovirus E, Influenza B, Influenza A.
Primers are grouped into pairs and filtered by amplicon lengths.
Predicted RNA amplicons are assessed for RNA secondary structures using LinearFold (Huang et al. 2019). Z-scores are calculated using a regression model that models the free energy versus amplicon length. This allows comparing free energies of long and short amplicons.

Installation and running (typical time: less than 5 minutes)

git clone primer_design_linearfold_covid19.git
cd primer_design_linearfold_covid19

Environment requirements

Python 3 https://www.python.org/
Data Science packages for Python: pandas numpy seaborn scikit-learn
LinearFold must be installed to assess free energy of RNA secondary structures. Please include to $PATH or define path with --linearfold.
MUSCLE : Necessary to generate MSA files between reference FASTA and included aln for virus of interest.
New (May 2020): tags now must be given as a multifasta (default input/tags/tags.fa)

Input requirements

Please download the following files and unzip them into the input directory (MSA, and tags multifasta).

DATA LINK

Execution examples

python generate_primer_pairs.py --help # see help (fasta ID, gc content filter, min-max primer/amplicon lengths, etc.)
# LOAD TESTS
python generate_primer_pairs.py --ntest 100 --overwrite2 # test only with the first 100 primer pairs (run in 5 minutes).
python generate_primer_pairs.py --ntest 1000 --overwrite2 # test only with the first 1000 pairs.
# FULL RUN
python generate_primer_pairs.py --overwrite2 # full run (checking RNA secondary structures).
python generate_primer_pairs.py --checkothers --overwrite1 --overwrite2 # full run (checking against background viral genomes).

Output

An Excel file called output/FASTA_ID_pairs.xlsx contains details for all shortlisted primer pairs and amplicons selected.
Check Z-scores for RNA-stability of primers and amplicons. More negative values are correlated with more stable RNA secondary structures.
- You can visualize 2D-stable amplicon sequences using NUPACK.

Running time

Around 5-15 minutes for load tests between 100 and 2000 (primer pairs and their respective amplicons).
~90-120 minutes for full execution (one CPU, default parameters, verification against other genomes and RNA secondary structure assessment).
Adding more background genomes increases running time linearly.
Reducing the primer lengths increases exponentially the running time.

Misc

More background viral genomes can be added manually in input/other_viruses as fasta files. Please add also an entry to the file names.tsv with a keyword for that name.
You can replace the FASTA_ID in input to genomes of your interest (not only COVID-19).

Feedback, errors, or further questions

Report in Issues.
You can also Send an e-mail to Anibal Arce (aaarce@uc.cl) or Ignacio Ibarra (ignacio.ibarra@helmholtz-muenchen.de).

Funding

OpenCovid19 Initiative - Just One Giant Lab. More information at https://app.jogl.io/project/188

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
about		about
input		input
output		output
utilities		utilities
README.md		README.md
generate_primer_pairs.py		generate_primer_pairs.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

about

about

input

input

output

output

utilities

utilities

README.md

README.md

generate_primer_pairs.py

generate_primer_pairs.py

Repository files navigation

Screening of primers for COVID-19 with genome specificity and amplicons with stable single-stranded RNA structure

Motivation

Solution

Workflow steps

Installation and running (typical time: less than 5 minutes)

Environment requirements

Input requirements

Execution examples

Output

Running time

Misc

Feedback, errors, or further questions

Funding

About

Releases

Packages

Contributors 2

Languages

ilibarra/primer_design_covid19

Folders and files

Latest commit

History

Repository files navigation

Screening of primers for COVID-19 with genome specificity and amplicons with stable single-stranded RNA structure

Motivation

Solution

Workflow steps

Installation and running (typical time: less than 5 minutes)

Environment requirements

Input requirements

Execution examples

Output

Running time

Misc

Feedback, errors, or further questions

Funding

About

Resources

Stars

Watchers

Forks

Languages