This script is a utility to plot annotations & coverage with R.
The script takes three .csv files and one .yaml file as input. All are plain text and can be edited in Excel (or a similar spreadsheet program) or in a text editor such as Notepad.
The .yaml file must be called plot_anot.yaml.
To configure the script, enter the names of the .csv files in the .yaml file.
phob_csv is optional. Enter "" to skip phobicity.
bg_file is the .bedGraph.gz file with your coverage info. It should hold the coverage from mapping reads to the contig.
If the file is in the same folder as the script, just name the file. If it's elsewhere, use the full filepath.
On Windows, that might be C:/My_Stuff/anot_csv.csv for example.
You can name these files anything you want, as long as each one is a valid .csv.
contig_length is the length of the entire contig. You don't need to supply the contig itself.
contig_name must match the contig's name in the bedGraph coverage file.
plot_name is your own name for the contig (it may be the same as contig_name). It is also used as the output file name, and the plot is saved as a .jpeg.
height and width set the size of the .jpeg file.
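A quick way to confirm that contig_name and contig_length agree with the coverage file is to read the bedGraph in R before plotting. The sketch below is only an illustration and is not part of the script: the helper name check_bedgraph is hypothetical, and it assumes the standard four-column bedGraph layout (contig, start, end, coverage) with no header lines.

```r
# Hypothetical helper, not part of the plotting script.
# Assumes a standard 4-column bedGraph (contig, start, end, coverage)
# with no "track" or "browser" header lines.
check_bedgraph <- function(bg_file, contig_name, contig_length) {
  bg <- read.table(gzfile(bg_file), sep = "\t", header = FALSE,
                   col.names = c("contig", "start", "end", "coverage"),
                   stringsAsFactors = FALSE)
  rows <- bg[bg$contig == contig_name, ]
  list(
    name_found  = nrow(rows) > 0,
    # TRUE only if the contig is present and no interval runs past contig_length
    fits_length = nrow(rows) > 0 && max(rows$end) <= contig_length
  )
}

# Demonstration on a tiny made-up coverage file.
demo_bg <- tempfile(fileext = ".bedGraph.gz")
con <- gzfile(demo_bg, "w")
writeLines(c("Contig\t0\t650\t12", "Contig\t650\t1300\t30"), con)
close(con)

result <- check_bedgraph(demo_bg, "Contig", 1300)
print(result)
```

If name_found is FALSE, the most likely cause is a contig_name that does not match the first column of the bedGraph exactly.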
The sample plot uses this configuration:
anot_csv: plot_anot_template.csv
orf_csv: orf_anot_template.csv
phob_csv: phob_anot_template.csv
bg_file: SAMPLE.bedGraph.gz
contig_length: 1300
contig_name: Contig
plot_name: Annotation Plot
height: 800
width: 1600
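Before running the script it can help to check that the configuration has every key listed above. The following is a minimal sketch, assuming the .yaml file has been parsed into a named R list (as a YAML reader would return). The function validate_config and its checks are illustrative and are not taken from the script.

```r
# Hypothetical check, not part of the plotting script.
# `cfg` stands in for the named list a YAML parser would return.
validate_config <- function(cfg) {
  # phob_csv is included because skipping phobicity is done with "",
  # not by leaving the key out.
  required <- c("anot_csv", "orf_csv", "phob_csv", "bg_file", "contig_length",
                "contig_name", "plot_name", "height", "width")
  problems <- character(0)
  missing_keys <- setdiff(required, names(cfg))
  if (length(missing_keys) > 0) {
    problems <- c(problems, paste("missing key:", missing_keys))
  }
  # These three entries should each be a single positive number.
  for (key in intersect(c("contig_length", "height", "width"), names(cfg))) {
    value <- cfg[[key]]
    if (!is.numeric(value) || length(value) != 1 || is.na(value) || value <= 0) {
      problems <- c(problems, paste(key, "must be a positive number"))
    }
  }
  problems
}

cfg <- list(anot_csv = "plot_anot_template.csv", orf_csv = "orf_anot_template.csv",
            phob_csv = "", bg_file = "SAMPLE.bedGraph.gz", contig_length = 1300,
            contig_name = "Contig", plot_name = "Annotation Plot",
            height = 800, width = 1600)
problems <- validate_config(cfg)
print(problems)  # an empty character vector means no problems were found
```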
orf_plot_template.csv is an example of the ORF input this script takes.
Sometimes an ORF and a product are the same thing. In some viruses one ORF can have multiple products due to splicing.
ORFs are mandatory. Give the name, start, and end for each. No need for double quotes.
ORF starts and ends are nucleotide positions (DNA/RNA), i.e. coordinates before translation.
name | start | end |
---|---|---|
Polyprotein | 150 | 800 |
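If you would rather build the ORF table in R than in a spreadsheet, a data frame with the three columns above can be written out directly. This is a sketch under assumptions: the output file name is a placeholder, and it assumes the script reads a plain comma-separated file with a name,start,end header row.

```r
# Illustrative only: the file name below is a placeholder.
orfs <- data.frame(
  name  = c("Polyprotein"),
  start = c(150),
  end   = c(800)
)

# ORF coordinates are nucleotide positions, so start must not exceed end.
stopifnot(all(orfs$start <= orfs$end))

orf_path <- file.path(tempdir(), "my_orf_table.csv")
# quote = FALSE writes names without double quotes; row.names = FALSE
# stops R from adding an extra index column.
write.csv(orfs, orf_path, row.names = FALSE, quote = FALSE)

written <- readLines(orf_path)
print(written)
```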
anot_plot_template.csv is an example of the annotation input this script takes.
Geneious generates this format if you export annotations as a table from the text menu.
Positions should be given in amino acids, counted along the protein.
Don't leave any names empty; put "" instead. Geneious may output empty names for annotations you haven't named.
Name | Type | Minimum | Maximum | Length | Direction |
---|---|---|---|---|---|
DOMAIN | domain | 110 | 199 | 89 | forward |
"" | active site | 200 | 201 | 1 | none |
BINDING SITE | binding site | 202 | 210 | 8 | forward |
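Because ORFs are given in nucleotides and annotations in amino acids, it can be useful to see where an annotation would fall on the contig. The sketch below uses the usual three-nucleotides-per-codon arithmetic for a forward-strand ORF. It is an assumption about how the two coordinate systems relate, not a description of the script's internals, and aa_to_nt is a hypothetical helper.

```r
# Hypothetical helper: maps amino acid positions (1-based, inclusive) within
# a forward-strand ORF to nucleotide positions on the contig.
aa_to_nt <- function(orf_start, aa_min, aa_max) {
  data.frame(
    nt_start = orf_start + (aa_min - 1) * 3,  # first base of the first codon
    nt_end   = orf_start + aa_max * 3 - 1     # last base of the last codon
  )
}

# Sample values from the tables: the ORF starts at nucleotide 150 and
# DOMAIN spans amino acids 110 to 199.
domain_nt <- aa_to_nt(orf_start = 150, aa_min = 110, aa_max = 199)
print(domain_nt)
```

Under this assumption the sample domain covers nucleotides 477 to 746, which lies inside the sample ORF (150 to 800).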
OPTIONAL: a .csv file of phobicity. It must be in the following format.
NC for non-cytoplasmic domain. TMR for trans-membrane region. C for cytoplasmic domain.
type | start | end |
---|---|---|
NC | 33 | 66 |
TMR | 67 | 80 |
C | 81 | 120 |
The script uses this file to colour-code the regions. If you prefer, you can leave these regions as ordinary annotations instead.
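A phobicity table can be checked against the format above before plotting. This is a minimal sketch, not part of the script: check_phob is a hypothetical helper, and the rule that start must not exceed end is an assumption added for illustration.

```r
# Hypothetical check, not part of the plotting script.
check_phob <- function(phob) {
  allowed <- c("NC", "TMR", "C")
  bad_type  <- !(phob$type %in% allowed)
  bad_range <- phob$start > phob$end  # assumed rule: start <= end
  # Return the row numbers that break either rule.
  which(bad_type | bad_range)
}

phob <- data.frame(
  type  = c("NC", "TMR", "C"),
  start = c(33, 67, 81),
  end   = c(66, 80, 120)
)
bad_rows <- check_phob(phob)
print(bad_rows)  # an empty result means every row passed
```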
The simplest way to run this script is to place it in a folder with the files above and your bedGraph file.
Open R (not RStudio) and set the working directory to this script's directory.
For example, if the script is at C:\My_Stuff\Scripts\plot_annotations_yaml.r:
setwd("C:/My_Stuff/Scripts/")
In the File menu, click 'source script', then navigate to the script and double-click it. The .jpeg should be written to this folder.
If you use Jupyter Lab or RStudio, or run R from the command line, you will already know how best to run this script on your system.
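The same steps can also be typed into the R console. The sketch below is illustrative: the helper missing_inputs is hypothetical, and the commented-out lines reuse the example path from above, so adjust them to your own folder.

```r
# Hypothetical helper: returns the files that cannot be found
# relative to the current working directory.
missing_inputs <- function(files) {
  files[!file.exists(files)]
}

inputs <- c("plot_anot.yaml", "plot_annotations_yaml.r")
not_found <- missing_inputs(inputs)
if (length(not_found) > 0) {
  message("Missing from ", getwd(), ": ", paste(not_found, collapse = ", "))
} else {
  message("All inputs found; the script can be sourced.")
}

# With everything in place, the run itself would look like this
# (path taken from the example above):
# setwd("C:/My_Stuff/Scripts/")
# source("plot_annotations_yaml.r")
```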