mwodring/Annotatr_Basic

Quickly plot a coverage graph and annotations from Geneious with R

Overview

This script is a utility to plot annotations & coverage with R.

Inputs

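The script takes up to three .csv files (annotations, ORFs and, optionally, phobicity), a .bedGraph.gz coverage file and a .yaml configuration file as input. The .csv and .yaml files are simple to use and can be edited in a spreadsheet program such as Excel or in a plain-text editor such as Notepad.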

plot_anot.yaml

This file must be called plot_anot.yaml.

Files

To configure the script, enter the name of each input file next to its key.

phob_csv is optional. Enter "" to skip phobicity.

bg_file is the .bedGraph.gz file with your coverage info. This file should be a map of reads to the contig.

If the file is in the same folder as the script, just name the file. If it's elsewhere, use the full filepath.

On Windows, that might be C:/My_Stuff/anot_csv.csv for example.

You can name these files anything you want, as long as each one is a valid .csv file.

Options

contig_length is the length of the entire contig. You don't need to supply the contig itself.

contig_name must match the contig's name in the bedGraph coverage file.

plot_name is the name you've given to the contig (which may be the same as contig_name). It is also used as the output file name. The plot is saved as a .jpeg.

height and width set the size of the .jpeg image.
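The contig_name requirement can be checked before plotting. The sketch below is not taken from the script: it assumes a standard four-column, tab-separated bedGraph with no track header line, and the coverage values are made up so that the example is self-contained.

```r
# Write a tiny gzipped bedGraph so the example runs on its own.
# Columns: contig name, start (0-based), end (exclusive), coverage.
# All values here are invented for illustration.
bg_path <- tempfile(fileext = ".bedGraph.gz")
con <- gzfile(bg_path, "w")
writeLines(c("Contig\t0\t100\t12",
             "Contig\t100\t250\t30",
             "Other\t0\t50\t5"), con)
close(con)

# Read the compressed file back through a gzfile() connection.
bg <- read.table(gzfile(bg_path), sep = "\t", header = FALSE,
                 col.names = c("contig", "start", "end", "coverage"),
                 stringsAsFactors = FALSE)

contig_name <- "Contig"  # must match the first column exactly

# Keep only the rows for the contig being plotted.
cov <- bg[bg$contig == contig_name, ]
if (nrow(cov) == 0) {
  stop("contig_name '", contig_name, "' not found in the bedGraph file")
}
print(cov)
```

If this stops with an error on your own file, compare contig_name with the first column of the bedGraph: the match is case-sensitive.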

The sample configuration is:

anot_csv: plot_anot_template.csv
orf_csv: orf_anot_template.csv
phob_csv: phob_anot_template.csv
bg_file: SAMPLE.bedGraph.gz
contig_length: 1300
contig_name: Contig
plot_name: Annotation Plot
height: 800
width: 1600
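As a rough sketch of how these keys fit together, the configuration can be checked once it has been parsed into a named list. The helper check_config is hypothetical and not part of the script. It assumes that every key except phob_csv is required and that contig_length, height and width are positive numbers. The list is written out by hand here; a YAML parser such as yaml::read_yaml("plot_anot.yaml") from the yaml package would produce an equivalent one.

```r
# Hypothetical helper: check a parsed plot_anot.yaml (as a named list) for the
# keys shown in the sample. Not taken from the script itself.
check_config <- function(cfg) {
  # Assumption: everything except phob_csv is required.
  required <- c("anot_csv", "orf_csv", "bg_file", "contig_length",
                "contig_name", "plot_name", "height", "width")
  missing <- setdiff(required, names(cfg))
  if (length(missing) > 0) {
    stop("Missing config keys: ", paste(missing, collapse = ", "))
  }
  # phob_csv is optional: "" (or an absent key) means "skip phobicity".
  cfg$use_phob <- !is.null(cfg$phob_csv) && nzchar(cfg$phob_csv)
  for (key in c("contig_length", "height", "width")) {
    if (!is.numeric(cfg[[key]]) || cfg[[key]] <= 0) {
      stop(key, " must be a positive number")
    }
  }
  cfg
}

# The sample values, written as the list a YAML parser would return.
cfg <- check_config(list(
  anot_csv = "plot_anot_template.csv",
  orf_csv = "orf_anot_template.csv",
  phob_csv = "phob_anot_template.csv",
  bg_file = "SAMPLE.bedGraph.gz",
  contig_length = 1300,
  contig_name = "Contig",
  plot_name = "Annotation Plot",
  height = 800,
  width = 1600
))
print(cfg$use_phob)
```

Setting phob_csv to "" in the list makes use_phob FALSE, which mirrors the "skip phobicity" option described under Files.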

ORFs

orf_plot_template.csv is an example file of the input this script takes for ORFs.

Sometimes an ORF and a product are the same thing. In some viruses one ORF can have multiple products due to splicing.

ORFs are mandatory. Give the name, start, and end for each. No need for double quotes.

ORF start and end positions should be given in nucleotides (DNA/RNA), i.e. before translation.

name start end
Polyprotein 150 800
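A minimal sketch of sanity checks for the ORF table follows. It is not part of the script and assumes a comma-separated file with the three columns shown, 1-based inclusive coordinates, and the contig_length of 1300 from the sample configuration. The table is read from inline text so the example runs on its own; a real file would be read with read.csv() and its file name.

```r
# Example ORF table in the layout shown, read from inline text.
orfs <- read.csv(text = "name,start,end
Polyprotein,150,800", stringsAsFactors = FALSE)

contig_length <- 1300  # from the sample configuration

# Hypothetical checks: each ORF must lie on the contig and start before it ends.
ok <- orfs$start >= 1 & orfs$start < orfs$end & orfs$end <= contig_length
if (!all(ok)) {
  stop("ORF(s) out of range: ", paste(orfs$name[!ok], collapse = ", "))
}

# Length in nucleotides and in whole codons (assuming inclusive coordinates).
orfs$nt_length <- orfs$end - orfs$start + 1
orfs$codons <- orfs$nt_length %/% 3
print(orfs)
```

Under these assumptions the sample ORF spans 651 nucleotides, or 217 codons, which is the upper limit for amino acid positions in the annotation and phobicity files.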

Annotations

anot_plot_template.csv shows an example file of the input this script takes for annotations.

This is generated by Geneious if you export annotations as a table from the text menu.

Positions should be given in amino acids, counted along the protein (not in nucleotides).

Don't leave any names empty; put "" instead. Geneious may export empty names for annotations you have not named.

Name          Type          Minimum  Maximum  Length  Direction
DOMAIN        domain        110      199      89      forward
""            active site   200      201      1       none
BINDING SITE  binding site  202      210      8       forward
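Because annotations are given in amino acids while ORFs are given in nucleotides, it can help to see how the two relate. The helper aa_to_nt below is hypothetical and may not match what the script does internally. It assumes 1-based inclusive positions, a forward-strand ORF, and that amino acid 1 is the first codon of the ORF.

```r
# Hypothetical helper: convert amino-acid positions within an ORF to
# nucleotide positions on the contig (1-based, inclusive, forward strand).
aa_to_nt <- function(aa_min, aa_max, orf_start) {
  data.frame(
    nt_start = orf_start + (aa_min - 1) * 3,  # first base of the first codon
    nt_end   = orf_start + aa_max * 3 - 1     # last base of the last codon
  )
}

# The three sample annotations placed on the sample ORF, which starts at
# nucleotide 150.
nt <- aa_to_nt(aa_min = c(110, 200, 202),
               aa_max = c(199, 201, 210),
               orf_start = 150)
print(nt)
```

With these assumptions the DOMAIN annotation (amino acids 110 to 199) covers nucleotides 477 to 746 of the contig, inside the sample ORF's 150 to 800 range.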

Phobicity

OPTIONAL: a .csv file of phobicity. It must be in the following format.

Use NC for a non-cytoplasmic domain, TMR for a trans-membrane region and C for a cytoplasmic domain.

type start end
NC 33 66
TMR 67 80
C 81 120

This colour-codes the regions. You may leave them as annotations if that's what you prefer.
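The format rules for the phobicity file can be checked with a short sketch like the one below. It is not part of the script. It assumes a comma-separated file, and the non-overlap check is an extra assumption rather than a documented requirement.

```r
# Example phobicity table in the layout shown, read from inline text.
phob <- read.csv(text = "type,start,end
NC,33,66
TMR,67,80
C,81,120", stringsAsFactors = FALSE)

# Only the three documented codes are allowed.
allowed <- c("NC", "TMR", "C")
bad <- setdiff(phob$type, allowed)
if (length(bad) > 0) {
  stop("Unknown phobicity type(s): ", paste(bad, collapse = ", "))
}

# Assumption: regions should not overlap, so once sorted by start each region
# must begin after the previous one ends.
phob <- phob[order(phob$start), ]
overlaps <- any(phob$start[-1] <= phob$end[-nrow(phob)])
print(overlaps)
```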

Running

The simplest way to run this script is to place it in a folder with the files above and your bedGraph file.

Open R (not RStudio) and set the working directory to this script's directory.

For example, if the script is C:\My_Stuff\Scripts\plot_annotations_yaml.r:

setwd("C:/My_Stuff/Scripts/")

In the File menu, click 'source script', then navigate to the script and double-click it. The .jpeg should be written to this folder.
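Before sourcing, it can be worth confirming that the expected files are in the working directory. The helper missing_inputs below is hypothetical. The demonstration uses a temporary folder holding only an empty plot_anot.yaml, so the script file is reported as missing.

```r
# Hypothetical helper: report which expected files are absent from a folder.
missing_inputs <- function(dir, files) {
  files[!file.exists(file.path(dir, files))]
}

# Demonstration in a temporary folder that holds only the config file.
demo_dir <- file.path(tempdir(), "annotatr_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "plot_anot.yaml"))

absent <- missing_inputs(demo_dir,
                         c("plot_anot.yaml", "plot_annotations_yaml.r"))
print(absent)

# With nothing missing, the script can also be run from the R console
# instead of the menu:
# setwd("C:/My_Stuff/Scripts/")
# source("plot_annotations_yaml.r")
```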

If you use Jupyter Lab or RStudio, or code in R using the command line, you will already know how best to run this script on your system anyway.
