Skip to content

R package to plot Human Mitochondrial Variants from Dataframes, VCF or JSON files

License

Notifications You must be signed in to change notification settings

robertopreste/mitovizR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

84 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

mitovizR

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public. Build Status codecov

Plot variants on the human mitochondrial genome. Currently supports plotting variants contained in dataframes, VCF or JSON files.


Overview

mitovizR is a simple R package to plot human mitochondrial variants on a graphical representation of the human mitochondrial genome. It currently supports plotting variants stored in a dataframe or a VCF or JSON file, although other options are being developed (if you want to contribute, you're welcome!).

Features

  • Plot variants from a dataframe
  • Plot variants from a VCF file
    • with a single sample
    • with multiple samples (multiple facets will be produced in the same plot)
    • showing Heteroplasmic Fraction of each variant
  • Plot variants from a JSON file
    • from a list of variants
    • from a list with a dictionary for each variant

Installation

mitovizR can be installed in R from GitHub using devtools:

devtools::install_github("robertopreste/mitovizR")

It is suggested to also build the package vignette when installing mitovizR, where all the functionalities are better explained:

devtools::install_github("robertopreste/mitovizR", build_vignettes = TRUE)

Official installation from CRAN/Bioconductor coming soon!

Usage

First of all, load the mitovizR package:

library(mitovizR)

The mitovizR package offers three main functions to plot variants from three different sources:

plot_df()    # to plot variants from a dataframe 
plot_vcf()   # to plot variants from a VCF file 
plot_json()  # to plot variants from a JSON file 

Further details about their usage and options may be found in the package vignette, which can be built using devtools::install_github("robertopreste/mitovizR", build_vignettes = TRUE).

Under construction!

Plot variants from a dataframe

The simplest example is when you already have a dataframe with a set of mitochondrial variants.

POS REF ALT SAMPLE GT HF
750 A G HG00420 0/1 0.998
1438 A G HG00420 0/1 0.232
2706 A G HG00420 0/1 0.015
3106 CN C HG00420 0/1 0.66
4216 T C HG00420 1 1.0
7028 C T HG00420 0/1 0.420
8935 C T HG00420 0/1 0.899
9389 A G HG00420 1 1.0
9899 T C HG00420 0/1 0.670
11251 A G HG00420 0/1 0.999
11719 G A HG00420 0/1 0.008
12633 C A HG00420 0/1 0.998
13368 G A HG00420 1 1.0
14766 C T HG00420 0/1 0.398
15326 A G HG00420 0/1 0.804
16294 C T HG00420 0/1 0.008

In this case, a call to plot_df() will plot all mitochondrial variants.

plot_df(sample_df)

It is possible to show a label on each variant with its position, reference and alternate allele, using the show_var_labels option. In this case, by default the plot_df() function will look for columns named POS, REF and ALT, containing respectively variant positions, reference alleles and alternate alleles; you can specify different column names using respectively the pos_col, ref_col and alt_col options.

plot_df(sample_df, show_var_labels = TRUE)
# specify the variant positions, reference and alternate alleles column names
plot_df(sample_df, show_var_labels = TRUE, 
        pos_col = "POS", ref_col = "REF", alt_col = "ALT")

Variants are shown according to their heteroplasmic fraction (HF), plotting variants with HF = 1.0 on the outer border of the mitochondrial circle, those with HF = 0.0 on the inner border and all the others according to their actual HF value. This is done automatically if a column with heteroplasmic fraction values is present (by default HF, if not set differently using the hf_col option); otherwise, variants will simply be shown in the middle of the circular plot (as if they all had HF = 0.5).

plot_df() deal with a dataframe with multiple samples easily, creating a separate plot for each sample listed in the SAMPLE column (this default can be changed using the sample_col option).

plot_df(sample_multi_df)
# specify the sample column name 
plot_df(sample_multi_df, sample_col = "sample_column")

By default, the plot will be returned and shown; if you want to save the plot to a file, just use the save_plot option, which will save the current plot to ./mitoviz_plot.png. You can specify a custom output path and filename using the save_to option.

plot_df(sample_df, save_plot = TRUE)
# specify a custom output path
plot_df(sample_df, save_plot = TRUE, save_to = "../my_dir/my_plot.png")

Plot variants from a VCF file

The plot_vcf() function allows to plot human mitochondrial variants contained in a VCF file.

plot_vcf("sample_vcf.vcf")

Multiple plots are automatically created from a multi-sample VCF file.

plot_vcf("sample_multi.vcf")

Plot variants from a JSON file

Some tools will output mitochondrial variants in a JSON-formatted file; in this case, the plot_json() function is what you need. JSON files can usually be in two different formats:

  • vector format, where variant positions are simply stored in a vector
  • dataframe format, where each variant is stored in its own entry

The plot_json() function can handle both cases, using a different argument for its json_format option.

Vector-formatted JSON files

An example of a vector-formatted JSON file is the following:

# content of json_vector.json
["420", "1000", "3000", "5000", "10000"]

Using json_format = "vector" allows to plot variants from this file.

plot_json("json_vector.json", json_format = "vector")

Dataframe-formatted JSON files

An example of a dataframe-formatted JSON file is the following:

# content of json_dataframe.json
[
  {"feat1": "snp", "position": 420},
  {"feat1": "snp", "position": 1000},
  {"feat1": "snp", "position": 3000},
  {"feat1": "snp", "position": 5000},
  {"feat1": "snp", "position": 10000}
]

In this case, just set json_format = "dataframe". By default, the plot_json() function will look for a POS column with variant positions, but you can specify a different column name using the pos_col option.

plot_json("json_dataframe.json", json_format = "dataframe", pos_col = "position")

Help

If you found a bug, or want to suggest an improvement, please feel free to open an issue.

About

R package to plot Human Mitochondrial Variants from Dataframes, VCF or JSON files

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages