Visually compare distributions in data sets

License

MIT (LICENSE.md); the license in the separate LICENSE file is not identified.

ataustin/flyover

Lifecycle: experimental Codecov test coverage

flyover

Painlessly generate high-level visual comparisons of distributions between groups or data sets.

Purpose

This package is for you if:

  • you are dealing with many variables whose distributions may differ in important ways among groups;
  • you have refactored a data process and want to ensure that the new data is acceptably similar to the old data;
  • you are pulling data on a regular schedule and need to monitor data drift;
  • you wish to examine data quality visually rather than relying on statistical tests and thresholds.

flyover provides utilities to quickly generate and organize plots of distributions split by a grouping variable.

Documentation

For detailed instructions, see the package website.

This package relies on ggplot2 to build plots and on the awesome trelliscopejs by Ryan Hafen to provide a viewer for navigating them.

Install

# install.packages("devtools")  # run first if devtools is not yet installed
devtools::install_github("ataustin/flyover")
library(flyover)

Note that the dependencies are large (especially trelliscopejs) and may take a long time to install.

Quickstart

A typical flyover workflow has the following steps:

  1. Combine different data sets into a single table.
  2. Apply a plotting function to the columns of the table.
  3. Build a display to navigate the plots.

If you are comparing distributions of variables between an old data process and a new one, your workflow might look like this:

old_data <- read.csv("old-data.csv")
new_data <- read.csv("new-data.csv")

enlist_data(old_data, new_data, names = c("old data", "new data")) %>%
  build_plots(flyover_histogram) %>%
  build_display(display_name = "histograms", output_dir = "display-hist")
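
For intuition about the first step of the workflow, the sketch below stacks two data sets into one table with a column recording where each row came from. It uses base R only and simulated data. The helper stack_with_group is hypothetical and is not flyover's implementation; enlist_data may structure its result differently.

```r
# Two toy data sets with the same columns (simulated, for illustration only)
set.seed(42)
old_data <- data.frame(x = rnorm(100), y = runif(100))
new_data <- data.frame(x = rnorm(100, mean = 0.5), y = runif(100))

# Hypothetical helper (not part of flyover): label each data set with its
# name in a `group` column, then stack the labeled data sets row-wise.
stack_with_group <- function(data_list, names) {
  stopifnot(length(data_list) == length(names))
  labeled <- Map(function(d, n) cbind(group = n, d), data_list, names)
  do.call(rbind, labeled)
}

combined <- stack_with_group(list(old_data, new_data),
                             c("old data", "new data"))

# Count the rows contributed by each data set
table(combined$group)
```

With a table in this shape, any plot of a column can be split by the group column, which is the kind of comparison the workflow above automates across many variables.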

Get started with the articles in the documentation or jump right into the gallery of displays.

Tips

If you build a display from the console or as a batch job from an R script, you can point your browser directly at the output directory to render it. You can also render the display inside an R Markdown document by making the call to build_display the last line of a code chunk. In that case, specify the output directory as a relative file path; otherwise knitting will fail with an error.
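
One way to catch an absolute output path before knitting is a small guard at the top of the chunk. The sketch below is a base R check under simple assumptions about what counts as an absolute path (a leading slash, a leading tilde, a Windows drive letter, or a network share prefix). The helper is_relative_path is hypothetical and not part of flyover.

```r
# Hypothetical helper (not part of flyover): TRUE when `path` is relative.
is_relative_path <- function(path) {
  is_absolute <- startsWith(path, "/") |   # Unix root
    startsWith(path, "~") |                # home directory shorthand
    startsWith(path, "\\\\") |             # Windows network share
    grepl("^[A-Za-z]:", path)              # Windows drive letter
  !is_absolute
}

output_dir <- "display-hist"  # same relative directory as in the Quickstart

# Stop early with an error if the path is absolute
stopifnot(is_relative_path(output_dir))
```

After the guard passes, the chunk can end with the build_display call from the Quickstart, with output_dir passed through unchanged.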

Contributing and reporting problems

Please see contribute.md.