Automatic Data Explorer

An R package to explore and quality-check data. It contains a variety of functions for automatically checking data quality, factors, numeric data, and correlations.

targetCorrelations()
ggdensity()
gghistogram()
SummaryStatsCat()
SummaryStatsNum()
autoMarkdown()

Using targetCorrelations

To get started, pass a data frame and name the column that you want target correlations for:

install.packages("purrr")
library(purrr)

data <- data.frame(A = rnorm(50,0,1),
                   B = runif(50,10,20),
                   C = seq(1,50,1),
                   D = rep(LETTERS[1:5], 10))

targetCorrelations(data, "B")

This should give a report similar to the following (exact values will differ, because the example data are randomly generated):

         C          A 
0.40549008 0.01356416
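For readers who want to see what a report like this involves, the following is a minimal base R sketch of a target-correlation calculation. It is not the package's implementation: the helper name target_cor_sketch is hypothetical, and the choices to skip non-numeric columns and to sort in decreasing order are assumptions inferred from the example output above.

```r
# Hypothetical helper: correlate every numeric column with a numeric target.
# An illustrative sketch only, not the code behind targetCorrelations().
target_cor_sketch <- function(df, target) {
  # Keep only numeric columns (assumption: factors and characters are skipped)
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  predictors <- setdiff(numeric_cols, target)

  # Pearson correlation of each remaining column with the target
  cors <- vapply(predictors,
                 function(col) cor(df[[col]], df[[target]]),
                 numeric(1))

  # Assumption: largest correlation first
  sort(cors, decreasing = TRUE)
}

data <- data.frame(A = rnorm(50, 0, 1),
                   B = runif(50, 10, 20),
                   C = seq(1, 50, 1),
                   D = rep(LETTERS[1:5], 10))

target_cor_sketch(data, "B")
```

The result is a named numeric vector with one entry per numeric column other than the target.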

Using autoMarkdown

The autoMarkdown() function can be used to automatically generate R Markdown files directly from one or more R scripts. The idea is to take the focus away from Markdown styling while you are doing the most important part of data science: the actual exploration and analysis.

The function requires that the R script has some formatting; the code that you wish to be incorporated into a code chunk must be separated with a divider, e.g.

#' # Summary
#' This is the summary of the mtcars dataset

#.#
summary(mtcars)
#.#

#' ## Histogram of mpg
#' This is a histogram of the mpg variable

#.#
autoHistogramPlot(mtcars, mpg, colour = "black", fill = "blue")
#.#

There are two things to note in this example:

#.# marks a divider; the code between a pair of dividers is treated as a code chunk
#' marks a Roxygen-style comment; autoMarkdown recognises these lines and treats them accordingly (in the example, as Markdown headings and text)
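To make the two conventions concrete, here is a small sketch of how a script in this format could be mapped to R Markdown lines. It is an illustration under stated assumptions, not the code behind autoMarkdown(): the function name script_to_rmd_sketch is hypothetical, and it assumes that #' lines become plain Markdown text and that each pair of #.# dividers becomes one chunk with no extra options.

```r
# Hypothetical sketch: convert script lines using the #.# and #' conventions
# into R Markdown lines. Not the package's actual implementation.
script_to_rmd_sketch <- function(lines) {
  fence <- strrep("`", 3)   # three backticks, the Markdown chunk delimiter
  out <- character(0)
  in_chunk <- FALSE

  for (line in lines) {
    if (trimws(line) == "#.#") {
      # A divider opens a chunk if none is open, otherwise closes it
      out <- c(out, if (in_chunk) fence else paste0(fence, "{r}"))
      in_chunk <- !in_chunk
    } else if (!in_chunk && grepl("^#'", line)) {
      # Roxygen-style comment outside a chunk: strip the prefix, keep the text
      out <- c(out, sub("^#' ?", "", line))
    } else {
      # Everything else (code, blank lines) passes through unchanged
      out <- c(out, line)
    }
  }
  out
}

script <- c("#' # Summary",
            "#' This is the summary of the mtcars dataset",
            "",
            "#.#",
            "summary(mtcars)",
            "#.#")

cat(script_to_rmd_sketch(script), sep = "\n")
```

Running the sketch on the first part of the example prints a Markdown heading, a sentence of text, and a single chunk containing summary(mtcars).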

Suppose we have saved the above in an R script called mtcars.R. We can now write it as R Markdown to an existing mtcars.Rmd file with:

autoMarkdown("mtcars.R", "mtcars.Rmd")

Most projects will have multiple separate scripts, perhaps covering different stages of the data science life-cycle. This makes the workflow much easier to follow and keeps code neat and tidy. When it comes to reporting, however, we most likely want just one report. Multiple scripts can all be written to the same .Rmd file with:

autoMarkdown(c("DataExploration.R", "DataCleaning.R", "Modelling.R"), "ProjectReport.Rmd", overwrite = TRUE)

Note the overwrite = TRUE argument. With this setting, any existing markdown in the .Rmd file is automatically overwritten. This is useful in most circumstances but could be dangerous if you specify the wrong .Rmd file, so use it with caution.

The default setting is to create code chunks that are "quiet"; that is, they display only the results of the code, not the code itself or any messages generated by it. Further development may include an option to specify a code chunk that also displays the code itself.
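In knitr terms, a chunk that hides its code and messages is usually written with the echo, message and warning chunk options set to FALSE. The sketch below builds such a chunk header as a string. The helper name chunk_header_sketch is hypothetical, and whether autoMarkdown() emits exactly these options is an assumption, not something the package documentation above confirms.

```r
# Hypothetical helper: build an R Markdown chunk header.
# echo, message and warning are standard knitr chunk options; that
# autoMarkdown() uses this exact combination is an assumption.
chunk_header_sketch <- function(quiet = TRUE) {
  opts <- if (quiet) ", echo=FALSE, message=FALSE, warning=FALSE" else ""
  paste0(strrep("`", 3), "{r", opts, "}")
}

chunk_header_sketch()               # header for a "quiet" chunk
chunk_header_sketch(quiet = FALSE)  # header for a chunk that also shows its code
```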
