visuals.Rmd

# Visualization with drake {#visuals}

```{r, message = FALSE, warning = FALSE, echo = FALSE, include = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(
  drake_make_menu = FALSE,
  drake_clean_menu = FALSE,
  warnPartialMatchArgs = FALSE,
  crayon.enabled = FALSE,
  readr.show_progress = FALSE,
  tidyverse.quiet = TRUE
)
```

```{r, message = FALSE, warning = FALSE, echo = FALSE}
library(drake)
library(visNetwork)
```

Data analysis projects have complicated networks of dependencies, and `drake` can help you visualize them with `vis_drake_graph()`, `sankey_drake_graph()`, and `drake_ggraph()` (note the two g's).

## Plotting plans

Except for `drake` 7.7.0 and below, you can simply `plot()` the plan to show the targets and their dependency relationships.

```{r, eval = TRUE}
library(drake)
# from https://github.com/wlandau/drake-examples/tree/main/mtcars
load_mtcars_example()

my_plan

plot(my_plan)
```


### `vis_drake_graph()`

Powered by [`visNetwork`](http://datastorm-open.github.io/visNetwork/). Colors represent target status, and shapes represent data type. These graphs are interactive, so you can click, drag, zoom, and and pan to adjust the size and position. Double-click on nodes to contract neighborhoods into clusters or expand them back out again. If you hover over a node, you will see text in a tooltip showing the first few lines of

- The command of a target, or
- The body of an imported function, or
- The content of an imported text file.

```{r, eval = TRUE}
vis_drake_graph(my_plan)
```

To save this interactive widget for later, just supply the name of an HTML file.

```{r, eval = FALSE}
vis_drake_graph(my_plan, file = "graph.html")
```

To save a static image file, supply a file name that ends in `".png"`, `".pdf"`, `".jpeg"`, or `".jpg"`.

```{r, eval = FALSE}
vis_drake_graph(my_plan, file = "graph.png")
```

### `sankey_drake_graph()`

These interactive [`networkD3`](https://github.com/christophergandrud/networkD3) [Sankey diagrams](https://en.wikipedia.org/wiki/Sankey_diagram) have more nuance: the height of each node is proportional to its number of connections. Nodes with many incoming connnections tend to fall out of date more often, and nodes with many outgoing connections can invalidate bigger chunks of the downstream pipeline.

```{r, eval = TRUE}
sankey_drake_graph(my_plan)
```

Saving the graphs is the same as before.

```{r, eval = FALSE}
sankey_drake_graph(my_plan, file = "graph.html") # Interactive HTML widget
sankey_drake_graph(my_plan, file = "graph.png")  # Static image file
```

Unfortunately, a legend is [not yet available for Sankey diagrams](https://github.com/ropensci/drake/pull/467), but `drake` exposes a separate legend for the colors and shapes.

```{r, eval = TRUE}
library(visNetwork)
legend_nodes()
visNetwork(nodes = legend_nodes())
```

### `drake_ggraph()`

`drake_ggraph()` can handle larger workflows than the other graphing functions. If your project has thousands of targets and `vis_drake_graph()`/`sankey_drake_graph()` does not render properly, consider `drake_ggraph()`. Powered by [`ggraph`](https://github.com/thomasp85/ggraph), `drake_ggraph()`s are static [`ggplot2`](https://github.com/tidyverse/ggplot2) objects, and you can save them with `ggsave()`.

```{r, eval = TRUE}
drake_ggraph(my_plan)
```

### `text_drake_graph()`

If you are running R in a terminal without [X Window](https://en.wikipedia.org/wiki/X_Window_System) support, the usual visualizations will show up interactively in your session. Here, you can use `text_drake_graph()` to see a text display in your terminal window. Terminal colors are deactivated in this manual, but you will see color in your console.

```{r, eval = TRUE}
# Use nchar = 0 or nchar = 1 for better results.
# The color display is better in your own terminal.
text_drake_graph(my_plan, nchar = 3)
```


## Underlying graph data: node and edge data frames

`drake_graph_info()` is used behind the scenes in `vis_drake_graph()`, `sankey_drake_graph()`, and `drake_ggraph()` to get the graph information ready for rendering. To save time, you can call `drake_graph_info()` to get these internals and then call `render_drake_graph()`, `render_sankey_drake_graph()`, or  `render_drake_ggraph()`.

```{r, eval = TRUE}
str(drake_graph_info(my_plan))
```

## Visualizing target status

`drake`'s visuals tell you which targets are up to date and which are outdated.

```{r, eval = TRUE}
make(my_plan, verbose = 0L)
outdated(my_plan)

sankey_drake_graph(my_plan)
```

When you change a dependency, some targets fall out of date (black nodes).

```{r, eval = TRUE}
reg2 <- function(d){
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}
sankey_drake_graph(my_plan)
```

## Subgraphs

Graphs can grow enormous for serious projects, so there are multiple ways to focus on a manageable subgraph. The most brute-force way is to just pick a manual `subset` of nodes. However, with the `subset` argument, the graphing functions can drop intermediate nodes and edges.

```{r, eval = TRUE}
vis_drake_graph(
  my_plan,
  subset = c("regression2_small", "large")
)
```

The rest of the subgraph functionality preserves connectedness. Use `targets_only` to ignore the imports.

```{r, eval = TRUE}
vis_drake_graph(my_plan, targets_only = TRUE)
```

Similarly, you can just show downstream nodes.

```{r, eval = TRUE}
vis_drake_graph(my_plan, from = c("regression2_small", "regression2_large"))
```

Or upstream ones.

```{r, eval = TRUE}
vis_drake_graph(my_plan, from = "small", mode = "in")
```

In fact, let us just take a small neighborhood around a target in both directions. For the graph below, given order is 1, but all the custom `file_out()` output files of the neighborhood's targets appear as well. This ensures consistent behavior between `show_output_files = TRUE` and `show_output_files = FALSE` (more on that later).

```{r, eval = TRUE}
vis_drake_graph(my_plan, from = "small", mode = "all", order = 1)
```

## Control the `vis_drake_graph()` legend.

Some arguments to `vis_drake_graph()` control the legend.

```{r, eval = TRUE}
vis_drake_graph(my_plan, full_legend = TRUE, ncol_legend = 2)
```

To remove the legend altogether, set the `ncol_legend` argument to `0`.

```{r, eval = TRUE}
vis_drake_graph(my_plan, ncol_legend = 0)
```

## Clusters

With the `group` and `clusters` arguments to the graphing functions, you can condense nodes into clusters. This is handy for workflows with lots of targets. Take the schools scenario from the [`drake` plan guide](#plans). Our plan was generated with `drake_plan(trace = TRUE)`, so it has wildcard columns that group nodes into natural clusters already. You can manually add such columns if you wish.

```{r, eval = TRUE}
# Visit https://books.ropensci.org/drake/static.html
# to learn about the syntax with target(transform = ...).
plan <- drake_plan(
  school = target(
    get_school_data(id),
    transform = map(id = c(1, 2, 3))
  ),
  credits = target(
    fun(school),
    transform = cross(
      school,
      fun = c(check_credit_hours, check_students, check_graduations)
    )
  ),
  public_funds_school = target(
    command = check_public_funding(school),
    transform = map(school = c(school_1, school_2))
  ),
  trace = TRUE
)
plan
```

Ordinarily, the workflow graph gives a separate node to each individual import object or target.

```{r, echo = FALSE}
check_credit_hours <- check_students <- check_graduations <-
  check_public_funding <- get_school_data <- function(){}
```

```{r, eval = TRUE}
vis_drake_graph(plan)
```

For large projects with hundreds of nodes, this can get quite cumbersome. But here, we can choose a wildcard column (or any other column in the plan, even custom columns) to condense nodes into natural clusters. For the `group` argument to the graphing functions, choose the name of a column in `plan` or a column you know will be in `drake_graph_info(my_plan)$nodes`. Then for `clusters`, choose the values in your `group` column that correspond to nodes you want to bunch together. The new graph is not as cumbersome.

```{r, eval = TRUE}
vis_drake_graph(plan,
  group = "school",
  clusters = c("school_1", "school_2", "school_3")
)
```

As previously mentioned, you can group on any column in `drake_graph_info(my_plan)$nodes`. Let's return to the `mtcars` project for demonstration.

```{r, eval = TRUE}
vis_drake_graph(my_plan)
```

Let's condense all the imports into one node and all the up-to-date targets into another. That way, the outdated targets stand out.

```{r, eval = TRUE}
vis_drake_graph(
  my_plan,
  group = "status",
  clusters = c("imported", "up to date")
)
```


## Output files

`drake` can reproducibly track multiple output files per target and show them in the graph.

```{r, eval = TRUE}
plan <- drake_plan(
  target1 = {
    file.copy(file_in("in1.txt"), file_out("out1.txt"))
    file.copy(file_in("in2.txt"), file_out("out2.txt"))
  },
  target2 = {
    file.copy(file_in("out1.txt"), file_out("out3.txt"))
    file.copy(file_in("out2.txt"), file_out("out4.txt"))
  }
)
writeLines("in1", "in1.txt")
writeLines("in2", "in2.txt")
make(plan)

writeLines("abcdefg", "out3.txt")
vis_drake_graph(plan, targets_only = TRUE)
```

If your graph is too busy, you can hide the output files with `show_output_files = FALSE`.

```{r, eval = TRUE}
vis_drake_graph(plan, show_output_files = FALSE, targets_only = TRUE)
```


## Node Selection

*(Supported in drake > 7.7.0 only)*

First, we define our plan, adding a custom column named "link".

```{r, eval = TRUE}
mtcars_link <-
  "https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html"

plan <- drake_plan(
  mtc = target(
    mtcars,
    link = !!mtcars_link
  ),
  mtc2 = target(
    mtc,
    link = !!mtcars_link
  ),
  mtc3 = target(
    modify_mtc2(mtc2, number),
    transform = map(number = !!c(1:3), .tag_in = cluster_id),
    link = !!mtcars_link
  ),
  trace = TRUE
)
```

```{r, eval = TRUE}
unique_stems <- unique(plan$cluster_id)
```


### Perform the default action on select

By supplying `vis_drake_graph(on_select = TRUE, on_select_col = "my_column")`,
treats the values in the column named `"my_column"` as hyperlinks. Click on a node in the graph to navigate to the corresponding link in your browser.

```{r, eval = TRUE}
vis_drake_graph(
  plan,
  clusters = unique_stems,
  group = "cluster_id",
  on_select_col = "link",
  on_select = TRUE
)
```

### Perform no action on select

No action will be taken if any of the following are given to
`vis_drake_graph()`:

- `on_select = NULL`,
- `on_select = FALSE`,
- `on_select_col = NULL`

This is the default behaviour.

```{r, eval = TRUE}
vis_drake_graph(
  my_plan,
  clusters = unique_stems,
  group = "cluster_id",
  on_select_col = "link",
  on_select = NULL
)
```


### Customize the onSelect event behaviour

What if we instead wanted the browser to display an alert when a node is
clicked?

```{r, eval = TRUE}
alert_behaviour <- function(){
  js <- "
  function(props) {
    alert('selected node with on_select_col: \\r\\n' +
            this.body.data.nodes.get(props.nodes[0]).on_select_col);
  }"
}

vis_drake_graph(
  my_plan,
  on_select_col = "link",
  on_select = alert_behaviour()
)
```

## Enhanced interactivity

For enhanced interactivity, including custom interactive target documentation, see the [`mandrake`](https://mstr3336.github.io/mandrake) R package. For a taste of the functionality, visit [this vignette page](https://mstr3336.github.io/mandrake/articles/Test_Usecase.html#graph) and click the `mtcars` node in the graph.