# Visualization with drake {#visuals}
```{r, message = FALSE, warning = FALSE, echo = FALSE, include = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(
  drake_make_menu = FALSE,
  drake_clean_menu = FALSE,
  warnPartialMatchArgs = FALSE,
  crayon.enabled = FALSE,
  readr.show_progress = FALSE,
  tidyverse.quiet = TRUE
)
```
```{r, message = FALSE, warning = FALSE, echo = FALSE}
library(drake)
library(visNetwork)
```
Data analysis projects have complicated networks of dependencies, and `drake` can help you visualize them with `vis_drake_graph()`, `sankey_drake_graph()`, and `drake_ggraph()` (note the two g's).
## Plotting plans
In `drake` versions newer than 7.7.0, you can simply `plot()` the plan to show the targets and their dependency relationships.
```{r, eval = TRUE}
library(drake)
# from https://github.com/wlandau/drake-examples/tree/main/mtcars
load_mtcars_example()
my_plan
plot(my_plan)
```
### `vis_drake_graph()`
`vis_drake_graph()` is powered by [`visNetwork`](http://datastorm-open.github.io/visNetwork/). Colors represent target status, and shapes represent data type. These graphs are interactive, so you can click, drag, zoom, and pan to adjust the size and position. Double-click on nodes to contract neighborhoods into clusters or expand them back out again. If you hover over a node, a tooltip shows the first few lines of
- The command of a target, or
- The body of an imported function, or
- The content of an imported text file.
```{r, eval = TRUE}
vis_drake_graph(my_plan)
```
To save this interactive widget for later, supply the name of an HTML file to the `file` argument.
```{r, eval = FALSE}
vis_drake_graph(my_plan, file = "graph.html")
```
To save a static image file, supply a file name that ends in `".png"`, `".pdf"`, `".jpeg"`, or `".jpg"`.
```{r, eval = FALSE}
vis_drake_graph(my_plan, file = "graph.png")
```
### `sankey_drake_graph()`
These interactive [`networkD3`](https://github.com/christophergandrud/networkD3) [Sankey diagrams](https://en.wikipedia.org/wiki/Sankey_diagram) have more nuance: the height of each node is proportional to its number of connections. Nodes with many incoming connections tend to fall out of date more often, and nodes with many outgoing connections can invalidate bigger chunks of the downstream pipeline.
```{r, eval = TRUE}
sankey_drake_graph(my_plan)
```
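As a rough preview of which nodes a Sankey diagram will draw tall, you can count each node's incoming and outgoing edges in the graph data returned by `drake_graph_info()` (described later in this chapter). This is only an illustration: `toy_plan` and its targets are hypothetical, and the sketch assumes the `edges` data frame has `from` and `to` columns, as `visNetwork` expects.

```{r, eval = FALSE}
library(drake)

# Hypothetical toy plan: `dat` feeds two targets, which both feed `report`.
toy_plan <- drake_plan(
  dat = 1:10,
  total = sum(dat),
  avg = mean(dat),
  report = c(total, avg)
)
info <- drake_graph_info(toy_plan)

# Count incoming and outgoing edges for every node in the graph.
ids <- info$nodes$id
degrees <- data.frame(
  id = ids,
  n_in = vapply(ids, function(id) sum(info$edges$to == id), integer(1), USE.NAMES = FALSE),
  n_out = vapply(ids, function(id) sum(info$edges$from == id), integer(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
degrees
```

Here `dat` has two outgoing edges, so a change to it invalidates both `total` and `avg`, while `report` has two incoming edges and falls out of date whenever either of them changes.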
Saving the graphs is the same as before.
```{r, eval = FALSE}
sankey_drake_graph(my_plan, file = "graph.html") # Interactive HTML widget
sankey_drake_graph(my_plan, file = "graph.png") # Static image file
```
Unfortunately, a legend is [not yet available for Sankey diagrams](https://github.com/ropensci/drake/pull/467), but `drake` exposes a separate legend for the colors and shapes.
```{r, eval = TRUE}
library(visNetwork)
legend_nodes()
visNetwork(nodes = legend_nodes())
```
### `drake_ggraph()`
`drake_ggraph()` can handle larger workflows than the other graphing functions. If your project has thousands of targets and `vis_drake_graph()` or `sankey_drake_graph()` does not render properly, consider `drake_ggraph()`. Powered by [`ggraph`](https://github.com/thomasp85/ggraph), `drake_ggraph()` returns a static [`ggplot2`](https://github.com/tidyverse/ggplot2) object, which you can save with `ggsave()`.
```{r, eval = TRUE}
drake_ggraph(my_plan)
```
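Because the result is a `ggplot2` object, the usual `ggplot2` tools apply to it. A minimal sketch, assuming the `ggraph` and `ggplot2` packages are installed; `toy_plan`, the title, and the output path are placeholders.

```{r, eval = FALSE}
library(drake)
library(ggplot2)

# Hypothetical two-target plan.
toy_plan <- drake_plan(dat = 1:10, total = sum(dat))

# drake_ggraph() returns a ggplot2 object instead of an HTML widget.
g <- drake_ggraph(toy_plan)

# Modify it like any other ggplot, then save a static PDF with ggsave().
g <- g + ggtitle("Toy workflow")
out <- file.path(tempdir(), "toy_graph.pdf")
ggsave(out, plot = g, width = 7, height = 5)
```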
### `text_drake_graph()`
If you are running R in a terminal without [X Window](https://en.wikipedia.org/wiki/X_Window_System) support, the usual visualizations will not show up interactively in your session. In that case, you can use `text_drake_graph()` to see a text display in your terminal window. Terminal colors are deactivated in this manual, but you will see color in your console.
```{r, eval = TRUE}
# Use nchar = 0 or nchar = 1 for better results.
# The color display is better in your own terminal.
text_drake_graph(my_plan, nchar = 3)
```
## Underlying graph data: node and edge data frames
`drake_graph_info()` is used behind the scenes in `vis_drake_graph()`, `sankey_drake_graph()`, and `drake_ggraph()` to get the graph information ready for rendering. If you want to render the same graph more than once, you can save time by calling `drake_graph_info()` yourself and passing the result to `render_drake_graph()`, `render_sankey_drake_graph()`, or `render_drake_ggraph()`.
```{r, eval = TRUE}
str(drake_graph_info(my_plan))
```
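For example, the sketch below computes the graph data once and hands it to two renderers, so the dependency analysis is not repeated for each plot. `toy_plan` is hypothetical, and the static rendering assumes `ggraph` is installed.

```{r, eval = FALSE}
library(drake)

# Hypothetical plan with one shared upstream target.
toy_plan <- drake_plan(dat = 1:10, total = sum(dat), avg = mean(dat))

# Analyze the dependency graph once...
info <- drake_graph_info(toy_plan)

# ...then reuse the same node and edge data frames for several renderings.
interactive_graph <- render_drake_graph(info)
static_graph <- render_drake_ggraph(info)
```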
## Visualizing target status
`drake`'s visuals tell you which targets are up to date and which are outdated.
```{r, eval = TRUE}
make(my_plan, verbose = 0L)
outdated(my_plan)
sankey_drake_graph(my_plan)
```
When you change a dependency, some targets fall out of date (black nodes).
```{r, eval = TRUE}
# Redefine an imported function that the regression targets depend on.
reg2 <- function(d) {
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}
sankey_drake_graph(my_plan)
```
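The same status information is available without drawing a graph. In this minimal sketch, the function `f()` and `toy_plan` are hypothetical: redefining the imported function invalidates the target that calls it and everything downstream of that target, and `outdated()` reports all of them.

```{r, eval = FALSE}
library(drake)

# Hypothetical imported function and plan.
f <- function(x) x + 1
toy_plan <- drake_plan(a = f(1), b = a * 2)

make(toy_plan, verbose = 0L)
outdated(toy_plan) # nothing is outdated right after make()

# Change the import: `a` calls f() directly, and `b` depends on `a`.
f <- function(x) x + 2
stale <- outdated(toy_plan)
stale
```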
## Subgraphs
Graphs can grow enormous for serious projects, so there are multiple ways to focus on a manageable subgraph. The most brute-force way is to pick a manual `subset` of nodes. Be aware that with the `subset` argument, the graphing functions may drop intermediate nodes and edges, so the resulting subgraph is not guaranteed to stay connected.
```{r, eval = TRUE}
vis_drake_graph(
  my_plan,
  subset = c("regression2_small", "large")
)
```
The rest of the subgraph functionality preserves connectedness. Use `targets_only` to ignore the imports.
```{r, eval = TRUE}
vis_drake_graph(my_plan, targets_only = TRUE)
```
Similarly, you can just show downstream nodes.
```{r, eval = TRUE}
vis_drake_graph(my_plan, from = c("regression2_small", "regression2_large"))
```
Or upstream ones.
```{r, eval = TRUE}
vis_drake_graph(my_plan, from = "small", mode = "in")
```
You can also take a small neighborhood around a target in both directions. In the graph below, `order` is 1, but all the custom `file_out()` output files of the neighborhood's targets appear as well. This ensures consistent behavior between `show_output_files = TRUE` and `show_output_files = FALSE` (more on that later).
```{r, eval = TRUE}
vis_drake_graph(my_plan, from = "small", mode = "all", order = 1)
```
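To build intuition for `from`, `mode`, and `order`, here is a small breadth-first search over an edge list in base R. `reachable()` is a hypothetical helper written for illustration, not a `drake` function, and `drake`'s own implementation differs; the node names are made up.

```{r, eval = FALSE}
# Hypothetical helper, not part of drake: collect the nodes reachable
# from `from` by following edges downstream ("out"), upstream ("in"),
# or in either direction ("all"), at most `order` steps away.
reachable <- function(edges, from, mode = c("out", "in", "all"), order = Inf) {
  mode <- match.arg(mode)
  src <- switch(mode,
    out = edges$from,
    "in" = edges$to,
    all = c(edges$from, edges$to)
  )
  dst <- switch(mode,
    out = edges$to,
    "in" = edges$from,
    all = c(edges$to, edges$from)
  )
  seen <- from
  frontier <- from
  step <- 0
  while (length(frontier) > 0 && step < order) {
    # Neighbors of the current frontier that have not been visited yet.
    nxt <- setdiff(dst[src %in% frontier], seen)
    seen <- c(seen, nxt)
    frontier <- nxt
    step <- step + 1
  }
  seen
}

# Made-up edges: raw_data -> data -> fit -> report, and data -> summary.
edges <- data.frame(
  from = c("raw_data", "data", "data", "fit"),
  to = c("data", "fit", "summary", "report"),
  stringsAsFactors = FALSE
)

reachable(edges, "data", mode = "out") # "data" "fit" "summary" "report"
reachable(edges, "fit", mode = "in") # "fit" "data" "raw_data"
reachable(edges, "fit", mode = "all", order = 1) # "fit" "report" "data"
```

Each step only follows existing edges outward from nodes already collected, which is why `from` and `mode` always yield a connected subgraph, unlike an arbitrary `subset`.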
## Control the `vis_drake_graph()` legend
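Some arguments to `vis_drake_graph()` control the legend. With `full_legend = TRUE`, the legend shows every node type rather than only the types present in the graph, and `ncol_legend` sets the number of columns in the legend.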
```{r, eval = TRUE}
vis_drake_graph(my_plan, full_legend = TRUE, ncol_legend = 2)
```
To remove the legend altogether, set the `ncol_legend` argument to `0`.
```{r, eval = TRUE}
vis_drake_graph(my_plan, ncol_legend = 0)
```
## Clusters
With the `group` and `clusters` arguments to the graphing functions, you can condense nodes into clusters. This is handy for workflows with lots of targets. Take the schools scenario from the [`drake` plan guide](#plans). Our plan was generated with `drake_plan(trace = TRUE)`, so it has wildcard columns that group nodes into natural clusters already. You can manually add such columns if you wish.
```{r, eval = TRUE}
# Visit https://books.ropensci.org/drake/static.html
# to learn about the syntax with target(transform = ...).
plan <- drake_plan(
  school = target(
    get_school_data(id),
    transform = map(id = c(1, 2, 3))
  ),
  credits = target(
    fun(school),
    transform = cross(
      school,
      fun = c(check_credit_hours, check_students, check_graduations)
    )
  ),
  public_funds_school = target(
    command = check_public_funding(school),
    transform = map(school = c(school_1, school_2))
  ),
  trace = TRUE
)
plan
```
Ordinarily, the workflow graph gives a separate node to each individual import object or target.
```{r, echo = FALSE}
check_credit_hours <- check_students <- check_graduations <-
  check_public_funding <- get_school_data <- function() {}
```
```{r, eval = TRUE}
vis_drake_graph(plan)
```
For large projects with hundreds of nodes, this can get quite cumbersome. But here, we can choose a wildcard column (or any other column in the plan, even custom columns) to condense nodes into natural clusters. For the `group` argument to the graphing functions, choose the name of a column in `plan` or a column you know will be in `drake_graph_info(plan)$nodes`. Then for `clusters`, choose the values in your `group` column that correspond to nodes you want to bunch together. The new graph is much less cluttered.
```{r, eval = TRUE}
vis_drake_graph(
  plan,
  group = "school",
  clusters = c("school_1", "school_2", "school_3")
)
```
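The grouping column does not have to come from `trace = TRUE`. As a minimal sketch, the hypothetical `stage` column below is added by hand to a hypothetical plan and then used for clustering. Comparing node counts from `drake_graph_info()` confirms that the cluster nodes replace the individual targets.

```{r, eval = FALSE}
library(drake)

# Hypothetical plan with two data targets and two model targets.
toy_plan <- drake_plan(
  data_a = 1,
  data_b = 2,
  model_a = data_a * 10,
  model_b = data_b * 10
)

# Add a custom grouping column by hand (one value per row of the plan).
toy_plan$stage <- c("data", "data", "model", "model")

# Condense each stage into a single cluster node.
vis_drake_graph(toy_plan, group = "stage", clusters = c("data", "model"))

# Compare node counts with and without clustering.
full <- drake_graph_info(toy_plan)
clustered <- drake_graph_info(toy_plan, group = "stage", clusters = c("data", "model"))
c(full = nrow(full$nodes), clustered = nrow(clustered$nodes))
```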
As previously mentioned, you can group on any column in `drake_graph_info(my_plan)$nodes`. Let's return to the `mtcars` project for demonstration.
```{r, eval = TRUE}
vis_drake_graph(my_plan)
```
Let's condense all the imports into one node and all the up-to-date targets into another. That way, the outdated targets stand out.
```{r, eval = TRUE}
vis_drake_graph(
  my_plan,
  group = "status",
  clusters = c("imported", "up to date")
)
```
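To see which values are available for `clusters` before you choose them, inspect the grouping column in the node data frame. A minimal sketch with a hypothetical import `f()` and plan: `f()` should appear as `"imported"`, and the targets' status depends on whether they have already been built.

```{r, eval = FALSE}
library(drake)

# Hypothetical import and plan.
f <- function(x) x + 1
toy_plan <- drake_plan(a = f(1), b = a * 2)

nodes <- drake_graph_info(toy_plan)$nodes

# Candidate values for `clusters` when `group = "status"`.
table(nodes$status)
```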
## Output files
`drake` can reproducibly track multiple output files per target and show them in the graph.
```{r, eval = TRUE}
plan <- drake_plan(
  target1 = {
    file.copy(file_in("in1.txt"), file_out("out1.txt"))
    file.copy(file_in("in2.txt"), file_out("out2.txt"))
  },
  target2 = {
    file.copy(file_in("out1.txt"), file_out("out3.txt"))
    file.copy(file_in("out2.txt"), file_out("out4.txt"))
  }
)
writeLines("in1", "in1.txt")
writeLines("in2", "in2.txt")
make(plan)
# Overwrite an output file by hand so that target2 falls out of date.
writeLines("abcdefg", "out3.txt")
vis_drake_graph(plan, targets_only = TRUE)
```
If your graph is too busy, you can hide the output files with `show_output_files = FALSE`.
```{r, eval = TRUE}
vis_drake_graph(plan, show_output_files = FALSE, targets_only = TRUE)
```
## Node selection
*(Supported in drake > 7.7.0 only)*
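First, we define a plan with a custom column named `link`. The `.tag_in = cluster_id` argument to `map()` also adds a `cluster_id` column, which we use for clustering below.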
```{r, eval = TRUE}
mtcars_link <-
  "https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html"
plan <- drake_plan(
  mtc = target(
    mtcars,
    link = !!mtcars_link
  ),
  mtc2 = target(
    mtc,
    link = !!mtcars_link
  ),
  mtc3 = target(
    modify_mtc2(mtc2, number),
    transform = map(number = !!c(1:3), .tag_in = cluster_id),
    link = !!mtcars_link
  ),
  trace = TRUE
)
```
```{r, eval = TRUE}
unique_stems <- unique(plan$cluster_id)
```
### Perform the default action on select
If you supply `on_select = TRUE` and `on_select_col = "my_column"` to `vis_drake_graph()`, it treats the values in the column named `"my_column"` as hyperlinks. Click on a node in the graph to navigate to the corresponding link in your browser.
```{r, eval = TRUE}
vis_drake_graph(
  plan,
  clusters = unique_stems,
  group = "cluster_id",
  on_select_col = "link",
  on_select = TRUE
)
```
### Perform no action on select
No action is taken if you give `vis_drake_graph()` any of the following:
- `on_select = NULL`
- `on_select = FALSE`
- `on_select_col = NULL`
Taking no action is the default behaviour.
```{r, eval = TRUE}
vis_drake_graph(
  plan,
  clusters = unique_stems,
  group = "cluster_id",
  on_select_col = "link",
  on_select = NULL
)
```
### Customize the onSelect event behaviour
What if we instead wanted the browser to display an alert when a node is
clicked?
```{r, eval = TRUE}
alert_behaviour <- function() {
  # Return JavaScript code (as a string) to run when a node is selected.
  "function(props) {
    alert('selected node with on_select_col: \\r\\n' +
      this.body.data.nodes.get(props.nodes[0]).on_select_col);
  }"
}
vis_drake_graph(
  plan,
  on_select_col = "link",
  on_select = alert_behaviour()
)
```
## Enhanced interactivity
For enhanced interactivity, including custom interactive target documentation, see the [`mandrake`](https://mstr3336.github.io/mandrake) R package. For a taste of the functionality, visit [this vignette page](https://mstr3336.github.io/mandrake/articles/Test_Usecase.html#graph) and click the `mtcars` node in the graph.