-
A nested map can provide access to both sets of values:

``` r
library(targets)
library(tarchetypes)
library(dplyr)
library(tidyr)
library(tibble)
values <- tribble(
  ~study, ~batch,
  "study1", "batch1",
  "study1", "batch2",
  "study1", "batch3",
  "study1", "batch4",
  "study1", "batch5",
  "study1", "batch6",
  "study1", "batch7",
  "study2", "batch1",
  "study2", "batch2",
  "study3", "batch1"
)
tar_map(
  values,
  tar_target(
    name = x,
    command = paste(study, batch)
  ),
  tar_map(
    values = list(mode = c("full", "downsampled")),
    tar_target(
      y,
      list(
        x = x,
        study = study,
        batch = batch
      )
    )
  )
)
```
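The nesting effectively crosses the study/batch rows with the two modes. The resulting combinations can be previewed with base R (a sketch of the cross product, not output from tar_map() itself):

``` r
# Preview of the combinations the nested tar_map() produces: each
# study/batch row crossed with each mode.
values <- data.frame(
  study = rep(c("study1", "study2", "study3"), times = c(7, 2, 1)),
  batch = c(paste0("batch", 1:7), paste0("batch", 1:2), "batch1")
)
# merge() with no common columns performs a Cartesian product in base R.
combos <- merge(values, data.frame(mode = c("full", "downsampled")))
nrow(combos)  # 10 rows x 2 modes = 20
```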
You can handle this logic by setting …
-
Dear Will, thanks once more. I got quite far combining the full and downsampled modes in a nested tar_map() as discussed before, but the problem with that was my inability to specify resources separately for both modes. In my use case I used a resources.yaml file specifying resources for each of the steps (separately for each mode, as full mode requires more resources). Although the 'values' of tar_map() can be accessed within the command option of a tar_target(), they do not seem to be available to the resources option. E.g. assume a resources.yaml resulting in the following object:
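The object itself did not survive in this thread. As a hedged sketch only, a per-mode resources.yaml might parse into a nested list along these lines (every step name and number below is an assumption, not the original file):

``` r
# Hypothetical shape of the parsed resources.yaml (e.g. via yaml::read_yaml());
# all names and numbers here are assumptions.
resources_raw <- list(
  full        = list(memory_gb = 64, cores = 16),
  downsampled = list(memory_gb = 8,  cores = 2)
)
```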
I defined a function to set resources on our HPC system:
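That function is not reproduced in the thread either. A minimal stand-in might look like this (the name set_resources() and its arguments are hypothetical; on the real HPC system it would return tar_resources(clustermq = tar_resources_clustermq(...)), but a plain list is returned here so the sketch runs without {targets}):

``` r
# Hypothetical helper: look up the per-mode entry from the parsed
# resources.yaml. A plain list stands in for the tar_resources() object
# the real function would build.
set_resources <- function(mode, spec) {
  entry <- spec[[mode]]
  if (is.null(entry)) stop("no resources defined for mode: ", mode)
  list(template = entry)
}
```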
I then combined the tar_map() calls as suggested before (a preprocessing step a shared between modes, and a step b separate per mode), and tried to specify resources for each of the modes separately.
This gives an error. Sorry to keep bugging you with this; it would be really helpful to me to sort out. I also hope it makes sense that there are several options to solve this (as said, sequential maps, or this nested approach, especially if I am able to define per-mode resources).
-
The bit about resources helps, thanks for explaining. If there are few targets in each of the full and downsampled modes, you could use a single tar_map():

``` r
tar_map(
  values,
  tar_target(
    name = a,
    command = list(study, batch)
  ),
  tar_target(
    name = b_downsampled,
    command = list("downsampled", a),
    resources = resources[["downsampled"]]
  ),
  tar_target(
    name = b_full,
    command = list("full", a),
    resources = resources[["full"]]
  )
)
```

This could get inconvenient if there are many targets specific to the full and downsampled modes, so one workaround could be an inner tar_eval():

``` r
library(targets)
library(tarchetypes)
library(dplyr)
library(tibble)
values <- tribble(
  ~study, ~batch,
  "study1", "batch1",
  "study1", "batch2",
  "study1", "batch3",
  "study2", "batch1",
  "study2", "batch2",
  "study3", "batch1"
)
resources <- list(
  downsampled = tar_resources(
    clustermq = tar_resources_clustermq(
      template = list(mode = "downsampled")
    )
  ),
  full = tar_resources(
    clustermq = tar_resources_clustermq(
      template = list(mode = "full")
    )
  )
)
tar_map(
  values,
  tar_target(
    name = a,
    command = list(study, batch)
  ),
  tar_eval(
    expr = tar_target(
      name = name,
      command = list(mode, a),
      resources = resources[[mode]]
    ),
    values = tibble(mode = c("downsampled", "full")) %>%
      mutate(name = rlang::syms(mode))
  )
)
```
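For intuition, tar_eval() substitutes each column of `values` into `expr`, symbol by symbol. A base-R approximation of that substitution step (a sketch, not the package's actual implementation):

``` r
# Base-R approximation of the substitution tar_eval() performs: one quoted
# tar_target() call per mode, with `mode` inserted as a string and `name`
# as a bare symbol.
modes <- c("downsampled", "full")
calls <- lapply(modes, function(m) {
  bquote(tar_target(name = .(as.name(m)), command = list(.(m), a)))
})
```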
-
Description
I have a pipeline I would like to run over three studies ('study1', 'study2' and 'study3') with different numbers of batches (batches 1-7 for study1, batches 1-2 for study2, and batch 1 for study3).
I am able to create a small tibble, including study and batch as columns, to tar_map() over. So far so good.
After a couple of preprocessing steps, I would like to split the pipeline further, additionally looping over a third variable (mode, which can be 'full' or 'downsampled'). I do not want to repeat the steps in 'mapped' (which are shared between modes).
A nested map (#132) does not seem to solve this issue, as I need to be able to access the values in 'values' ('study' and 'batch'), in addition to 'new' values in 'mode'.
Would a sequential tar_map work?
In addition, steps are largely shared between studies, but some steps are different. Is it possible to include if() statements within the target list based on the values I map over (e.g. if (study == "study1") { tar_target(...) })? This currently results in the error 'object study not found'.
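The error arises because `study` is only a substitution value inside tar_map(), not a variable in scope when the target list is built. One workaround (my assumption, not advice from the thread) is to filter the values data frame itself and give study-specific targets their own tar_map() call, sketched in base R:

``` r
# `study` exists only as a column used for substitution, so
# if (study == "study1") fails at pipeline-definition time. Instead,
# subset `values` and pass the filtered rows to a second tar_map().
values <- data.frame(
  study = c("study1", "study1", "study2", "study3"),
  batch = c("batch1", "batch2", "batch1", "batch1")
)
values_study1 <- values[values$study == "study1", ]  # rows for study1-only targets
```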
I guess an alternative strategy would be to run the first steps ('mapped' in the example above) and define a pipeline for each study separately.
Any advice would be very welcome.