I've been running code with nested loops that keeps running into memory issues, and I have been trying to come up with a small example that shows the problem. In the example I take a random square matrix and create a list of its columns. Obviously you wouldn't use a double loop to do this in R, but it is hopefully a simple and clear example showing that with purrr the double loop doesn't increase memory usage, while with furrr and future.apply the memory usage explodes.
library(bench)
library(furrr)
library(future.apply)
library(purrr)
# purrr
single_loop <- function(x, n) {
map(1:n, ~ x[, .x])
}
# future.apply
single_loop_a <- function(x, n) {
future_lapply(1:n, FUN = function(i) x[, i])
}
# furrr
single_loop_f <- function(x, n) {
future_map(1:n, ~ x[, .x])
}
# purrr
inner_loop <- function(i, n, x) {
map_dbl(1:n, ~ x[.x, i])
}
outer_loop <- function(x, n) {
map(1:n, ~ inner_loop(.x, n, x = x))
}
# future.apply
inner_loop_a <- function(i, n, x) {
future_sapply(1:n, FUN = function(j) x[j, i])
}
outer_loop_a <- function(x, n) {
future_lapply(1:n, FUN = function(i) inner_loop_a(i, n, x))
}
# furrr
inner_loop_f <- function(i, n, x) {
future_map_dbl(1:n, ~ x[.x, i])
}
outer_loop_f <- function(x, n) {
future_map(1:n, ~ inner_loop_f(.x, n, x = x))
}
n <- 100
x <- matrix(rnorm(n * n), nrow = n)
identical(single_loop(x, n), single_loop_f(x, n))
identical(single_loop(x, n), single_loop_a(x, n))
identical(single_loop(x, n), outer_loop(x, n))
identical(single_loop(x, n), outer_loop_a(x, n))
identical(single_loop(x, n), outer_loop_f(x, n))
# All return TRUE
plan(sequential)
# With a single loop memory usage is similar
bench::mark(single_loop(x, n))$mem_alloc
# 127KB
bench::mark(single_loop_a(x, n))$mem_alloc
# 243KB
bench::mark(single_loop_f(x, n))$mem_alloc
# 340KB
# With a double loop memory usage remains similar for purrr, but explodes
# on the other two
bench::mark(outer_loop(x, n))$mem_alloc
# 83.6KB
bench::mark(outer_loop_a(x, n))$mem_alloc
# 11.8MB
bench::mark(outer_loop_f(x, n))$mem_alloc
# 21.1MB
# Try again with a larger matrix
n <- 5000
x <- matrix(rnorm(n * n), nrow = n)
bench::mark(single_loop(x, n))$mem_alloc
# 287MB
bench::mark(single_loop_a(x, n))$mem_alloc
# 287MB
bench::mark(single_loop_f(x, n))$mem_alloc
# 287MB
bench::mark(outer_loop(x, n))$mem_alloc
# 191MB
bench::mark(outer_loop_a(x, n))$mem_alloc
# 2.88GB
bench::mark(outer_loop_f(x, n))$mem_alloc
# 1.57GB
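As a rough sanity check on these numbers (back-of-the-envelope arithmetic, not output from bench): an n-by-n numeric matrix holds n * n doubles of 8 bytes each, so the data itself is about 78KB for n = 100 and about 191MB for n = 5000. That is close to what the purrr double loop allocates. The sketch below assumes bench reports sizes in 1024-based units; `matrix_bytes` is a helper name made up for this illustration.

```r
# Hypothetical helper: bytes of data in an n-by-n numeric matrix
# (8 bytes per double), ignoring the small per-object header.
matrix_bytes <- function(n) n * n * 8

matrix_bytes(100) / 1024      # data size in KB for n = 100
matrix_bytes(5000) / 1024^2   # data size in MB for n = 5000

# Ratio of the reported double-loop allocations to the data size at n = 5000.
# Assumption: "GB" and "MB" in the bench output are 1024-based.
data_mb <- matrix_bytes(5000) / 1024^2
2.88 * 1024 / data_mb  # future.apply
1.57 * 1024 / data_mb  # furrr
```

Under that assumption, the future.apply and furrr double loops allocate roughly 15 and 8 times the size of the matrix, respectively.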
As you can see, the double loop actually decreases memory usage slightly for purrr (287MB to 191MB at n = 5000), but causes it to explode for furrr (1.57GB) and future.apply (2.88GB). I ran this example on a 2023 MacBook, but the actual code that I am trying to fix runs on a Linux cluster. I used both furrr and future.apply because yesterday I logged a bug report about nested loops using future.callr, and @HenrikBengtsson pointed out that it was only an issue with furrr. Please let me know if there is any additional information I can provide or any help I can give in solving this issue, and thanks for the wonderful collection of packages!
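One arrangement that may be worth comparing against the cases above is to keep the future-based call on the outer loop only and use plain purrr for the inner loop. This is a sketch under assumptions, not a confirmed fix: `inner_loop_p` and `outer_loop_mixed` are names invented here, and whether this avoids the extra allocations has not been measured. By default the future framework runs nested futures sequentially, so the inner level would not add parallelism in any case.

```r
library(furrr)
library(purrr)

plan(sequential)

# Hypothetical helper: inner loop with plain purrr, no futures involved.
inner_loop_p <- function(i, n, x) {
  map_dbl(seq_len(n), ~ x[.x, i])
}

# Hypothetical variant: only the outer loop goes through furrr.
outer_loop_mixed <- function(x, n) {
  future_map(seq_len(n), ~ inner_loop_p(.x, n, x = x))
}

n <- 100
x <- matrix(rnorm(n * n), nrow = n)

# Check that the result matches extracting the columns directly.
identical(outer_loop_mixed(x, n), map(seq_len(n), ~ x[, .x]))

# Measure allocations the same way as the other cases.
bench::mark(outer_loop_mixed(x, n))$mem_alloc
```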
A little more information. I don't know much about memory profiling, so apologies if this is not the best way to present the information, but in the hopes it might be helpful...