resample() does not set data_prototype (and task_prototype), which some learners rely on #987
Comments
Hey, sorry, I can't reproduce the issue. I created a clean environment with

renv::init(bare = TRUE)
renv::install(c("mlr3@0.17.1", "mlr-org/mlr3extralearners@*release", "randomForest"))

Your code runs without any problems:

# both packages are attached in the session info below
library(mlr3)
library(mlr3extralearners)

task = tsk("boston_housing")
task$select(c("age", "b", "chas"))
learner = lrn("regr.randomForest", importance = "mse")
learner$train(task)
rr = resample(task, learner, rsmp("cv", folds = 10))

Session info:

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 23.10
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Berlin
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] mlr3extralearners_0.7.1 mlr3_0.17.1
loaded via a namespace (and not attached):
[1] digest_0.6.33 backports_1.4.1 R6_2.5.1 codetools_0.2-19 randomForest_4.7-1.1 lgr_0.4.4 parallel_4.3.1 RhpcBLASctl_0.23-42 palmerpenguins_0.1.1
[10] mlr3misc_0.13.0 parallelly_1.36.0 pak_0.7.1 future_1.33.1 renv_1.0.3 data.table_1.14.10 compiler_4.3.1 paradox_0.11.1 globals_0.16.2
My Kaggle kernel has R 4.0 and Ubuntu 20 installed by default. I'm not sure if I can change that. What do you recommend?
I can confirm that there is a bug on Kaggle. It is not the subsetting of the task and not the task itself. The error does not occur with
I believe the issue is this line in the randomForest learner: This executes

task$data(cols = intersect(names(learner$state$data_prototype), task$feature_names))

When I stop here, the learner's However, in older R versions, Idk when this new behaviour of .... although the timing does not seem to match. But somewhere between 4.1.2 and 4.2.0, I think. Too lazy to check.

Now to the bug in our code: I assume the problem is that resampling does not set the
(It may be unnecessary, currently, to set data_prototype in resampling, since the task remains the same, but this may change with the new holdout task thing that may be introduced. Also we should make sure other places handle
Hi, I'm using mlr3 on a Kaggle kernel and found issues with the resample function. The error message mentions some issues with data.table column selection and future.apply.

I'm currently able to use mlr3 v0.16.1 and the latest release of mlr3extralearners, while forcing data.table and future.apply not to upgrade by default (as they are dependencies of both).

Reproducible code:
Session info: