Splits data into training and test sets in a cross-validation fashion based
on a user-provided categorical vector.
This vector can be passed during instantiation either via an arbitrary factor (or character vector) f with the same length as task$nrow, or via a single string col referring to a column in the task.
An alternative but equivalent approach using leave-one-out resampling is showcased in the examples of mlr_resamplings_loo.
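The second option, passing a column name via col instead of a vector, can be sketched as follows. This assumes the mlr3 package and its built-in penguins task. The island column is used only for illustration: it has three distinct values and no missing entries, so it should yield three folds.

```r
library(mlr3)

# Define the folds by the task column "island" (illustrative choice)
task = tsk("penguins")
custom_cv = rsmp("custom_cv")
custom_cv$instantiate(task, col = "island")

# One fold per distinct island
custom_cv$iters

# Every row in a test set comes from the same island
task$data(rows = custom_cv$test_set(1), cols = "island")
```

Grouping by a column like this gives leave-one-group-out behaviour: each iteration holds out all observations that share one value of the column.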
Dictionary
This Resampling can be constructed via the dictionary mlr_resamplings or with the associated sugar function rsmp().
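Both routes return a new, not yet instantiated ResamplingCustomCV object. A minimal example, assuming the mlr3 package is installed:

```r
library(mlr3)

# Retrieve the resampling from the dictionary ...
r1 = mlr_resamplings$get("custom_cv")
# ... or, equivalently, via the sugar function rsmp()
r2 = rsmp("custom_cv")

# No split is stored yet; instantiate() has to be called with a task first
r2$is_instantiated
```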
See also
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html#sec-resampling
Package mlr3spatiotempcv for spatio-temporal resamplings.
as.data.table(mlr_resamplings) for a table of available Resamplings in the running session (depending on the loaded packages).
Other Resampling: Resampling, mlr_resamplings, mlr_resamplings_bootstrap, mlr_resamplings_custom, mlr_resamplings_cv, mlr_resamplings_holdout, mlr_resamplings_insample, mlr_resamplings_loo, mlr_resamplings_repeated_cv, mlr_resamplings_subsampling
Super class
mlr3::Resampling -> ResamplingCustomCV
Active bindings
iters (integer(1))
Returns the number of resampling iterations. For this Resampling, this is the number of folds created by instantiate(), i.e. the number of distinct non-missing values of the splitting vector.
Methods
Method instantiate()
Instantiate this Resampling as cross-validation with custom splits.
Arguments
task (Task)
Used to extract row ids.
f (factor() | character())
Vector of type factor or character with the same length as task$nrow. Row ids are split on this vector; each distinct value results in a fold. Empty factor levels are dropped, and row ids corresponding to missing values are removed (cf. split()).
col (character(1))
Name of the task column to use for splitting. Alternative to, and mutually exclusive with, providing the vector directly via argument f.
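The splitting rule for f can be imitated with base R alone, which makes the handling of empty levels and missing values explicit. This is a sketch, not the package's own code. The helpers custom_cv_folds(), fold_test() and fold_train() are hypothetical. It assumes, consistent with the example below, that the training set of fold i consists of the rows of all other folds.

```r
# Hypothetical helper: one fold per distinct, non-missing value of f
custom_cv_folds = function(row_ids, f) {
  stopifnot(length(row_ids) == length(f))
  # split() discards rows whose f is NA; drop = TRUE removes empty levels
  unname(split(row_ids, f, drop = TRUE))
}

# Hypothetical helper: test set of fold i is fold i itself
fold_test = function(folds, i) folds[[i]]

# Hypothetical helper: training set of fold i is the union of all other folds
fold_train = function(folds, i) unlist(folds[-i], use.names = FALSE)

row_ids = 1:10
# Level "d" is declared but never used, and row 10 has a missing value
f = factor(c(rep(c("a", "b", "c"), each = 3), NA), levels = c("a", "b", "c", "d"))

folds = custom_cv_folds(row_ids, f)
length(folds)        # 3: the empty level "d" produces no fold
fold_test(folds, 1)  # 1 2 3
fold_train(folds, 1) # 4 5 6 7 8 9 (row 10 belongs to no fold)
```

A character vector works the same way, because split() converts it to a factor internally.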
Examples
library(mlr3)
# Create a task with 10 observations
task = tsk("penguins")
task$filter(1:10)
# Instantiate Resampling:
custom_cv = rsmp("custom_cv")
# Three groups of three rows; the NA excludes row 10 from every fold
f = factor(c(rep(letters[1:3], each = 3), NA))
custom_cv$instantiate(task, f = f)
custom_cv$iters # 3 folds
#> [1] 3
# Individual sets:
custom_cv$train_set(1)
#> [1] 4 5 6 7 8 9
custom_cv$test_set(1)
#> [1] 1 2 3
# Disjoint sets:
intersect(custom_cv$train_set(1), custom_cv$test_set(1))
#> integer(0)
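An instantiated custom_cv is typically passed to resample(), which then uses exactly the stored folds. A minimal sketch, assuming mlr3 and its penguins task: grouping by island is again only illustrative, and the baseline learner classif.featureless stands in for a real model to keep the example free of extra dependencies.

```r
library(mlr3)

task = tsk("penguins")
custom_cv = rsmp("custom_cv")
# Leave-one-island-out: each iteration holds out all penguins from one island
custom_cv$instantiate(task, col = "island")

learner = lrn("classif.featureless")
rr = resample(task, learner, custom_cv)

# Per-iteration and aggregated classification accuracy
rr$score(msr("classif.acc"))
rr$aggregate(msr("classif.acc"))
```

Because the resampling is instantiated before the call, resample() does not draw new splits; the folds are fixed by the island column.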