Splits data into training and test sets in a cross-validation fashion based
on a user-provided categorical vector.
This vector can be passed during instantiation either via an arbitrary factor f
with the same length as task$nrow, or via a single string col referring to a
column in the task.
An alternative but equivalent approach using leave-one-out resampling is showcased in the examples of mlr_resamplings_loo.
Dictionary
This Resampling can be instantiated via the dictionary mlr_resamplings or with the associated sugar function rsmp(), using the key "custom_cv".
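Both routes can be sketched as follows. This is a minimal illustration, assuming mlr3 is attached; the key "custom_cv" is the one used with rsmp() in the Examples section, and the variable names are only illustrative.

```r
library(mlr3)

# Retrieve the resampling from the dictionary by its key ...
from_dictionary = mlr_resamplings$get("custom_cv")

# ... or construct it with the sugar function
from_sugar = rsmp("custom_cv")
```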
See also
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html#sec-resampling
Package mlr3spatiotempcv for spatio-temporal resamplings.
as.data.table(mlr_resamplings) for a table of available Resamplings in the running session (depending on the loaded packages).
mlr3spatiotempcv for additional Resamplings for spatio-temporal tasks.
Other Resampling:
Resampling,
mlr_resamplings,
mlr_resamplings_bootstrap,
mlr_resamplings_custom,
mlr_resamplings_cv,
mlr_resamplings_holdout,
mlr_resamplings_insample,
mlr_resamplings_loo,
mlr_resamplings_repeated_cv,
mlr_resamplings_subsampling
Super class
mlr3::Resampling -> ResamplingCustomCV
Active bindings
iters
(integer(1))
Returns the number of resampling iterations, depending on the values stored in the param_set.
Methods
Method instantiate()
Instantiate this Resampling as cross-validation with custom splits.
Arguments
task
(Task)
Used to extract row ids.
f
(factor() | character())
Vector of type factor or character with the same length as task$nrow. Row ids are split on this vector; each distinct value results in a fold. Empty factor levels are dropped and row ids corresponding to missing values are removed, cf. split().
col
(character(1))
Name of the task column to use for splitting. Alternative and mutually exclusive to providing the factor levels as a vector via parameter f.
Examples
library(mlr3)

# Create a task with 10 observations
task = tsk("penguins")
task$filter(1:10)
# Instantiate Resampling:
custom_cv = rsmp("custom_cv")
f = factor(c(rep(letters[1:3], each = 3), NA))
custom_cv$instantiate(task, f = f)
custom_cv$iters # 3 folds
#> [1] 3
# Individual sets:
custom_cv$train_set(1)
#> [1] 4 5 6 7 8 9
custom_cv$test_set(1)
#> [1] 1 2 3
# Disjoint sets:
intersect(custom_cv$train_set(1), custom_cv$test_set(1))
#> integer(0)
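
The example above passes the grouping as an external vector f. The col alternative described under Arguments could look like the following sketch. It assumes the full penguins task with its island column; the variable name cv_by_island is only illustrative.

```r
library(mlr3)

# Full penguins task; "island" is one of its categorical columns
task = tsk("penguins")

# Split on a task column instead of an external vector
cv_by_island = rsmp("custom_cv")
cv_by_island$instantiate(task, col = "island")

# One fold per distinct value of the column
cv_by_island$iters

# All rows in a test set share a single value of the splitting column
unique(task$data(rows = cv_by_island$test_set(1), cols = "island")$island)
```

Because f and col are mutually exclusive, only one of them is supplied to instantiate() here.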