This is the abstract base class for resampling objects like ResamplingCV and ResamplingBootstrap.

The objects of this class define how a task is partitioned for resampling (e.g., in resample() or benchmark()), using a set of hyperparameters such as the number of folds in cross-validation.

Resampling objects can be instantiated on a Task. Instantiation applies the strategy to the task and results in a fixed partition of the Task's row_ids.
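As a minimal sketch of instantiation, the following assumes the mlr3 package is installed and uses the built-in iris task purely for illustration. It shows that the partition only exists after `$instantiate()` has been called:

```r
library(mlr3)

task = tsk("iris")            # illustrative task with 150 rows
r = rsmp("cv", folds = 3)

r$is_instantiated             # FALSE: no partition has been fixed yet
r$instantiate(task)
r$is_instantiated             # TRUE: the row ids are now partitioned

# For cross-validation, each iteration splits the row ids into
# a training set and a disjoint test set
train = r$train_set(1)
test = r$test_set(1)
length(intersect(train, test))
setequal(union(train, test), task$row_ids)
```

Calling `$instantiate()` again draws a new partition, so set a seed beforehand if the split needs to be reproducible.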

Predefined resamplings are stored in the dictionary mlr_resamplings, e.g. cv or bootstrap.
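A short sketch of retrieving resamplings from the dictionary, assuming mlr3 is installed. The choice of `folds = 5` is only an example value:

```r
library(mlr3)

# Keys of the predefined resamplings
keys = mlr_resamplings$keys()
head(keys)

# Two ways to retrieve an object from the dictionary
r1 = mlr_resamplings$get("cv")
r2 = rsmp("cv", folds = 5)    # rsmp() can also set hyperparameters

r2$param_set$values$folds
r2$iters
```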

## Stratification

All derived classes support stratified sampling. The stratification variables are assumed to be discrete and must be stored in the Task with column role "stratum". In case of multiple stratification variables, each combination of the values of the stratification variables forms a stratum.

First, the observations are divided into subpopulations based on one or multiple stratification variables (assumed to be discrete), cf. task$strata. Second, the sampling is performed in each of the k subpopulations separately. Each subgroup is divided into iter training sets and iter test sets by the derived Resampling. These sets are merged based on their iteration number: all training sets from all subpopulations with iteration 1 are combined, then all training sets with iteration 2, and so on. The same is done for all test sets. The merged sets can be accessed via $train_set(i) and $test_set(i), respectively.
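The split-sample-merge procedure described above can be sketched in base R. This is an illustration only, not mlr3's internal code: the toy data, the subsampling scheme, and the helper functions `train_set()` and `test_set()` are hypothetical stand-ins for the getters of the same name.

```r
set.seed(1)

row_ids = 1:20
stratum = rep(c("a", "b"), times = c(15, 5))  # hypothetical stratification variable
iters = 3                                     # number of resampling iterations
ratio = 0.6                                   # share of each stratum used for training

# Step 1: divide the row ids into subpopulations
subpops = split(row_ids, stratum)

# Step 2: draw `iters` train/test splits inside each subpopulation
splits = lapply(subpops, function(ids) {
  lapply(seq_len(iters), function(i) {
    train = ids[sample.int(length(ids), round(ratio * length(ids)))]
    list(train = train, test = setdiff(ids, train))
  })
})

# Step 3: merge the sets of all subpopulations by iteration number
train_set = function(i) unlist(lapply(splits, function(s) s[[i]]$train), use.names = FALSE)
test_set = function(i) unlist(lapply(splits, function(s) s[[i]]$test), use.names = FALSE)

# Every merged training set keeps the 3:1 ratio of the strata
table(stratum[train_set(1)])
```

Because the sampling happens inside each stratum, the class proportions of the merged sets match those of the full data up to rounding.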

## Grouping / Blocking

All derived classes support grouping of observations. The grouping variable is assumed to be discrete and must be stored in the Task with column role "group".

Observations in the same group are treated like a "block" of observations which must be kept together. These observations either all go together into the training set or together into the test set.

The sampling is performed by the derived Resampling on the grouping variable. Next, the grouping information is replaced with the respective row ids to generate training and test sets. The sets can be accessed via $train_set(i) and $test_set(i), respectively.
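A hedged sketch of grouped resampling, assuming mlr3 is installed. The toy data frame and the column names `x`, `y`, and `g` are hypothetical; the point is only to check that no group ends up on both sides of a split:

```r
library(mlr3)

# Hypothetical toy data: 12 rows in 6 groups of 2 rows each
df = data.frame(
  x = seq_len(12),
  y = seq_len(12) / 2,
  g = rep(1:6, each = 2)
)
task = as_task_regr(df, target = "y", id = "toy")

# Assign the column role "group" to g (this removes it from the features)
task$set_col_roles("g", roles = "group")

r = rsmp("cv", folds = 3)$instantiate(task)

# Look up which groups the row ids of the first iteration belong to
groups = task$groups
train_groups = unique(groups$group[groups$row_id %in% r$train_set(1)])
test_groups = unique(groups$group[groups$row_id %in% r$test_set(1)])

# No group appears in both sets
intersect(train_groups, test_groups)
```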

## See also

Dictionary of Resamplings: mlr_resamplings

as.data.table(mlr_resamplings) for a complete table of all (also dynamically created) Resampling implementations.

## Public fields

id

(character(1))
Identifier of the object. Used in tables, plot and text output.

param_set

(paradox::ParamSet)
Set of hyperparameters.

instance

(any)
During instantiate(), the instance is stored in this slot in an arbitrary format. Note that if a grouping variable is present in the Task, a Resampling may operate on the group ids internally instead of the row ids, which can be confusing. It is advised not to work with the instance directly, but to use only the getters $train_set() and $test_set().

task_hash

(character(1))
The hash of the Task which was passed to r$instantiate().

task_nrow

(integer(1))

### Method format()

Helper for print outputs.

#### Arguments

(ignored).

### Method instantiate()

#### Arguments

task

### Method train_set()

Returns the row ids of the i-th training set.

#### Arguments

i

(integer(1))
Iteration.

#### Returns

(integer()) of row ids.

### Method test_set()

Returns the row ids of the i-th test set.

### Method clone()

The objects of this class are cloneable with this method.

#### Arguments

deep

Whether to make a deep clone.

## Examples

r = rsmp("subsampling")

# Default parametrization
r$param_set$values
#> $repeats
#> [1] 30
#>
#> $ratio
#> [1] 0.6666667
# Do only 3 repeats on 10% of the data
r$param_set$values = list(ratio = 0.1, repeats = 3)
r$param_set$values
#> $ratio
#> [1] 0.1
#>
#> $repeats
#> [1] 3
# Instantiate on a task (iris is assumed here; the outputs below match a 150-row task)
task = tsk("iris")
r$instantiate(task)
# Extract train/test sets
train_set = r$train_set(1)
print(train_set)
#>  [1]  88 150 122  40 133  89 118 112 119  13  94  83 103  77  14
intersect(train_set, r$test_set(1))
#> integer(0)
# Another example: 10-fold CV
r = rsmp("cv")$instantiate(task)
r$train_set(1)
#>   [1]  11  24  40  41  61  62  73  75  76  82  91 109 119 131 142  18  19  64
#>  [19]  71  79  84  95 105 116 124 126 127 130 146 148   6   9  10  12  33  38
#>  [37]  55  59  87  89 115 123 134 136 144  23  47  52  56  66  80  86  96 100
#>  [55] 106 112 122 129 133 150   3   8  17  34  44  45  63  65  70 102 110 135
#>  [73] 138 139 145   4  15  25  26  31  50  57  58  85  94 101 111 114 125 128
#>  [91]  28  37  42  51  72  77  81  92 104 107 117 118 120 137 143   1  14  16
#> [109]  30  36  39  43  53  54  67  69  90  97 141 147   7  13  21  27  35  46
#> [127]  49  74  83 108 113 121 132 140 149
# Stratification
task = tsk("pima")
prop.table(table(task$truth())) # moderately unbalanced
#> 0.3489583 0.6510417
task$col_roles$stratum = task$target_names
r = rsmp("subsampling")
r$instantiate(task)
prop.table(table(task$truth(r$train_set(1)))) # roughly same proportion
#> 0.3496094 0.6503906