This is the abstract base class for resampling objects like ResamplingCV and ResamplingBootstrap.

Objects of this class define how a task is partitioned for resampling (e.g., in resample() or benchmark()), using a set of hyperparameters such as the number of folds in cross-validation.

Resampling objects can be instantiated on a Task. Instantiation applies the strategy to the task and results in a fixed partition of the task's row_ids.
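To make instantiation concrete, here is a minimal base-R sketch, independent of mlr3, that fixes a k-fold cross-validation partition of a vector of row ids. The number of folds plays the role of the hyperparameter. The helper `instantiate_cv()` and its list-of-train/test-sets layout are hypothetical and are not meant to mirror the internal format of `instance`.

```r
# Hypothetical helper (not part of mlr3): fix a k-fold CV partition of row ids.
instantiate_cv = function(row_ids, folds = 3L) {
  # Randomly assign each row id to one of `folds` groups of (almost) equal size
  fold = sample(rep_len(seq_len(folds), length(row_ids)))
  list(
    # Test set of iteration i: all rows assigned to fold i
    test = lapply(seq_len(folds), function(i) row_ids[fold == i]),
    # Training set of iteration i: all remaining rows
    train = lapply(seq_len(folds), function(i) row_ids[fold != i])
  )
}

set.seed(1)
inst = instantiate_cv(1:10, folds = 3L)
# The partition is now fixed: later look-ups of iteration 1 return the same ids
inst$test[[1]]
# Every row id appears in exactly one test set
sort(unlist(inst$test))
```

Because the fold assignment is drawn once and then stored, every later request for iteration i sees the same row ids, which is what makes a fixed partition reusable across several learners.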

Predefined resamplings are stored in the Dictionary mlr_resamplings, e.g. cv or bootstrap.
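One plausible way to picture such a dictionary is a lookup table from keys to constructors, so that each lookup yields a fresh, uninstantiated object. The base-R sketch below assumes exactly that; `registry` and `get_resampling()` are hypothetical names and this is not the implementation of mlr_resamplings. The default values are taken from the Examples section (subsampling: 30 repeats, ratio 2/3; cv: 10 folds).

```r
# Hypothetical registry (not mlr3's Dictionary): maps keys to constructor functions
registry = new.env()
registry[["subsampling"]] = function() list(id = "subsampling", values = list(repeats = 30L, ratio = 2/3))
registry[["cv"]] = function() list(id = "cv", values = list(folds = 10L))

# Look up a key and call its constructor, so every call returns a new object
get_resampling = function(key) {
  if (!exists(key, envir = registry, inherits = FALSE)) {
    stop("Element with key '", key, "' not found")
  }
  registry[[key]]()
}

get_resampling("cv")$values$folds  # 10
ls(registry)                       # available keys
```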

Format

R6::R6Class object.

Construction

Note: This object is typically constructed via a derived class, e.g. ResamplingCV or ResamplingHoldout.

r = Resampling$new(id, param_set, param_vals)
  • id :: character(1)
    Identifier for the resampling strategy.

  • param_set :: paradox::ParamSet
    Set of hyperparameters.

  • param_vals :: named list()
    List of hyperparameter settings.
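The interplay of param_set and param_vals can be sketched in base R as follows. This is a simplification under stated assumptions: the "parameter set" is reduced to a named list of defaults, and the hypothetical `new_resampling()` rejects unknown names and merges user settings over the defaults. A real paradox::ParamSet also validates types and ranges, which this sketch does not attempt.

```r
# Hypothetical sketch (not mlr3's implementation): param_set is a named list of
# defaults, param_vals are user settings merged on top of it
new_resampling = function(id, param_set, param_vals = list()) {
  unknown = setdiff(names(param_vals), names(param_set))
  if (length(unknown) > 0L) {
    stop("Unknown hyperparameter(s): ", paste(unknown, collapse = ", "))
  }
  list(id = id, param_set = param_set, values = modifyList(param_set, param_vals))
}

r = new_resampling("subsampling",
  param_set = list(repeats = 30L, ratio = 2/3),
  param_vals = list(ratio = 0.1))
r$values$ratio    # 0.1 (overridden)
r$values$repeats  # 30 (default kept)
```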

Fields

  • id :: character(1)
    Identifier of the resampling strategy.

  • param_set :: paradox::ParamSet
    Description of available hyperparameters and hyperparameter settings.

  • hash :: character(1)
    Hash (unique identifier) for this object.

  • instance :: any
    During instantiate(), the instance is stored in this slot. The instance may be stored in an arbitrary format.

  • is_instantiated :: logical(1)
    Is TRUE if the resampling has been instantiated.

  • duplicated_ids :: logical(1)
    Is TRUE if this resampling strategy may produce duplicated row ids within a single training set or test set. E.g., this is TRUE for bootstrapping and FALSE for cross-validation.

  • iters :: integer(1)
    Number of resampling iterations, as determined by the values stored in param_set.

  • task_hash :: character(1)
    The hash of the task which was passed to r$instantiate().
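Several of these fields can be illustrated with a simplified bootstrap object written in base R. This is a hypothetical stand-in, not the mlr3 class: `make_bootstrap()` is an illustrative name, fields live in an environment, and `iters` and `is_instantiated` are computed on access via `makeActiveBinding()`, loosely analogous to R6 active bindings. It assumes that out-of-bag rows form the test set. Drawing row ids with replacement is what makes `duplicated_ids` TRUE.

```r
# Hypothetical, simplified stand-in for a bootstrap resampling object.
make_bootstrap = function(repeats = 3L, ratio = 1) {
  self = new.env()
  self$id = "bootstrap"
  self$param_vals = list(repeats = repeats, ratio = ratio)
  self$duplicated_ids = TRUE  # sampling with replacement repeats row ids
  self$instance = NULL
  # Read-only "fields" computed on every access
  makeActiveBinding("iters", function() self$param_vals$repeats, self)
  makeActiveBinding("is_instantiated", function() !is.null(self$instance), self)

  self$instantiate = function(row_ids) {
    n = length(row_ids)
    m = round(self$param_vals$ratio * n)
    self$instance = lapply(seq_len(self$iters), function(i) {
      train = row_ids[sample.int(n, m, replace = TRUE)]
      # Assumption: rows never drawn (out-of-bag) form the test set
      list(train = train, test = setdiff(row_ids, train))
    })
    invisible(self)
  }
  self$train_set = function(i) self$instance[[i]]$train
  self$test_set = function(i) self$instance[[i]]$test
  self
}

set.seed(1)
b = make_bootstrap(repeats = 3L)
b$is_instantiated  # FALSE
b$instantiate(1:20)
b$is_instantiated  # TRUE
b$iters            # 3
anyDuplicated(b$train_set(1)) > 0
```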

Methods

See also

Other Resampling: mlr_resamplings

Examples

r = mlr_resamplings$get("subsampling")

# Default parametrization
r$param_set$values
#> $repeats
#> [1] 30
#>
#> $ratio
#> [1] 0.6666667
#>

# Do only 3 repeats on 10% of the data
r$param_set$values = list(ratio = 0.1, repeats = 3)
r$param_set$values
#> $ratio
#> [1] 0.1
#>
#> $repeats
#> [1] 3
#>

# Instantiate on iris task
task = mlr_tasks$get("iris")
r$instantiate(task)

# Extract train/test sets
train_set = r$train_set(1)
print(train_set)
#>  [1] 147   5 127  29  83  15   9 103  93   6 129  26  43  14  94
intersect(train_set, r$test_set(1))
#> integer(0)

# Another example: 10-fold CV
r = mlr_resamplings$get("cv")$instantiate(task)
r$train_set(1)
#>   [1]  17  18  24  32  45  72 100 102 112 116 117 120 130 138 143  10  12  16
#>  [19]  35  44  47  77  86  89  98 113 125 129 131 147   1   5   7   9  11  28
#>  [37]  48  49  55  81  96 106 119 128 141   3   4  31  36  41  53  54  57  65
#>  [55]  80  87  99 107 115 127  39  46  50  63  66  70  78  79  90 101 109 124
#>  [73] 134 148 149  13  15  20  38  51  71  73 103 104 110 133 135 137 145 146
#>  [91]  21  23  29  30  33  37  42  58  59  61  68  74 122 144 150   6   8  14
#> [109]  22  43  64  67  69  75  82  88  97 105 121 139   2  26  27  52  60  62
#> [127]  83  91  94 108 118 123 126 132 136

# Stratification
task = mlr_tasks$get("pima")
prop.table(table(task$truth())) # moderately unbalanced
#>
#>       pos       neg
#> 0.3489583 0.6510417
r = mlr_resamplings$get("subsampling")
r$instantiate(task)
prop.table(table(task$truth(r$train_set(1)))) # roughly the same proportion
#>
#>      pos      neg
#> 0.328125 0.671875
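The stratification example relies on the class proportions in the training set staying close to those of the full task. One common way to guarantee this is to subsample within each class separately. The base-R sketch below does that; `stratified_subsample()` is a hypothetical helper, not mlr3's implementation, and the label vector is synthetic, built only to match the class proportions printed above (268 pos and 500 neg out of 768 rows).

```r
# Hypothetical helper: stratified subsampling of row ids by class label
stratified_subsample = function(y, ratio = 2/3) {
  row_ids = seq_along(y)
  # Within each class, draw round(ratio * class size) rows without replacement
  train = unlist(lapply(split(row_ids, y), function(ids) {
    ids[sample.int(length(ids), round(ratio * length(ids)))]
  }), use.names = FALSE)
  list(train = sort(train), test = setdiff(row_ids, train))
}

set.seed(42)
# Synthetic labels with the same class counts as in the example above
y = factor(rep(c("pos", "neg"), times = c(268, 500)))
s = stratified_subsample(y)
prop.table(table(y[s$train]))  # close to 0.349 pos / 0.651 neg by construction
```

Because each class is sampled on its own, the training proportion can deviate from the full-data proportion only through rounding of the per-class sample sizes.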