Abstraction for resampling strategies. Predefined resamplings are stored in mlr_resamplings.

Format

R6Class object.

Usage

# Construction
r = Resampling$new(id, param_set, param_vals)

# Members
r$duplicated_ids
r$hash
r$id
r$instance
r$is_instantiated
r$iters
r$param_set
r$param_vals
r$stratify
r$task_hash

# Methods
r$instantiate(task)
r$test_set(i)
r$train_set(i)

Details

  • $duplicated_ids is TRUE if the resampling can include the same observation more than once in the training set. This holds for bootstrapping, for example, but not for cross-validation.

  • $hash (character(1)) stores a checksum calculated on the id, param_vals and the instantiation. If the object is not instantiated yet, NA is returned.

  • $id (character(1)) stores the identifier of the object.

  • $instance stores the instantiated realization of the resampling. This is an arbitrary object; do not work with it directly. Instead, use $train_set() and $test_set().

  • $instantiate() materializes fixed training and test splits for the given task.

  • $is_instantiated returns TRUE if the resampling has been instantiated, and FALSE otherwise.

  • $iters (integer(1)) calculates the resulting number of iterations, given the current param_vals.

  • $new() creates a new object of class Resampling.

  • $param_set (paradox::ParamSet) describes available parameters.

  • $param_vals (named list) stores the currently set parameter values. You can set parameters by assigning a named list of new parameter values to this slot.

  • $stratify can be set to column names of the Task which will be used for stratification during instantiation.

  • $task_hash stores the hash of the task for which the resampling has been instantiated.

  • $test_set() returns the row ids of the test set for the i-th iteration.

  • $train_set() returns the row ids of the training set for the i-th iteration.
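
The distinction behind $duplicated_ids can be illustrated without the Resampling class. The following is a minimal base R sketch, not the package's implementation; the variable names (ids, boot_train, folds, and so on) are made up for illustration.

```r
# Base R only; this does not use the Resampling class.
set.seed(42)
n = 20L
ids = seq_len(n)

# Bootstrap: draw n ids with replacement, so an id can occur several times
# in the training set. The ids never drawn form the test set.
boot_train = sample(ids, n, replace = TRUE)
boot_test = setdiff(ids, boot_train)
boot_has_duplicates = anyDuplicated(boot_train) > 0

# 5-fold cross-validation: each id is assigned to exactly one fold, so the
# training set of an iteration (all folds but one) contains no repeats.
folds = sample(rep(seq_len(5L), length.out = n))
cv_train = ids[folds != 1L]
cv_test = ids[folds == 1L]
cv_has_duplicates = anyDuplicated(cv_train) > 0
```

In this sketch, boot_has_duplicates corresponds to the case where $duplicated_ids would be TRUE, and cv_has_duplicates to the case where it would be FALSE.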

See also

Other Resampling: mlr_resamplings

Examples

r = mlr_resamplings$get("subsampling")

# Default parametrization
r$param_vals
#> $repeats
#> [1] 30
#>
#> $ratio
#> [1] 0.67
#>

# Do only 3 repeats on 10% of the data
r$param_vals = list(ratio = 0.1, repeats = 3)
r$param_vals
#> $ratio
#> [1] 0.1
#>
#> $repeats
#> [1] 3
#>

# Instantiate on iris task
task = mlr_tasks$get("iris")
r$instantiate(task)

# Extract train/test sets
train_set = r$train_set(1)
print(train_set)
#> [1] 19 23 30 32 35 40 49 53 55 70 79 80 112 115 140
intersect(train_set, r$test_set(1))
#> integer(0)

# Another example: 10-fold CV
r = mlr_resamplings$get("cv")$instantiate(task)
r$train_set(1)
#>   [1]   7  22  23  31  38  39  50  60  62  67  69  74  98 105 140   2  11  66
#>  [19] 106 119 121 122 125 127 130 134 135 137 142 147   5   8  18  27  29  41
#>  [37]  42  49  72  84 113 118 128 144 145  26  28  53  59  65  73  77  80  81
#>  [55] 102 104 107 129 132 148   1   4  15  20  25  34  36  54  58  91  97 116
#>  [73] 131 139 149  10  16  30  35  45  55  70  75  83  87  88  96 103 115 136
#>  [91]  12  21  24  37  61  79  82  86  94 100 108 109 120 146 150  32  43  46
#> [109]  48  64  68  71  76  89  95 101 112 117 123 126   3   9  33  40  44  47
#> [127]  52  56  57  63  78  90 111 138 141

# Stratification
task = mlr_tasks$get("pima")
prop.table(table(task$truth())) # moderately unbalanced
#>
#>       neg       pos
#> 0.6510417 0.3489583

r = mlr_resamplings$get("subsampling")
r$stratify = task$target_names # stratify on target column
r$instantiate(task)
prop.table(table(task$truth(r$train_set(1)))) # roughly same proportion
#>
#>       neg       pos
#> 0.6504854 0.3495146
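
The effect of stratification shown above can be reproduced in plain base R. This is a rough sketch of stratified subsampling under assumed inputs, not the package's implementation, which may differ in detail; the target y, the ratio of 0.6 and all variable names are hypothetical.

```r
# Base R only; a hypothetical imbalanced target with 65 "neg" and 35 "pos".
set.seed(1)
y = factor(rep(c("neg", "pos"), times = c(65L, 35L)))
ratio = 0.6

# Split the row ids by class and sample the same fraction within each class.
ids_by_class = split(seq_along(y), y)
train_set = unlist(lapply(ids_by_class, function(ids) {
  ids[sample.int(length(ids), round(ratio * length(ids)))]
}), use.names = FALSE)
test_set = setdiff(seq_along(y), train_set)

# Class proportions in the full data and in the stratified training set.
prop.table(table(y))
prop.table(table(y[train_set]))
```

Because each class is sampled separately, the class proportions in train_set match those of the full data up to rounding.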