Runs a benchmark on arbitrary combinations of learners, tasks, and resampling strategies (possibly in parallel). Resamplings which are not already instantiated will be instantiated automatically. However, these auto-instantiated resamplings will not be synchronized per task, i.e. different learners will work on different splits of the same task.

To generate exhaustive designs and automatically instantiate resampling strategies per task, use expand_grid().

benchmark(design, ctrl = list())

## Arguments

design :: data.frame() Data frame (or data.table()) with three columns: "task", "learner", and "resampling". Each row defines a resampling by providing a Task, a Learner, and a Resampling strategy. All resamplings must be properly instantiated. The helper function expand_grid() can generate an exhaustive design (see examples) and instantiate the Resamplings per Task.

ctrl :: (named list()) Object to control learner execution. See mlr_control() for details. Note that by default, fitted learner models are discarded after prediction to save memory.

## Note

The fitted models are discarded after the predictions have been scored in order to reduce memory consumption. If you need access to the models for later analysis, set store_models to TRUE via mlr_control().
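As a minimal sketch of keeping the models, the snippet below builds a small design as in the Examples section and passes a control object to benchmark(). It assumes that mlr_control() accepts store_models as a named argument, which the note implies but this page does not spell out. How the stored models are retrieved afterwards is not covered here.

```r
library(mlr3)

# Small exhaustive design, built the same way as in the Examples section
tasks = mlr_tasks$mget("iris")
learners = mlr_learners$mget("classif.rpart")
resamplings = mlr_resamplings$mget("cv3")
design = expand_grid(tasks, learners, resamplings)

# Assumption: store_models is passed to mlr_control() as a named argument
ctrl = mlr_control(store_models = TRUE)

# Fitted models are now kept instead of being discarded after scoring
bmr = benchmark(design, ctrl = ctrl)
```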

## Parallelization

This function can be parallelized with the future package. One job is one resampling iteration, and all jobs are sent to an apply function from future.apply in a single batch. To select a parallel backend, use future::plan().
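The batch dispatch described above can be illustrated without mlr3. The sketch below selects a backend with future::plan() and sends a batch of toy jobs through future.apply::future_lapply(). The jobs vector and the squaring function are hypothetical stand-ins for resampling iterations; which future.apply function benchmark() calls internally is not stated on this page.

```r
library(future)
library(future.apply)

# Select a parallel backend: two background R sessions
plan(multisession, workers = 2)

# Hypothetical stand-in: one element per "resampling iteration"
jobs = seq_len(6)

# All jobs are handed to the apply function in a single call
results = future_lapply(jobs, function(i) i^2)

# Switch back to sequential execution when done
plan(sequential)
```

A benchmark(design) call made while the multisession plan is active would be distributed in the same way, with no further changes to the call itself.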

## Syntactic Sugar

The mlr3 package provides some shortcuts to ease the creation of its objects.

First, instead of the objects themselves, it is possible to pass a character() vector whose elements are used as keys to look up the objects in a mlr3misc::Dictionary.

Additionally, each task has an associated default measure (stored in mlr_reflections) which is used as a fallback if no other measure is provided. Classification tasks default to the classification error in "classif.ce", regression tasks to the mean squared error in "regr.mse".

## Examples

# benchmarking with expand_grid()
tasks = mlr_tasks$mget(c("iris", "sonar"))
learners = mlr_learners$mget(c("classif.featureless", "classif.rpart"))
resamplings = mlr_resamplings$mget("cv3")

design = expand_grid(tasks, learners, resamplings)
print(design)
#>             task                     learner     resampling
#> 1: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 2: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>
#> 3: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 4: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>

set.seed(123)
bmr = benchmark(design)

## data of all resamplings
head(as.data.table(bmr))
#>             task                     learner     resampling iteration
#> 1: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>         1
#> 2: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>         2
#> 3: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>         3
#> 4: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>         1
#> 5: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>         2
#> 6: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>         3
#>             prediction             hash
#> 1: <PredictionClassif> 8fc680bb83e2391b
#> 2: <PredictionClassif> 8fc680bb83e2391b
#> 3: <PredictionClassif> 8fc680bb83e2391b
#> 4: <PredictionClassif> a224a2b977b2d02d
#> 5: <PredictionClassif> a224a2b977b2d02d
#> 6: <PredictionClassif> a224a2b977b2d02d

## aggregated performance values
aggr = bmr$aggregate()
print(aggr)
#>                hash  resample_result task_id          learner_id resampling_id
#> 1: 8fc680bb83e2391b <ResampleResult>    iris classif.featureless           cv3
#> 2: a224a2b977b2d02d <ResampleResult>    iris       classif.rpart           cv3
#> 3: a2fb374303ad6d76 <ResampleResult>   sonar classif.featureless           cv3
#> 4: 8b9bdd93d93b46da <ResampleResult>   sonar       classif.rpart           cv3
#>    classif.ce
#> 1: 0.70000000
#> 2: 0.05333333
#> 3: 0.53857833
#> 4: 0.27384403
## Extract predictions of first resampling result
rr = aggr$resample_result[[1]]
as.data.table(rr$prediction)
#>      row_id     truth response
#>   1:      1    setosa   setosa
#>   2:      4    setosa   setosa
#>   3:      7    setosa   setosa
#>   4:     14    setosa   setosa
#>   5:     19    setosa   setosa
#>  ---
#> 146:    136 virginica   setosa
#> 147:    143 virginica   setosa
#> 148:    145 virginica   setosa
#> 149:    147 virginica   setosa
#> 150:    148 virginica   setosa
# benchmarking with a custom design:
# - fit classif.featureless on iris with a 3-fold CV
# - fit classif.rpart on sonar using a holdout
design = data.table::data.table(
  task = mlr_tasks$mget(c("iris", "sonar")),
  learner = mlr_learners$mget(c("classif.featureless", "classif.rpart")),
  resampling = mlr_resamplings$mget(c("cv3", "holdout"))
)

## instantiate resamplings
design$resampling = Map(
  function(task, resampling) resampling$clone()$instantiate(task),
  task = design$task, resampling = design$resampling
)

## calculate benchmark
bmr = benchmark(design)
print(bmr)
#> <BenchmarkResult> of 4 rows with 2 resampling runs
## get the training set of the 2nd iteration of the featureless learner on iris
rr = bmr$aggregate()[learner_id == "classif.featureless"]$resample_result[[1]]
rr$resampling$train_set(2)
#>   [1]   2   3   7  14  17  18  20  22  32  37  40  43  44  45  47  50  53  56
#>  [19]  57  63  67  69  71  72  73  76  79  81  83  84  85  88  92  95  97 113
#>  [37] 114 115 120 127 130 131 137 138 140 141 142 143 149 150   6   8  11  12
#>  [55]  15  16  19  23  24  28  29  33  34  38  39  42  49  51  58  59  61  62
#>  [73]  64  65  66  68  74  77  78  82  87  90  93  99 101 104 105 108 111 112
#>  [91] 119 121 124 128 129 133 135 144 145 146