Runs a benchmark on arbitrary combinations of tasks (Task), learners (Learner), and resampling strategies (Resampling), possibly in parallel.

benchmark(design, store_models = FALSE)

## Arguments

design :: data.frame()
Data frame (or data.table::data.table()) with three columns: "task", "learner", and "resampling". Each row defines a resampling by providing a Task, a Learner, and an instantiated Resampling strategy. The helper function benchmark_grid() can assist in generating an exhaustive design (see examples) and in instantiating the Resamplings per Task.

store_models :: logical(1)
Keep the fitted model after the test set has been predicted? Set to TRUE if you want to further analyse the models or want to extract information like variable importance.

## Note

The fitted models are discarded after the predictions have been scored in order to reduce memory consumption. If you need access to the models for later analysis, set store_models to TRUE.
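As a minimal sketch (assuming the mlr3 package is attached, and reusing the accessor pattern from the examples below), keeping and retrieving a fitted model might look like this:

```r
library(mlr3)

# a small design: one task, one learner, one holdout split
design = benchmark_grid(tsk("iris"), lrn("classif.rpart"), rsmp("holdout"))

# store_models = TRUE keeps the fitted models in the result
bmr = benchmark(design, store_models = TRUE)

# extract the learner fitted in the first iteration and inspect its model
rr = bmr$aggregate()$resample_result[[1]]
rr$learners[[1]]$model
```

With the default store_models = FALSE, the final line would return NULL instead of the fitted rpart object.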

## Parallelization

This function can be parallelized with the future package. One job is one resampling iteration, and all jobs are sent to an apply function from future.apply in a single batch. To select a parallel backend, use future::plan().
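For instance (a sketch assuming the future package is installed; the number of workers is an arbitrary choice for illustration):

```r
library(mlr3)
library(future)

# run resampling iterations in parallel background R sessions
future::plan("multisession", workers = 2)

design = benchmark_grid(tsk("sonar"), lrn("classif.rpart"), rsmp("cv", folds = 3))
bmr = benchmark(design)

# revert to sequential execution afterwards
future::plan("sequential")
```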

## Logging

mlr3 uses the lgr package for logging. lgr supports multiple log levels, which can be queried with getOption("lgr.log_levels").

To suppress output and reduce verbosity, you can lower the log level from the default "info" to "warn":

lgr::get_logger("mlr3")$set_threshold("warn")

To get additional log output for debugging, increase the log level to "debug" or "trace":

lgr::get_logger("mlr3")$set_threshold("debug")


To log to a file or a database, see the documentation of lgr::lgr-package.
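A sketch of file logging with lgr's appender API (the file path and appender name are illustrative):

```r
library(lgr)

# attach a file appender to the mlr3 logger; all log events are
# additionally written to "mlr3.log" in the working directory
logger = lgr::get_logger("mlr3")
logger$add_appender(lgr::AppenderFile$new("mlr3.log"), name = "file")

# detach the appender again when done
logger$remove_appender("file")
```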

## Examples

# benchmarking with benchmark_grid()
tasks = lapply(c("iris", "sonar"), tsk)
learners = lapply(c("classif.featureless", "classif.rpart"), lrn)
resamplings = rsmp("cv", folds = 3)

design = benchmark_grid(tasks, learners, resamplings)
print(design)
#>             task                     learner     resampling
#> 1: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 2: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>
#> 3: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 4: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>
set.seed(123)
bmr = benchmark(design)

## data of all resamplings
#> 1: 73b46536-ab17-4451-bb0d-8563f8f24951 <TaskClassif>
#> 2: 73b46536-ab17-4451-bb0d-8563f8f24951 <TaskClassif>
#> 3: 73b46536-ab17-4451-bb0d-8563f8f24951 <TaskClassif>
#> 4: dca25f7b-d92b-4ef0-a0bd-c8acdc15f8ce <TaskClassif>
#> 5: dca25f7b-d92b-4ef0-a0bd-c8acdc15f8ce <TaskClassif>
#> 6: dca25f7b-d92b-4ef0-a0bd-c8acdc15f8ce <TaskClassif>
#>                        learner     resampling iteration prediction
#> 1: <LearnerClassifFeatureless> <ResamplingCV>         1     <list>
#> 2: <LearnerClassifFeatureless> <ResamplingCV>         2     <list>
#> 3: <LearnerClassifFeatureless> <ResamplingCV>         3     <list>
#> 4:       <LearnerClassifRpart> <ResamplingCV>         1     <list>
#> 5:       <LearnerClassifRpart> <ResamplingCV>         2     <list>
#> 6:       <LearnerClassifRpart> <ResamplingCV>         3     <list>
## aggregated performance values
aggr = bmr$aggregate()
print(aggr)
#>    nr  resample_result task_id          learner_id resampling_id iters
#> 1:  1 <ResampleResult>    iris classif.featureless            cv     3
#> 2:  2 <ResampleResult>    iris       classif.rpart            cv     3
#> 3:  3 <ResampleResult>   sonar classif.featureless            cv     3
#> 4:  4 <ResampleResult>   sonar       classif.rpart            cv     3
#>    classif.ce
#> 1: 0.69333333
#> 2: 0.06666667
#> 3: 0.46632160
#> 4: 0.30317460

## Extract predictions of first resampling result
rr = aggr$resample_result[[1]]
as.data.table(rr$prediction())
#>      row_id     truth   response
#>   1:      1    setosa  virginica
#>   2:      4    setosa  virginica
#>   3:      7    setosa  virginica
#>   4:      9    setosa  virginica
#>   5:     10    setosa  virginica
#>  ---
#> 146:    135 virginica versicolor
#> 147:    139 virginica versicolor
#> 148:    140 virginica versicolor
#> 149:    143 virginica versicolor
#> 150:    146 virginica versicolor

# benchmarking with a custom design:
# - fit classif.featureless on iris with a 3-fold CV
# - fit classif.rpart on sonar using a holdout
tasks = list(tsk("iris"), tsk("sonar"))
learners = list(lrn("classif.featureless"), lrn("classif.rpart"))
resamplings = list(rsmp("cv", folds = 3), rsmp("holdout"))

design = data.table::data.table(
  task = tasks,
  learner = learners,
  resampling = resamplings
)

## instantiate resamplings
design$resampling = Map(
  function(task, resampling) resampling$clone()$instantiate(task),
  task = design$task, resampling = design$resampling
)

## run benchmark
bmr = benchmark(design)
print(bmr)
#> <BenchmarkResult> of 4 rows with 2 resampling runs
#>  nr task_id          learner_id resampling_id iters warnings errors
#>   1    iris classif.featureless            cv     3        0      0
#>   2   sonar       classif.rpart       holdout     1        0      0
## get the training set of the 2nd iteration of the featureless learner on iris
rr = bmr$aggregate()[learner_id == "classif.featureless"]$resample_result[[1]]
rr$resampling$train_set(2)
#>   [1]   2   3   7  14  17  18  20  22  32  37  40  43  44  45  47  50  53  56
#>  [19]  57  63  67  69  71  72  73  76  79  81  83  84  85  88  92  95  97 113
#>  [37] 114 115 120 127 130 131 137 138 140 141 142 143 149 150   6   8  11  12
#>  [55]  15  16  19  23  24  28  29  33  34  38  39  42  49  51  58  59  61  62
#>  [73]  64  65  66  68  74  77  78  82  87  90  93  99 101 104 105 108 111 112
#>  [91] 119 121 124 128 129 133 135 144 145 146