Runs a benchmark on arbitrary combinations of tasks (Task), learners (Learner), and resampling strategies (Resampling), possibly in parallel.

benchmark(design, store_models = FALSE)

Arguments

design

:: data.frame()
Data frame (or data.table::data.table()) with three columns: "task", "learner", and "resampling". Each row defines a resampling by providing a Task, a Learner, and an instantiated Resampling strategy. The helper function benchmark_grid() can assist in generating an exhaustive design (see examples) and instantiates the Resamplings per Task.

store_models

:: logical(1)
Keep the fitted model after the test set has been predicted? Set to TRUE if you want to further analyse the models or extract information such as variable importance.

Value

BenchmarkResult.

Note

The fitted models are discarded after the predictions have been scored in order to reduce memory consumption. If you need access to the models for later analysis, set store_models to TRUE.
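As a sketch, a stored model can then be retrieved from the corresponding ResampleResult, assuming a design as constructed in the examples below:

# keep the fitted models alongside the predictions
bmr = benchmark(design, store_models = TRUE)

# model fitted in the first iteration of the first resample result
rr = bmr$aggregate()$resample_result[[1]]
rr$learners[[1]]$model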

Parallelization

This function can be parallelized with the future package. One job is one resampling iteration, and all jobs are sent to an apply function from future.apply in a single batch. To select a parallel backend, use future::plan().
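For example, a minimal sketch using the "multisession" backend that ships with future, assuming a design as in the examples below:

future::plan("multisession")  # spawn local background R sessions
bmr = benchmark(design)       # resampling iterations now run in parallel
future::plan("sequential")    # revert to sequential execution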

Logging

The mlr3 package uses the lgr package for logging. lgr supports multiple log levels, which can be queried with getOption("lgr.log_levels").

To suppress output and reduce verbosity, you can lower the log level from the default "info" to "warn":

lgr::get_logger("mlr3")$set_threshold("warn")

To get additional log output for debugging, increase the log level to "debug" or "trace":

lgr::get_logger("mlr3")$set_threshold("debug")

To log to a file or a database, see the documentation of lgr::lgr-package.
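As a sketch, a file appender could be attached to the mlr3 logger like this (the file name "mlr3.log" is arbitrary):

lgr::get_logger("mlr3")$add_appender(lgr::AppenderFile$new("mlr3.log"))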

Examples

# benchmarking with benchmark_grid()
library(mlr3)

tasks = lapply(c("iris", "sonar"), tsk)
learners = lapply(c("classif.featureless", "classif.rpart"), lrn)
resamplings = rsmp("cv", folds = 3)

design = benchmark_grid(tasks, learners, resamplings)
print(design)
#>             task                     learner     resampling
#> 1: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 2: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>
#> 3: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 4: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>
set.seed(123)
bmr = benchmark(design)

## data of all resamplings
head(as.data.table(bmr))
#>                                   uhash          task
#> 1: 61d08453-14a4-45d2-82bd-5e187f7f776e <TaskClassif>
#> 2: 61d08453-14a4-45d2-82bd-5e187f7f776e <TaskClassif>
#> 3: 61d08453-14a4-45d2-82bd-5e187f7f776e <TaskClassif>
#> 4: 0dc25838-897d-4dde-bdab-1a9d2fb7de98 <TaskClassif>
#> 5: 0dc25838-897d-4dde-bdab-1a9d2fb7de98 <TaskClassif>
#> 6: 0dc25838-897d-4dde-bdab-1a9d2fb7de98 <TaskClassif>
#>                        learner     resampling iteration prediction
#> 1: <LearnerClassifFeatureless> <ResamplingCV>         1     <list>
#> 2: <LearnerClassifFeatureless> <ResamplingCV>         2     <list>
#> 3: <LearnerClassifFeatureless> <ResamplingCV>         3     <list>
#> 4:       <LearnerClassifRpart> <ResamplingCV>         1     <list>
#> 5:       <LearnerClassifRpart> <ResamplingCV>         2     <list>
#> 6:       <LearnerClassifRpart> <ResamplingCV>         3     <list>
## aggregated performance values
aggr = bmr$aggregate()
print(aggr)
#>    nr  resample_result task_id          learner_id resampling_id iters
#> 1:  1 <ResampleResult>    iris classif.featureless            cv     3
#> 2:  2 <ResampleResult>    iris       classif.rpart            cv     3
#> 3:  3 <ResampleResult>   sonar classif.featureless            cv     3
#> 4:  4 <ResampleResult>   sonar       classif.rpart            cv     3
#>    classif.ce
#> 1: 0.69333333
#> 2: 0.06666667
#> 3: 0.46632160
#> 4: 0.30317460
## Extract predictions of first resampling result
rr = aggr$resample_result[[1]]
as.data.table(rr$prediction())
#>      row_id     truth   response
#>   1:      1    setosa  virginica
#>   2:      4    setosa  virginica
#>   3:      7    setosa  virginica
#>   4:      9    setosa  virginica
#>   5:     10    setosa  virginica
#>  ---
#> 146:    135 virginica versicolor
#> 147:    139 virginica versicolor
#> 148:    140 virginica versicolor
#> 149:    143 virginica versicolor
#> 150:    146 virginica versicolor
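The prediction object also exposes a confusion matrix that summarizes these misclassifications. A minimal sketch, continuing the example above:

## confusion matrix pooled over all three folds
rr$prediction()$confusion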

# benchmarking with a custom design:
# - fit classif.featureless on iris with a 3-fold CV
# - fit classif.rpart on sonar using a holdout
tasks = list(tsk("iris"), tsk("sonar"))
learners = list(lrn("classif.featureless"), lrn("classif.rpart"))
resamplings = list(rsmp("cv", folds = 3), rsmp("holdout"))

design = data.table::data.table(
  task = tasks,
  learner = learners,
  resampling = resamplings
)

## instantiate resamplings
design$resampling = Map(
  function(task, resampling) resampling$clone()$instantiate(task),
  task = design$task, resampling = design$resampling
)

## run benchmark
bmr = benchmark(design)
print(bmr)
#> <BenchmarkResult> of 4 rows with 2 resampling runs
#>  nr task_id          learner_id resampling_id iters warnings errors
#>   1    iris classif.featureless            cv     3        0      0
#>   2   sonar       classif.rpart       holdout     1        0      0
## get the training set of the 2nd iteration of the featureless learner on iris
rr = bmr$aggregate()[learner_id == "classif.featureless"]$resample_result[[1]]
rr$resampling$train_set(2)
#>   [1]   2   3   7  14  17  18  20  22  32  37  40  43  44  45  47  50  53  56
#>  [19]  57  63  67  69  71  72  73  76  79  81  83  84  85  88  92  95  97 113
#>  [37] 114 115 120 127 130 131 137 138 140 141 142 143 149 150   6   8  11  12
#>  [55]  15  16  19  23  24  28  29  33  34  38  39  42  49  51  58  59  61  62
#>  [73]  64  65  66  68  74  77  78  82  87  90  93  99 101 104 105 108 111 112
#>  [91] 119 121 124 128 129 133 135 144 145 146
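Individual iterations can also be scored with other performance measures via $score(). A minimal sketch, assuming the accuracy measure "classif.acc" that ships with mlr3:

## score each resampling iteration with classification accuracy
bmr$score(msr("classif.acc"))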