Runs a benchmark of the cross-product of learners, tasks, and resampling strategies (possibly in parallel).

Resamplings that are not already instantiated will be instantiated automatically. Note that these auto-instantiated resamplings are not synchronized per task, i.e. different learners will see different splits of the same task.
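To make the synchronization point concrete, here is a minimal base R sketch of two learners drawing their own holdout split versus sharing one. It does not use the package: `holdout_split()` is a hypothetical helper written for this illustration, and the sizes (150 rows, 100 for training) are assumptions chosen only for the example.

```r
# Hypothetical helper (not part of the package): draw a random holdout split
# of n rows with n_train rows in the training set.
holdout_split = function(n, n_train) {
  train = sort(sample(n, size = n_train))
  list(train = train, test = setdiff(seq_len(n), train))
}

n = 150        # assumed number of rows in the task
n_train = 100  # assumed training set size

# Unsynchronized: each learner draws its own split, so the two learners
# are most likely trained and tested on different rows.
set.seed(1)
split_a = holdout_split(n, n_train)
split_b = holdout_split(n, n_train)
unsynchronized_equal = identical(split_a$train, split_b$train)

# Synchronized: draw the split once per task and reuse it for every learner.
shared = holdout_split(n, n_train)
split_a = shared
split_b = shared
synchronized_equal = identical(split_a$train, split_b$train)
```

With a shared split, performance differences between learners cannot be caused by one learner receiving an easier test set than the other.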

To generate exhaustive designs and automatically instantiate resampling strategies per task, see expand_grid().

benchmark(design, ctrl = list())

Arguments

design

(data.frame()): Data frame (or data.table()) with three columns: "task", "learner", and "resampling". Each row defines a set of resampled experiments by providing a Task, Learner and Resampling strategy. The helper function expand_grid() can assist in generating an exhaustive design (see examples).

ctrl

(named list as returned by mlr_control()): Object to control experiment execution. See mlr_control().
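The shape of an exhaustive design can be sketched in base R. This is only an illustration under stated assumptions: it uses plain id strings in place of Task, Learner and Resampling objects, and base R's `expand.grid()` rather than the package's `expand_grid()`, so the row order and column types may differ from what `expand_grid()` returns.

```r
# Ids stand in for the actual Task / Learner / Resampling objects (assumption).
task_ids = c("iris", "sonar")
learner_ids = c("classif.featureless", "classif.rpart")
resampling_ids = "holdout"

# Cross-product: one row per (task, learner, resampling) combination.
design_ids = expand.grid(
  task = task_ids,
  learner = learner_ids,
  resampling = resampling_ids,
  stringsAsFactors = FALSE
)
print(design_ids)
```

Two tasks, two learners and one resampling yield 2 x 2 x 1 = 4 rows, which matches the number of rows in the design printed in the examples.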

Value

BenchmarkResult.

Examples

tasks = mlr_tasks$mget(c("iris", "sonar"))
learners = mlr_learners$mget(c("classif.featureless", "classif.rpart"))
resamplings = mlr_resamplings$mget("holdout")
design = expand_grid(tasks, learners, resamplings)
print(design)
#>             task                     learner          resampling
#> 1: <TaskClassif> <LearnerClassifFeatureless> <ResamplingHoldout>
#> 2: <TaskClassif>       <LearnerClassifRpart> <ResamplingHoldout>
#> 3: <TaskClassif> <LearnerClassifFeatureless> <ResamplingHoldout>
#> 4: <TaskClassif>       <LearnerClassifRpart> <ResamplingHoldout>
bmr = benchmark(design)

# performance for all conducted experiments
head(as.data.table(bmr))
#>                hash          task task_id                     learner
#> 1: a9100fbdd3698662 <TaskClassif>    iris <LearnerClassifFeatureless>
#> 2: 3fe90c093a9fdeab <TaskClassif>    iris       <LearnerClassifRpart>
#> 3: cd3af67cd2425cb1 <TaskClassif>   sonar <LearnerClassifFeatureless>
#> 4: a3b3768bd595fd77 <TaskClassif>   sonar       <LearnerClassifRpart>
#>     learner_id          resampling resampling_id      mmce
#> 1: featureless <ResamplingHoldout>       holdout 0.7200000
#> 2:       rpart <ResamplingHoldout>       holdout 0.0800000
#> 3: featureless <ResamplingHoldout>       holdout 0.5072464
#> 4:       rpart <ResamplingHoldout>       holdout 0.3043478
# aggregated performance values
bmr$aggregated
#>                hash  resample_result task_id  learner_id resampling_id
#> 1: a9100fbdd3698662 <ResampleResult>    iris featureless       holdout
#> 2: 3fe90c093a9fdeab <ResampleResult>    iris       rpart       holdout
#> 3: cd3af67cd2425cb1 <ResampleResult>   sonar featureless       holdout
#> 4: a3b3768bd595fd77 <ResampleResult>   sonar       rpart       holdout
#>         mmce
#> 1: 0.7200000
#> 2: 0.0800000
#> 3: 0.5072464
#> 4: 0.3043478
# overview of the resamplings that were conducted internally
rrs = bmr$resample_results
print(rrs)
#>                hash task_id  learner_id resampling_id N
#> 1: a9100fbdd3698662    iris featureless       holdout 1
#> 2: 3fe90c093a9fdeab    iris       rpart       holdout 1
#> 3: cd3af67cd2425cb1   sonar featureless       holdout 1
#> 4: a3b3768bd595fd77   sonar       rpart       holdout 1
# extract the first ResampleResult
rr = bmr$resample_result(hash = rrs$hash[1])
print(rr)
#> <ResampleResult> of learner 'iris' on task 'featureless' with 1 iterations
#>  Measure Min. 1st Qu. Median Mean 3rd Qu. Max. Sd
#>     mmce 0.72    0.72   0.72 0.72    0.72 0.72 NA
# extract predictions of the first experiment of this resampling
head(as.data.table(rr$experiment(1)$prediction))
#>    row_id response  truth
#> 1:      2   setosa setosa
#> 2:      4   setosa setosa
#> 3:     10   setosa setosa
#> 4:     13   setosa setosa
#> 5:     19   setosa setosa
#> 6:     21   setosa setosa