Runs a benchmark of the cross-product of learners, tasks, and resampling strategies (possibly in parallel).

Resamplings that are not yet instantiated are instantiated automatically. Note that these auto-instantiated resamplings are not synchronized per task, i.e. different learners will see different train/test splits of the same task.

To generate exhaustive designs and automatically instantiate resampling strategies per task, see expand_grid().
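The effect of unsynchronized splits can be illustrated with a minimal base R sketch. It does not use the package's resampling classes: `holdout_train()` is a hypothetical helper, and the 2/3 training ratio is an assumption made for illustration only.

```r
# Hypothetical helper: draw the training rows of a holdout split.
# The 2/3 training ratio is an assumption made for this sketch.
holdout_train = function(n, ratio = 2 / 3) {
  sort(sample(n, size = round(ratio * n)))
}

n = 150  # number of rows, e.g. the iris data set
set.seed(123)

# Unsynchronized: each learner triggers its own instantiation,
# so the two learners are trained on different rows.
train_learner_a = holdout_train(n)
train_learner_b = holdout_train(n)
identical(train_learner_a, train_learner_b)  # almost certainly FALSE

# Synchronized: instantiate once per task and reuse the split
# for every learner evaluated on that task.
shared_train = holdout_train(n)
shared_test = setdiff(seq_len(n), shared_train)
length(shared_train)
length(shared_test)
```

Reusing one instantiated split per task means that performance differences between learners cannot be attributed to differing train/test partitions.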

benchmark(design, ctrl = list())

Arguments

design

(data.frame()): Data frame (or data.table()) with three columns: "task", "learner", and "resampling". Each row defines a set of resampled experiments by providing a Task, Learner and Resampling strategy. The helper function expand_grid() can assist in generating an exhaustive design (see examples).
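As a rough illustration of what "exhaustive design" means here, the following sketch enumerates the cross-product with base R's `expand.grid()`. Plain character ids stand in for the actual Task, Learner, and Resampling objects, so this shows only the shape of the design, not how the package's expand_grid() is implemented.

```r
# Plain identifiers stand in for the Task, Learner, and Resampling objects.
task_ids = c("iris", "sonar")
learner_ids = c("classif.featureless", "classif.rpart")
resampling_ids = "holdout"

# Base R expand.grid() enumerates every combination, one per row.
design_ids = expand.grid(
  task = task_ids,
  learner = learner_ids,
  resampling = resampling_ids,
  stringsAsFactors = FALSE
)

print(design_ids)
nrow(design_ids)  # 2 tasks x 2 learners x 1 resampling = 4 rows
```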

ctrl

(named list as returned by mlr_control()): Object to control experiment execution. See mlr_control().

Value

BenchmarkResult.

Examples

tasks = mlr_tasks$mget(c("iris", "sonar"))
learners = mlr_learners$mget(c("classif.featureless", "classif.rpart"))
resamplings = mlr_resamplings$mget("holdout")

design = expand_grid(tasks, learners, resamplings)
print(design)
#>             task                     learner          resampling
#> 1: <TaskClassif> <LearnerClassifFeatureless> <ResamplingHoldout>
#> 2: <TaskClassif>       <LearnerClassifRpart> <ResamplingHoldout>
#> 3: <TaskClassif> <LearnerClassifFeatureless> <ResamplingHoldout>
#> 4: <TaskClassif>       <LearnerClassifRpart> <ResamplingHoldout>
set.seed(123)
bmr = benchmark(design)

# performance for all conducted experiments
head(as.data.table(bmr))
#>                hash          task task_id                     learner
#> 1: e4be888f65b194c4 <TaskClassif>    iris <LearnerClassifFeatureless>
#> 2: bc09a032e797e992 <TaskClassif>    iris       <LearnerClassifRpart>
#> 3: f3ab4827f36b9492 <TaskClassif>   sonar <LearnerClassifFeatureless>
#> 4: 8c127813347dbd34 <TaskClassif>   sonar       <LearnerClassifRpart>
#>             learner_id          resampling resampling_id classif.mmce
#> 1: classif.featureless <ResamplingHoldout>       holdout    0.7400000
#> 2:       classif.rpart <ResamplingHoldout>       holdout    0.0600000
#> 3: classif.featureless <ResamplingHoldout>       holdout    0.4347826
#> 4:       classif.rpart <ResamplingHoldout>       holdout    0.2463768

# aggregated performance values
bmr$aggregated(objects = FALSE)
#>                hash resampling_id task_id          learner_id classif.mmce
#> 1: e4be888f65b194c4       holdout    iris classif.featureless    0.7400000
#> 2: bc09a032e797e992       holdout    iris       classif.rpart    0.0600000
#> 3: f3ab4827f36b9492       holdout   sonar classif.featureless    0.4347826
#> 4: 8c127813347dbd34       holdout   sonar       classif.rpart    0.2463768

# Overview of resamplings that were conducted internally
rrs = bmr$resample_results
print(rrs)
#>                hash task_id          learner_id resampling_id N
#> 1: e4be888f65b194c4    iris classif.featureless       holdout 1
#> 2: bc09a032e797e992    iris       classif.rpart       holdout 1
#> 3: f3ab4827f36b9492   sonar classif.featureless       holdout 1
#> 4: 8c127813347dbd34   sonar       classif.rpart       holdout 1

# Extract first ResampleResult
rr = bmr$resample_result(hash = rrs$hash[1])
print(rr)
#> <ResampleResult> of learner 'iris' on task 'classif.featureless' with 1 iterations
#>       Measure Min. 1st Qu. Median Mean 3rd Qu. Max. Sd
#>  classif.mmce 0.74    0.74   0.74 0.74    0.74 0.74 NA

# Extract predictions of first experiment of this resampling
head(as.data.table(rr$experiment(1)$prediction))
#>    row_id   response  truth
#> 1:      2 versicolor setosa
#> 2:      3 versicolor setosa
#> 3:     12 versicolor setosa
#> 4:     15 versicolor setosa
#> 5:     18 versicolor setosa
#> 6:     19 versicolor setosa