Runs a benchmark of the cross-product of learners, tasks, and resampling strategies (possibly in parallel).

Resamplings which are not already instantiated will be instantiated automatically. Note that these auto-instantiated resamplings will not be synchronized per task, i.e. learners will see different splits of the same task.
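If learners must be compared on identical splits, instantiate the resampling on the task yourself before building the design. A minimal sketch, assuming the dictionaries and the `$instantiate()` method shown in the examples below:

```r
library(mlr3)  # assumed package providing the dictionaries used below

task = mlr_tasks$get("iris")
resampling = mlr_resamplings$get("holdout")

# instantiate once on the task: every learner in a design that
# reuses this resampling object now sees the same train/test split
resampling$instantiate(task)
```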

To generate exhaustive designs and automatically instantiate resampling strategies per task, see expand_grid().

benchmark(design, measures = NULL, ctrl = list())

Arguments

design

(data.frame()): Data frame (or data.table()) with three columns: "task", "learner", and "resampling". Each row defines a set of resampled experiments by providing a Task, Learner and Resampling strategy. The helper function expand_grid() can assist in generating an exhaustive design (see examples).
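Instead of using expand_grid(), a design can also be assembled by hand as a data.table() with the three required columns; this is a sketch under that assumption, not limited to exhaustive grids:

```r
library(mlr3)        # assumed package providing the dictionaries
library(data.table)

# one row per resampled experiment; the single holdout
# resampling is recycled across both learner rows
design = data.table(
  task = mlr_tasks$mget("iris"),
  learner = mlr_learners$mget(c("classif.featureless", "classif.rpart")),
  resampling = mlr_resamplings$mget("holdout")
)
bmr = benchmark(design)
```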

measures

(list of Measure): List of performance measures to calculate. Defaults to the measures specified in each respective Task.

ctrl

(named list as returned by mlr_control()): Object to control experiment execution. See mlr_control().

Value

BenchmarkResult.

Examples

tasks = mlr_tasks$mget(c("iris", "sonar"))
learners = mlr_learners$mget(c("classif.featureless", "classif.rpart"))
resamplings = mlr_resamplings$mget("holdout")
design = expand_grid(tasks, learners, resamplings)
print(design)
#>             task                     learner          resampling
#> 1: <TaskClassif> <LearnerClassifFeatureless> <ResamplingHoldout>
#> 2: <TaskClassif>       <LearnerClassifRpart> <ResamplingHoldout>
#> 3: <TaskClassif> <LearnerClassifFeatureless> <ResamplingHoldout>
#> 4: <TaskClassif>       <LearnerClassifRpart> <ResamplingHoldout>
set.seed(123)
bmr = benchmark(design)

# performance for all conducted experiments
head(as.data.table(bmr))
#>                hash          task task_id                     learner
#> 1: 151f9c287be00f5c <TaskClassif>    iris <LearnerClassifFeatureless>
#> 2: f2edb6ecd44946a4 <TaskClassif>    iris       <LearnerClassifRpart>
#> 3: 6388fa3e5a5415b9 <TaskClassif>   sonar <LearnerClassifFeatureless>
#> 4: db93fbcdfc26a584 <TaskClassif>   sonar       <LearnerClassifRpart>
#>             learner_id          resampling resampling_id classif.ce
#> 1: classif.featureless <ResamplingHoldout>       holdout  0.7000000
#> 2:       classif.rpart <ResamplingHoldout>       holdout  0.0600000
#> 3: classif.featureless <ResamplingHoldout>       holdout  0.4782609
#> 4:       classif.rpart <ResamplingHoldout>       holdout  0.3333333
# aggregated performance values
bmr$aggregated(objects = FALSE)
#>                hash resampling_id task_id          learner_id classif.ce
#> 1: 151f9c287be00f5c       holdout    iris classif.featureless  0.7000000
#> 2: f2edb6ecd44946a4       holdout    iris       classif.rpart  0.0600000
#> 3: 6388fa3e5a5415b9       holdout   sonar classif.featureless  0.4782609
#> 4: db93fbcdfc26a584       holdout   sonar       classif.rpart  0.3333333
# Overview of resamplings that were conducted internally
rrs = bmr$resample_results
print(rrs)
#>                hash task_id          learner_id resampling_id N
#> 1: 151f9c287be00f5c    iris classif.featureless       holdout 1
#> 2: f2edb6ecd44946a4    iris       classif.rpart       holdout 1
#> 3: 6388fa3e5a5415b9   sonar classif.featureless       holdout 1
#> 4: db93fbcdfc26a584   sonar       classif.rpart       holdout 1
# Extract first ResampleResult
rr = bmr$resample_result(hash = rrs$hash[1])
print(rr)
#> <ResampleResult> of learner 'classif.featureless' on task 'iris' with 1 iterations
#> Measure    Min. 1st Qu. Median Mean 3rd Qu. Max. Sd
#> classif.ce  0.7     0.7    0.7  0.7     0.7  0.7 NA
# Extract predictions of first experiment of this resampling
head(as.data.table(rr$experiment(1)$prediction))
#>    row_id  truth  response
#> 1:      4 setosa virginica
#> 2:      6 setosa virginica
#> 3:      7 setosa virginica
#> 4:      9 setosa virginica
#> 5:     10 setosa virginica
#> 6:     11 setosa virginica