This is an introduction to resampling in mlr3: defining a resampling strategy, running it with resample(), and comparing multiple learners with benchmark().

Objects

Again, we consider the iris task and a simple classification tree here.
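
A minimal sketch of this setup, using the same dictionary accessors as in the benchmarking example further below:

library(mlr3)

# retrieve the iris task and a classification tree learner from the dictionaries
task = mlr_tasks$get("iris")
learner = mlr_learners$get("classif.rpart")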

Additionally, we need to define how we want to resample. mlr3 ships with a number of predefined resampling strategies, all stored in the mlr_resamplings dictionary:
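
Printing the dictionary lists the registered strategies; the exact set of keys depends on your mlr3 version:

# list all predefined resampling strategies
mlr_resamplings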

The experiment conducted in the train/predict/score introduction is equivalent to a simple “holdout” resampling, so let’s consider this one first.
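
A sketch of retrieving the holdout strategy from the dictionary:

# a single split into a training and a test set
resampling = mlr_resamplings$get("holdout")
resampling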

To change the ratio to 0.8, we simply overwrite the slot:
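
With current mlr3 the ratio is a parameter in the resampling's param_set; older versions name this slot differently, so adapt as needed:

# use 80% of the observations for training
resampling$param_set$values = list(ratio = 0.8)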

Resampling

Now, we can pass all created objects to the resample() function to get an object of class ResampleResult:
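
resample() fits the learner on each training set and predicts on the corresponding test set:

# run the holdout resampling
rr = resample(task, learner, resampling)
print(rr)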

Before we go into more detail, let’s change the resampling to a 3-fold cross-validation to better illustrate what operations are possible with a resampling result.
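
A sketch, again setting the parameter via the param_set (parameter access may differ in older versions):

# 3-fold cross-validation instead of holdout
resampling = mlr_resamplings$get("cv")
resampling$param_set$values = list(folds = 3)
rr = resample(task, learner, resampling)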

We can do different things with resampling results, e.g. (see the sketch after this list):

  • Extract the performance for the individual resampling iterations:
  • Extract and inspect the now instantiated resampling:
  • Retrieve the experiment of a specific iteration and inspect it:
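
A sketch of these three operations, using the accessors of current mlr3 (score(), resampling, learners); older versions expose the same information under slightly different names, e.g. via Experiment objects:

# performance of the individual resampling iterations
# (measure key "acc" follows this document; newer versions use "classif.acc")
rr$score(mlr_measures$get("acc"))

# the now instantiated resampling and its first training set
rr$resampling
rr$resampling$train_set(1)

# model fitted in the first iteration
rr$learners[[1]]$model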

Manual Instantiation

If you want to compare multiple learners, you should use the same resampling per task to reduce the variance of the performance estimation. Until now, we have just passed a resampling strategy to resample(), without specifying the actual splits into training and test. Here, we manually instantiate the resampling:
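
A sketch using the instantiate() method, which binds the strategy to the row ids of a concrete task:

# 3-fold CV, instantiated on the iris task
resampling = mlr_resamplings$get("cv")
resampling$param_set$values = list(folds = 3)
resampling$instantiate(task)
resampling$is_instantiated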

If we now pass this instantiated object to resample(), the pre-calculated training and test splits will be used for both learners:
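
For example, to evaluate a classification tree and a featureless baseline on identical splits:

# both learners see exactly the same train/test splits
rr_rpart = resample(task, mlr_learners$get("classif.rpart"), resampling)
rr_featureless = resample(task, mlr_learners$get("classif.featureless"), resampling)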

We can also combine the created result objects into a BenchmarkResult (see below for an introduction to simple benchmarking):
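
One way to do this with current mlr3 is via as_benchmark_result() and the combine() method of BenchmarkResult; older versions offered a similar mechanism under a different name:

# convert the two resample results and merge them
bmr = as_benchmark_result(rr_rpart)
bmr$combine(as_benchmark_result(rr_featureless))
bmr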

Benchmarking

Comparing the performance of multiple learners on multiple tasks is a frequent operation, so mlr3 offers the benchmark() function for convenience. Internally, the resampling strategies are instantiated automatically for you, so that each learner sees exactly the same splits of the data.

# get some example tasks
tasks = mlr_tasks$mget(c("pima", "sonar", "spam"))

# get a featureless learner and a classification tree
learners = mlr_learners$mget(c("classif.featureless", "classif.rpart"))

# let the learners predict probabilities instead of class labels (required for AUC measure)
learners$classif.featureless$predict_type = "prob"
learners$classif.rpart$predict_type = "prob"

# compare via 10-fold cross validation
resamplings = mlr_resamplings$mget("cv")

# measure accuracy (acc) and area under the curve (AUC)
measures = mlr_measures$mget(c("acc", "auc"))

# create a BenchmarkResult object
bmr = benchmark(tasks, learners, resamplings, measures)
#> INFO [mlr3] Benchmarking 60 experiments
#> INFO [mlr3] Running learner 'classif.featureless' on task 'pima_indians (iteration 1/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'pima_indians (iteration 1/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'pima_indians (iteration 2/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'pima_indians (iteration 2/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'pima_indians (iteration 3/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'pima_indians (iteration 3/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'pima_indians (iteration 4/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'pima_indians (iteration 4/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'pima_indians (iteration 5/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'pima_indians (iteration 5/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'pima_indians (iteration 6/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'pima_indians (iteration 6/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'pima_indians (iteration 7/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'pima_indians (iteration 7/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'pima_indians (iteration 8/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'pima_indians (iteration 8/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'pima_indians (iteration 9/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'pima_indians (iteration 9/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'pima_indians (iteration 10/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'pima_indians (iteration 10/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'sonar (iteration 1/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'sonar (iteration 1/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'sonar (iteration 2/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'sonar (iteration 2/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'sonar (iteration 3/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'sonar (iteration 3/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'sonar (iteration 4/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'sonar (iteration 4/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'sonar (iteration 5/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'sonar (iteration 5/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'sonar (iteration 6/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'sonar (iteration 6/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'sonar (iteration 7/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'sonar (iteration 7/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'sonar (iteration 8/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'sonar (iteration 8/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'sonar (iteration 9/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'sonar (iteration 9/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'sonar (iteration 10/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'sonar (iteration 10/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'spam (iteration 1/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'spam (iteration 1/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'spam (iteration 2/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'spam (iteration 2/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'spam (iteration 3/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'spam (iteration 3/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'spam (iteration 4/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'spam (iteration 4/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'spam (iteration 5/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'spam (iteration 5/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'spam (iteration 6/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'spam (iteration 6/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'spam (iteration 7/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'spam (iteration 7/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'spam (iteration 8/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'spam (iteration 8/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'spam (iteration 9/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'spam (iteration 9/10)' ...
#> INFO [mlr3] Running learner 'classif.featureless' on task 'spam (iteration 10/10)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'spam (iteration 10/10)' ...
#> INFO [mlr3] Finished benchmark

Usually you want to look at the results aggregated over resampling iterations:
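
With current mlr3 this is the aggregate() method of the BenchmarkResult (older versions used a similar accessor):

# mean performance per task/learner combination
bmr$aggregate(measures)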

We can aggregate it further, e.g. if we are interested in which learner performed best across all tasks:
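
The aggregated scores come back as a data.table, so they can be summarized further with ordinary data.table operations. The column names below (learner_id, auc) are assumptions based on the measures used above and may differ in your version:

library(data.table)

aggr = bmr$aggregate(measures)
# hypothetical column names; adapt to the names in your aggregate table
aggr[, list(mean_auc = mean(auc)), by = "learner_id"][order(-mean_auc)]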

Unsurprisingly, the classification tree outperformed the featureless learner.

As a BenchmarkResult is essentially a collection of multiple ResampleResult objects, we can extract a specific ResampleResult using the stored hashes:
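
A sketch using the accessors of current mlr3 (uhashes and resample_result()); older versions store plain hashes instead:

# unique hashes identify the individual resample results
hashes = bmr$uhashes
rr = bmr$resample_result(uhash = hashes[1])
rr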

We can now investigate this resampling and even single experiments using the previously introduced API:
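
For example (a sketch with current accessors):

# aggregated performance of the extracted resample result
rr$aggregate()

# prediction object of the first iteration
rr$predictions()[[1]]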