This is the result container object returned by benchmark(). A BenchmarkResult consists of the row-bound data of multiple ResampleResults, which can easily be reconstructed.

Note that all stored objects are accessed by reference. Do not modify any object without cloning it first.
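The following minimal sketch illustrates the clone-before-modify rule for a stored object. The iris task, the holdout resampling, and the variable names are assumptions chosen for illustration only. deep = TRUE is used as the conservative choice.

```r
library(mlr3)

# small illustrative benchmark (task and learner are arbitrary choices)
design = benchmark_grid(
  tasks = list(tsk("iris")),
  learners = list(lrn("classif.featureless")),
  resamplings = list(rsmp("holdout"))
)
bmr = benchmark(design)

# retrieve a stored task and clone it before changing anything
task_copy = bmr$tasks$task[[1]]$clone(deep = TRUE)

# modify only the copy: keep the first 10 rows
task_copy$filter(1:10)

task_copy$nrow              # 10
bmr$tasks$task[[1]]$nrow    # 150, the stored task is unchanged
```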

S3 Methods

Public fields

data

(data.table::data.table())
Internal data storage with one row per resampling iteration. Can be joined with $rr_data on the column "uhash". Users are discouraged from working with this table directly.

rr_data

(data.table::data.table())
Internal data storage with one row per ResampleResult (instead of one row per resampling iteration as in $data). Package developers may opt to add additional columns here. These columns are preserved in all mutators. Can be combined with $data by (left) joining on the key column "uhash". E.g., mlr3tuning stores additional information for the optimization path in this table.

Active bindings

task_type

(character(1))
Task type of objects in the BenchmarkResult. All stored objects (Task, Learner, Prediction) in a single BenchmarkResult are required to have the same task type, e.g., "classif" or "regr". This is NULL for empty BenchmarkResults.

tasks

(data.table::data.table())
Table of included Tasks with three columns: "task_hash" (character(1)), "task_id" (character(1)), and "task" (Task).

learners

(data.table::data.table())
Table of included Learners with three columns: "learner_hash" (character(1)), "learner_id" (character(1)), and "learner" (Learner).

Note that it is not feasible to access learned models via this getter, as the training task would be ambiguous. For this reason, the returned learners are reset before they are returned. To access a learned model, select a row from the table returned by $score() instead.
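As a minimal sketch of this difference, the snippet below compares a learner from $learners with one taken from a row of $score(). It assumes that models were kept by calling benchmark() with store_models = TRUE. The task and learner are arbitrary illustrative choices.

```r
library(mlr3)

design = benchmark_grid(
  tasks = list(tsk("iris")),
  learners = list(lrn("classif.rpart")),
  resamplings = list(rsmp("holdout"))
)

# store_models = TRUE keeps the fitted models in the result
bmr = benchmark(design, store_models = TRUE)

# learners from $learners have been reset and carry no model
model_from_getter = bmr$learners$learner[[1]]$model
is.null(model_from_getter)

# each row of $score() belongs to exactly one resampling iteration,
# so the learner in that row holds the model fitted in that iteration
tab = bmr$score()
model_from_score = tab$learner[[1]]$model
class(model_from_score)
```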

resamplings

(data.table::data.table())
Table of included Resamplings with three columns: "resampling_hash" (character(1)), "resampling_id" (character(1)), and "resampling" (Resampling).

n_resample_results

(integer(1))
Returns the total number of stored ResampleResults.

uhashes

(character())
Set of (unique) hashes of all included ResampleResults.

Methods

Public methods


Method new()

Creates a new instance of this R6 class.

Usage

BenchmarkResult$new(data = data.table())

Arguments

data

(data.table::data.table())
Table with data for one resampling iteration per row, with at least the following columns:

Column "uhash" is the unique hash of the corresponding ResampleResult. Additional columns are kept in the resulting object, but otherwise ignored by BenchmarkResult.


Method help()

Opens the help page for this object.

Usage

BenchmarkResult$help()


Method format()

Helper for print outputs.

Usage

BenchmarkResult$format()


Method print()

Printer.

Usage

BenchmarkResult$print()


Method combine()

Fuses a second BenchmarkResult into itself, mutating the BenchmarkResult in-place. If the second BenchmarkResult bmr is NULL, simply returns self. Note that you can alternatively use the combine function c() which calls this method internally.

Usage

BenchmarkResult$combine(bmr)

Arguments

bmr

(BenchmarkResult)
A second BenchmarkResult object.

Returns

Returns the object itself, but modified by reference. You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
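A minimal sketch of combining two results while keeping the first one intact. The two single-learner benchmarks and all variable names are assumptions made for illustration.

```r
library(mlr3)

# two separate benchmarks on the same task type
bmr1 = benchmark(benchmark_grid(
  tasks = list(tsk("iris")),
  learners = list(lrn("classif.featureless")),
  resamplings = list(rsmp("holdout"))
))
bmr2 = benchmark(benchmark_grid(
  tasks = list(tsk("iris")),
  learners = list(lrn("classif.rpart")),
  resamplings = list(rsmp("holdout"))
))

# $combine() mutates in place, so fuse into a clone to preserve bmr1
combined = bmr1$clone(deep = TRUE)
combined$combine(bmr2)

combined$n_resample_results  # 2
bmr1$n_resample_results      # 1
```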


Method score()

Returns a table with one row for each resampling iteration, including all involved objects: Task, Learner, Resampling, iteration number (integer(1)), and Prediction. If ids is set to TRUE, character columns of extracted ids are added to the table for convenient filtering: "task_id", "learner_id", and "resampling_id".

Additionally calculates the provided performance measures and binds the performance scores as extra columns. These columns are named using the id of the respective Measure.

Usage

BenchmarkResult$score(measures = NULL, ids = TRUE)

Arguments

measures

(Measure | list of Measure)
Measure(s) to calculate.

ids

(logical(1))
Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns for convenient subsetting.

Returns

data.table::data.table().
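A minimal sketch of scoring with more than one measure. The design (iris, two learners, 3-fold cross-validation) and the chosen measures are illustrative assumptions.

```r
library(mlr3)

design = benchmark_grid(
  tasks = list(tsk("iris")),
  learners = list(lrn("classif.featureless"), lrn("classif.rpart")),
  resamplings = list(rsmp("cv", folds = 3))
)
bmr = benchmark(design)

# one score column per measure, named after the measure id
scores = bmr$score(msrs(c("classif.acc", "classif.ce")))

# one row per resampling iteration: 2 learners x 3 folds
nrow(scores)

# the id columns make subsetting straightforward
scores[learner_id == "classif.rpart", list(iteration, classif.acc, classif.ce)]
```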


Method aggregate()

Returns a result table where resampling iterations are combined into ResampleResults. A column with the aggregated performance score is added for each Measure, named with the id of the respective measure.

For convenience, the flags uhashes, params, and conditions can be set to extract more information from each ResampleResult; they are described under Arguments.

Usage

BenchmarkResult$aggregate(
  measures = NULL,
  ids = TRUE,
  uhashes = FALSE,
  params = FALSE,
  conditions = FALSE
)

Arguments

measures

(Measure | list of Measure)
Measure(s) to calculate.

ids

(logical(1))
Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns for convenient subsetting.

uhashes

(logical(1))
Adds the uhash values of the ResampleResult as extra character column "uhash".

params

(logical(1))
Adds the hyperparameter values as extra list column "params". You can unnest them with mlr3misc::unnest().

conditions

(logical(1))
Adds the number of resampling iterations with at least one warning as extra integer column "warnings", and the number of resampling iterations with errors as extra integer column "errors".

Returns

data.table::data.table().
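A minimal sketch of the optional flags, assuming a small illustrative design on the iris task. The variable names are hypothetical.

```r
library(mlr3)

design = benchmark_grid(
  tasks = list(tsk("iris")),
  learners = list(lrn("classif.featureless"), lrn("classif.rpart")),
  resamplings = list(rsmp("cv", folds = 3))
)
bmr = benchmark(design)

# request the uhash, hyperparameters, and condition counts in addition
# to the aggregated accuracy
agg = bmr$aggregate(
  msr("classif.acc"),
  uhashes = TRUE,
  params = TRUE,
  conditions = TRUE
)

# one row per ResampleResult
nrow(agg)
names(agg)

# spread the "params" list column into separate columns
agg_wide = mlr3misc::unnest(agg, "params")
```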


Method filter()

Subsets the benchmark result. If task_ids is not NULL, keeps all tasks with the provided task ids and discards all others. The same procedure applies to learner_ids and resampling_ids.

Usage

BenchmarkResult$filter(
  task_ids = NULL,
  learner_ids = NULL,
  resampling_ids = NULL
)

Arguments

task_ids

(character())
Ids of Tasks to keep.

learner_ids

(character())
Ids of Learners to keep.

resampling_ids

(character())
Ids of Resamplings to keep.

Returns

Returns the object itself, but modified by reference. You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
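A minimal sketch of filtering a copy so that the full result remains available. The design and the name bmr_rpart are illustrative assumptions.

```r
library(mlr3)

design = benchmark_grid(
  tasks = list(tsk("iris")),
  learners = list(lrn("classif.featureless"), lrn("classif.rpart")),
  resamplings = list(rsmp("holdout"))
)
bmr = benchmark(design)

# filter a deep clone; the original keeps both learners
bmr_rpart = bmr$clone(deep = TRUE)
bmr_rpart$filter(learner_ids = "classif.rpart")

bmr_rpart$learners$learner_id  # "classif.rpart"
bmr$n_resample_results         # 2
```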


Method resample_result()

Retrieve the i-th ResampleResult, by position or by unique hash uhash. i and uhash are mutually exclusive.

Usage

BenchmarkResult$resample_result(i = NULL, uhash = NULL)

Arguments

i

(integer(1))
The position of the ResampleResult to retrieve.

uhash

(character(1))
The uhash value to filter for.

Returns

ResampleResult.
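A minimal sketch of both lookup styles, by position and by unique hash. The design and variable names are illustrative assumptions.

```r
library(mlr3)

design = benchmark_grid(
  tasks = list(tsk("iris")),
  learners = list(lrn("classif.featureless"), lrn("classif.rpart")),
  resamplings = list(rsmp("holdout"))
)
bmr = benchmark(design)

# by position
rr_by_pos = bmr$resample_result(1)

# by unique hash, taken from the set of stored hashes
wanted = bmr$uhashes[2]
rr_by_hash = bmr$resample_result(uhash = wanted)

# the retrieved object reports the hash it was looked up with
rr_by_hash$uhash == wanted
```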


Method clone()

The objects of this class are cloneable with this method.

Usage

BenchmarkResult$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

set.seed(123)
learners = list(
  lrn("classif.featureless", predict_type = "prob"),
  lrn("classif.rpart", predict_type = "prob")
)

design = benchmark_grid(
  tasks = list(tsk("sonar"), tsk("spam")),
  learners = learners,
  resamplings = rsmp("cv", folds = 3)
)
print(design)
#>             task                     learner     resampling
#> 1: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 2: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>
#> 3: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 4: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>

bmr = benchmark(design)
#> INFO [20:53:12.098] Benchmark with 12 resampling iterations
#> INFO [20:53:12.129] Applying learner 'classif.featureless' on task 'sonar' (iter 1/3)
#> INFO [20:53:12.165] Applying learner 'classif.featureless' on task 'sonar' (iter 2/3)
#> INFO [20:53:12.178] Applying learner 'classif.featureless' on task 'sonar' (iter 3/3)
#> INFO [20:53:12.191] Applying learner 'classif.rpart' on task 'sonar' (iter 1/3)
#> INFO [20:53:12.257] Applying learner 'classif.rpart' on task 'sonar' (iter 2/3)
#> INFO [20:53:12.292] Applying learner 'classif.rpart' on task 'sonar' (iter 3/3)
#> INFO [20:53:12.319] Applying learner 'classif.featureless' on task 'spam' (iter 1/3)
#> INFO [20:53:12.331] Applying learner 'classif.featureless' on task 'spam' (iter 2/3)
#> INFO [20:53:12.343] Applying learner 'classif.featureless' on task 'spam' (iter 3/3)
#> INFO [20:53:12.354] Applying learner 'classif.rpart' on task 'spam' (iter 1/3)
#> INFO [20:53:12.425] Applying learner 'classif.rpart' on task 'spam' (iter 2/3)
#> INFO [20:53:12.490] Applying learner 'classif.rpart' on task 'spam' (iter 3/3)
#> INFO [20:53:12.567] Finished benchmark

print(bmr)
#> <BenchmarkResult> of 12 rows with 4 resampling runs
#>  nr task_id          learner_id resampling_id iters warnings errors
#>   1   sonar classif.featureless            cv     3        0      0
#>   2   sonar       classif.rpart            cv     3        0      0
#>   3    spam classif.featureless            cv     3        0      0
#>   4    spam       classif.rpart            cv     3        0      0

bmr$tasks
#>           task_hash task_id          task
#> 1: dea3e1fd99a2120d   sonar <TaskClassif>
#> 2: 7cf4341432d6352e    spam <TaskClassif>

bmr$learners
#>        learner_hash          learner_id                     learner
#> 1: 3bbabd1058707305 classif.featureless <LearnerClassifFeatureless>
#> 2: fc402e71eadd46bb       classif.rpart       <LearnerClassifRpart>

# first 5 individual resamplings
head(as.data.table(bmr, measures = c("classif.acc", "classif.auc")), 5)
#>                                   uhash          task
#> 1: 466c69ab-b4a1-4e2b-961b-964443ff6c66 <TaskClassif>
#> 2: 466c69ab-b4a1-4e2b-961b-964443ff6c66 <TaskClassif>
#> 3: 466c69ab-b4a1-4e2b-961b-964443ff6c66 <TaskClassif>
#> 4: df563694-59dd-4a5b-a6c2-8939626e47f1 <TaskClassif>
#> 5: df563694-59dd-4a5b-a6c2-8939626e47f1 <TaskClassif>
#>                        learner     resampling iteration prediction
#> 1: <LearnerClassifFeatureless> <ResamplingCV>         1     <list>
#> 2: <LearnerClassifFeatureless> <ResamplingCV>         2     <list>
#> 3: <LearnerClassifFeatureless> <ResamplingCV>         3     <list>
#> 4:       <LearnerClassifRpart> <ResamplingCV>         1     <list>
#> 5:       <LearnerClassifRpart> <ResamplingCV>         2     <list>

# aggregate results
bmr$aggregate()
#>    nr  resample_result task_id          learner_id resampling_id iters
#> 1:  1 <ResampleResult>   sonar classif.featureless            cv     3
#> 2:  2 <ResampleResult>   sonar       classif.rpart            cv     3
#> 3:  3 <ResampleResult>    spam classif.featureless            cv     3
#> 4:  4 <ResampleResult>    spam       classif.rpart            cv     3
#>    classif.ce
#> 1:  0.4660455
#> 2:  0.2739130
#> 3:  0.3940399
#> 4:  0.1086721

# aggregate results with hyperparameters as separate columns
mlr3misc::unnest(bmr$aggregate(params = TRUE), "params")
#>    nr  resample_result task_id          learner_id resampling_id iters
#> 1:  1 <ResampleResult>   sonar classif.featureless            cv     3
#> 2:  2 <ResampleResult>   sonar       classif.rpart            cv     3
#> 3:  3 <ResampleResult>    spam classif.featureless            cv     3
#> 4:  4 <ResampleResult>    spam       classif.rpart            cv     3
#>    classif.ce method xval
#> 1:  0.4660455   mode   NA
#> 2:  0.2739130   <NA>    0
#> 3:  0.3940399   mode   NA
#> 4:  0.1086721   <NA>    0

# extract resample result for classif.rpart
rr = bmr$aggregate()[learner_id == "classif.rpart", resample_result][[1]]
print(rr)
#> <ResampleResult> of 3 iterations
#> * Task: sonar
#> * Learner: classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations

# access the confusion matrix of the first resampling iteration
rr$predictions()[[1]]$confusion
#>         truth
#> response  M  R
#>        M 30 18
#>        R  3 19