This is the result container object returned by benchmark().

Note that all stored objects are accessed by reference. Do not modify any object without cloning it first.
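
For example, to work with a stored task without affecting the benchmark result, deep-clone it first. A minimal sketch (assuming bmr is a populated BenchmarkResult as created in the Examples below; Task$filter() is used only as an example of a mutating operation):

# clone before mutating, so the object stored inside bmr stays untouched
task = bmr$tasks$task[[1]]$clone(deep = TRUE)
task$filter(1:100)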

Format

R6::R6Class object.

Construction

bmr = BenchmarkResult$new(data = data.table())
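
Typically a BenchmarkResult is created by benchmark() rather than constructed directly. A minimal sketch of constructing an empty container and filling it via combine(); the task, learner and resampling choices are assumptions for illustration:

library(mlr3)

# construct an empty container and fuse a benchmark run into it (hedged sketch)
bmr = BenchmarkResult$new()
design = benchmark_grid(
  tasks = list(tsk("sonar")),
  learners = list(lrn("classif.rpart")),
  resamplings = rsmp("holdout")
)
bmr$combine(benchmark(design))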

Fields

  • tasks :: data.table::data.table()
    Table of included Tasks with columns "task_hash" (character()), "task_id" (character()) and "task" (Task), as shown in the Examples.

  • learners :: data.table::data.table()
    Table of included Learners with columns "learner_hash" (character()), "learner_id" (character()) and "learner" (Learner), as shown in the Examples.

Methods

  • aggregate(measures = NULL, ids = TRUE, params = FALSE, warnings = FALSE, errors = FALSE)
    (list of Measure, logical(1), logical(1), logical(1), logical(1)) -> data.table::data.table()
    Returns a result table where resampling iterations are aggregated together into ResampleResults. A column with the aggregated performance is added for each Measure, named with the id of the respective measure.

    The remaining arguments control which additional columns are added:

    • ids :: logical(1)
      Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns.

    • params :: logical(1)
      Adds the hyperparameter values as extra list column "params". You can unnest them with mlr3misc::unnest().

    • warnings :: logical(1)
      Adds the number of resampling iterations with at least one warning as extra integer column "warnings".

    • errors :: logical(1)
      Adds the number of resampling iterations with errors as extra integer column "errors".

  • performance(measures = NULL, ids = TRUE)
    (list of Measure, logical(1)) -> data.table::data.table()
    Returns a table with one row for each resampling iteration, including all involved objects: Task, Learner, Resampling, iteration number (integer(1)), and Prediction. If ids is set to TRUE, character columns with the extracted ids ("task_id", "learner_id", "resampling_id") are added to the table for convenient filtering. Additionally, the provided performance measures are calculated and bound to the table as extra columns, named with the id of the respective Measure. See the sketch following the Methods list for an example call.

  • resample_result(i)
    (integer(1)) -> ResampleResult
    Retrieves the i-th ResampleResult (see the sketch following the Methods list).

  • combine(bmr)
    (BenchmarkResult | NULL) -> self
    Fuses a second BenchmarkResult into itself, mutating the BenchmarkResult in-place. If bmr is NULL, simply returns self.

    In case of duplicated ResampleResults, an exception is raised. Two ResampleResults are identical iff the hashes of the respective Task, Learner and Resampling are identical, i.e. they must operate on exactly the same data, with the same learner (including identical hyperparameters) and the same splits into training and test sets. A sketch of combining two benchmark runs follows the Examples at the end of this page.
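
For illustration, a minimal sketch of performance() and resample_result(); the task, learner and measure choices here are assumptions, not part of the class API:

library(mlr3)

# a small benchmark result to query (hedged sketch)
design = benchmark_grid(
  tasks = list(tsk("sonar")),
  learners = list(lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
)
bmr = benchmark(design)

# one row per resampling iteration, with an extra "classif.acc" column
perf = bmr$performance(measures = list(msr("classif.acc")))

# retrieve the first ResampleResult
rr = bmr$resample_result(1)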

S3 Methods

  • as.data.table(bmr, measures = NULL)
    BenchmarkResult -> data.table::data.table()
    Returns a tabular view with one row per resampling iteration; performance values for the given measures are added as extra columns (see the Examples).

Examples

set.seed(123)
learners = list(
  lrn("classif.featureless", predict_type = "prob"),
  lrn("classif.rpart", predict_type = "prob")
)
design = benchmark_grid(
  tasks = list(tsk("sonar"), tsk("spam")),
  learners = learners,
  resamplings = rsmp("cv", folds = 3)
)
print(design)
#>             task                     learner     resampling
#> 1: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 2: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>
#> 3: <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 4: <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>
bmr = benchmark(design)
print(bmr)
#> <BenchmarkResult> of 12 rows with 4 resampling runs
#>  nr task_id          learner_id resampling_id warnings errors classif.ce
#>   1   sonar classif.featureless            cv        0      0      0.466
#>   2   sonar       classif.rpart            cv        0      0      0.274
#>   3    spam classif.featureless            cv        0      0      0.394
#>   4    spam       classif.rpart            cv        0      0      0.109
bmr$tasks
#>           task_hash task_id          task
#> 1: 1455849c20737725   sonar <TaskClassif>
#> 2: b547a066c824a9b6    spam <TaskClassif>
bmr$learners
#>        learner_hash          learner_id                     learner
#> 1: 69bf5f307bda0035 classif.featureless <LearnerClassifFeatureless>
#> 2: 5878aa6f67892f3c       classif.rpart       <LearnerClassifRpart>
# first 5 individual resamplings
head(as.data.table(bmr, measures = c("classif.acc", "classif.auc")), 5)
#>                hash          task                     learner     resampling
#> 1: e483fddaed7ce8a5 <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 2: e483fddaed7ce8a5 <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 3: e483fddaed7ce8a5 <TaskClassif> <LearnerClassifFeatureless> <ResamplingCV>
#> 4: 5f8efec5f7ef5c7c <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>
#> 5: 5f8efec5f7ef5c7c <TaskClassif>       <LearnerClassifRpart> <ResamplingCV>
#>    iteration          prediction
#> 1:         1 <PredictionClassif>
#> 2:         2 <PredictionClassif>
#> 3:         3 <PredictionClassif>
#> 4:         1 <PredictionClassif>
#> 5:         2 <PredictionClassif>
# aggregate results
bmr$aggregate()
#>    nr  resample_result task_id          learner_id resampling_id classif.ce
#> 1:  1 <ResampleResult>   sonar classif.featureless            cv  0.4660455
#> 2:  2 <ResampleResult>   sonar       classif.rpart            cv  0.2739130
#> 3:  3 <ResampleResult>    spam classif.featureless            cv  0.3940399
#> 4:  4 <ResampleResult>    spam       classif.rpart            cv  0.1086721
# aggregate results with hyperparameters as separate columns
mlr3misc::unnest(bmr$aggregate(params = TRUE), "params")
#>    nr  resample_result task_id          learner_id resampling_id classif.ce
#> 1:  1 <ResampleResult>   sonar classif.featureless            cv  0.4660455
#> 2:  2 <ResampleResult>   sonar       classif.rpart            cv  0.2739130
#> 3:  3 <ResampleResult>    spam classif.featureless            cv  0.3940399
#> 4:  4 <ResampleResult>    spam       classif.rpart            cv  0.1086721
#>    method xval
#> 1:   mode   NA
#> 2:   <NA>    0
#> 3:   mode   NA
#> 4:   <NA>    0
# extract resample result for classif.rpart
rr = bmr$aggregate()[learner_id == "classif.rpart", resample_result][[1]]
print(rr)
#> <ResampleResult> of 3 iterations
#> * Task: sonar
#> * Learner: classif.rpart
#> * Performance: 0.274 [classif.ce]
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations
# access the confusion matrix of the first resampling iteration
rr$data$prediction[[1]]$confusion
#>         truth
#> response  M  R
#>        M 30 18
#>        R  3 19
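
Continuing the example above, combine() could fuse the results of a second benchmark run into bmr. The second design below is an assumption for illustration; it uses a holdout resampling so that the fused ResampleResults are not duplicates of the existing ones:

# fuse a second benchmark run into the existing result (hedged sketch)
design2 = benchmark_grid(
  tasks = list(tsk("sonar")),
  learners = learners,
  resamplings = rsmp("holdout")
)
bmr$combine(benchmark(design2))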