Measure Class

This is the abstract base class for measures like MeasureClassif and MeasureRegr.

Measures are classes built around two functions that do the actual work:

A function $score() which quantifies the performance by comparing the truth and predictions.
A function $aggregator() which combines multiple performance scores returned by $score() to a single numeric value.

In addition to these two functions, meta-information about the performance measure is stored.
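The division of labour between the two functions can be sketched in base R. This is an illustration only, not mlr3's implementation: `score_fold`, `folds` and `fold_scores` are hypothetical names, the labels are made up, and `mean` merely stands in for an aggregator.

```r
# Hypothetical stand-in for $score(): classification error of one set of predictions.
score_fold = function(truth, response) {
  mean(truth != response)
}

# Two resampling iterations with made-up labels.
folds = list(
  list(truth = c("a", "b", "a", "b"), response = c("a", "b", "b", "b")),
  list(truth = c("a", "a", "b", "b"), response = c("a", "a", "b", "b"))
)

# One score per iteration ...
fold_scores = vapply(folds, function(f) score_fold(f$truth, f$response), numeric(1))

# ... combined into a single number by the aggregator (here: mean).
aggregator = mean
overall = aggregator(fold_scores)
```

Here the scoring step sees truth and predictions, while the aggregator only ever sees the resulting numeric scores.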

Predefined measures are stored in the dictionary mlr_measures, e.g. classif.auc or time_train. Many of the measures in mlr3 are implemented in mlr3measures as ordinary functions.

A guide on how to extend mlr3 with custom measures can be found in the mlr3book.

Inheriting

For some measures (such as confidence intervals from mlr3inferr), a measure must return more than one value. In such cases, override the public methods $aggregate() and/or $score() to return a named numeric() where at least one of its names corresponds to the id of the measure itself.
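The shape of such a return value can be sketched in base R. Everything here is an assumption for illustration: `aggregate_with_ci` is a hypothetical helper, the scores are made up, and the naive t-interval is not the method used by mlr3inferr. The only point is that the result is a named numeric() with one name equal to the measure id.

```r
# Hypothetical measure id; in mlr3 this would be the measure's $id.
measure_id = "regr.mse"

# Made-up per-iteration scores.
scores = c(1.2, 0.8, 1.0, 1.4, 0.6)

# Sketch of an aggregation step that returns a point estimate plus an interval.
aggregate_with_ci = function(scores, id, level = 0.95) {
  est = mean(scores)
  se = sd(scores) / sqrt(length(scores))
  half = qt(1 - (1 - level) / 2, df = length(scores) - 1) * se
  out = c(est, est - half, est + half)
  # One name must equal the measure id; the other names are chosen freely here.
  names(out) = c(id, "lower", "upper")
  out
}

res = aggregate_with_ci(scores, measure_id)
```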

Weights

Many measures support observation weights, indicated by their property "weights". The weights are stored in the Task where the column role weights_measure needs to be assigned to a single numeric column. The weights are automatically used if found in the task. This can be disabled by setting the field use_weights to "ignore". See the description of use_weights for more information.

If the measure is set up to use weights but the task does not have a designated weights_measure column, an unweighted version is calculated instead. The weights do not necessarily need to sum up to 1; they are normalized by the measure if necessary.

Most measures are so-called decomposable loss functions where a point-wise loss is computed and then either mean-aggregated or summed over the test set. For measures that do mean-aggregation, weights are typically used to compute the weighted mean, which normalizes weights to sum to 1. Measures that use sum-aggregation do not normalize weights and instead multiply individual losses with the given weights. See the documentation of specific measures for more details.
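The difference between the two aggregation styles can be seen in a small base R sketch. The losses and weights are made up, and the variable names are hypothetical; individual measures may handle details differently.

```r
# Made-up point-wise losses (e.g. squared errors) and observation weights.
loss = c(1, 4, 9)
w = c(1, 1, 2)

# Mean-aggregation: weighted mean, which implicitly rescales the weights to sum to 1.
weighted_mean_loss = weighted.mean(loss, w)

# Rescaling the weights beforehand therefore leaves the weighted mean unchanged.
same_after_rescaling = weighted.mean(loss, w / sum(w))

# Sum-aggregation: weights are not normalized, each loss is multiplied by its weight.
weighted_sum_loss = sum(w * loss)
```

For a mean-aggregated measure only the relative size of the weights matters, whereas for a sum-aggregated measure doubling all weights doubles the score.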

Public fields

id

(character(1))
Identifier of the object. Used in tables, plots and text output.

label

(character(1))
Label for this object. Can be used in tables, plots and text output instead of the ID.

task_type

(character(1))
Task type, e.g. "classif" or "regr".

For a complete list of possible task types (depending on the loaded packages), see mlr_reflections$task_types$type.

param_set

(paradox::ParamSet)
Set of hyperparameters.

obs_loss

(function() | NULL)
Function to calculate the observation-wise loss.

trafo

(list() | NULL)
NULL or a list with two elements:

fn: The transformation function applied after aggregating observation-wise losses (e.g. sqrt for RMSE).
deriv: The derivative of fn.

predict_type

(character(1))
Required predict type of the Learner.

check_prerequisites

(character(1))
How to proceed if one of the following prerequisites is not met:

wrong predict type (e.g., probabilities required, but only labels available).
wrong predict set (e.g., learner predicted on training set, but predictions of test set required).
task properties not satisfied (e.g., binary classification measure on multiclass task).

Possible values are "ignore" (just return NaN) and "warn" (default, raise a warning before returning NaN).

task_properties

(character())
Required properties of the Task.

range

(numeric(2))
Lower and upper bound of possible performance scores.

minimize

(logical(1))
If TRUE, good predictions correspond to small values of performance scores.

packages

(character())
Set of required packages. These packages are loaded, but not attached.

man

(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object. Defaults to NA, but can be set by child classes.

Active bindings

predict_sets

(character())
During resample()/benchmark(), a Learner can predict on multiple sets. By default, a learner only predicts observations in the test set (predict_sets == "test"). To change this behavior, set predict_sets to a non-empty subset of {"train", "test", "internal_valid"}. The "train" predict set contains the train ids from the resampling. This means that if a learner does validation and sets $validate to a ratio (creating the validation data from the training data), the train predictions will include the predictions for the validation data. Each set yields a separate Prediction object. Those can be combined via getters in ResampleResult/BenchmarkResult, or Measures can be configured to operate on specific subsets of the calculated prediction sets.

hash

(character(1))
Hash (unique identifier) for this object. The hash is calculated based on the id, the parameter settings, the predict sets, and $score, $average, $aggregator, $obs_loss and $trafo. Measures can define additional fields to be included in the hash by setting the field $.extra_hash.

properties

(character())
Properties of this measure.

average

(character(1))
Method for aggregation:

"micro": All predictions from multiple resampling iterations are first combined into a single Prediction object. Next, the scoring function of the measure is applied on this combined object, yielding a single numeric score.
"macro": The scoring function is applied on the Prediction object of each resampling iteration, each yielding a single numeric score. Next, the scores are combined with the aggregator function to a single numerical score.
"macro_weighted": The scoring function is applied on the Prediction object of each resampling iteration, each yielding a single numeric score. Next, the scores are combined with the aggregator function to a single numerical score. The scores are weighted by the total sample weights (if present, and if $use_weights is set to "use"), or the number of samples in each resampling iteration.
"custom": The measure comes with a custom aggregation method which directly operates on a ResampleResult.
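The first three options can be compared in a base R sketch. The zero-one losses are made up, no sample weights are involved, and `mean` stands in for the aggregator; this illustrates the arithmetic only, not mlr3's internal code path.

```r
# Made-up zero-one losses for two resampling iterations of different size.
losses = list(c(1, 0), c(0, 0, 0, 0, 0, 1))

# "micro": pool all predictions first, then score once.
micro = mean(unlist(losses))

# "macro": score each iteration, then aggregate the scores (here with mean()).
per_iter = vapply(losses, mean, numeric(1))
macro = mean(per_iter)

# "macro_weighted" without sample weights: weight each iteration's score
# by the number of samples in that iteration.
n = lengths(losses)
macro_weighted = weighted.mean(per_iter, n)
```

Because the iterations differ in size, "macro" gives the small iteration the same influence as the large one. For a loss that is itself a plain mean, as in this sketch, the size-weighted variant coincides with the pooled "micro" result; this need not hold for measures that are not simple means.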

aggregator

(function())
Function to aggregate scores computed on different resampling iterations.

use_weights

(character(1))
How to handle weights. Settings are "use", "ignore", and "error".

"use": Weights are used automatically if found in the task, as supported by the measure.
"ignore": Weights are ignored.
"error": An error is thrown if weights are present in the training Task.

For measures with the property "weights", this is initialized as "use". For measures with the property "requires_no_prediction", this is initialized as "ignore". For measures that have neither of the properties, this is initialized as "error". The latter behavior is to avoid cases where a user erroneously assumes that a measure supports weights when it does not. For measures that do not support weights, use_weights needs to be set to "ignore" if tasks with weights should be handled (by dropping the weights).
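The initialization rule can be written down as a small decision function. `default_use_weights` is a hypothetical helper for illustration, not part of mlr3, and the order of the checks for a measure carrying both properties is an assumption, since the text above does not specify it.

```r
# Hypothetical helper mirroring the initialization rule described above.
# Assumption: "weights" is checked before "requires_no_prediction".
default_use_weights = function(properties) {
  if ("weights" %in% properties) {
    "use"
  } else if ("requires_no_prediction" %in% properties) {
    "ignore"
  } else {
    "error"
  }
}

with_weights = default_use_weights(c("weights", "na_score"))
no_prediction = default_use_weights("requires_no_prediction")
neither = default_use_weights(character())
```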

Methods

Method `new()`

Creates a new instance of this R6 class.

Note that this object is typically constructed via a derived class, e.g. MeasureClassif or MeasureRegr.

Usage

Measure$new(
  id,
  task_type = NA,
  param_set = ps(),
  range = c(-Inf, Inf),
  minimize = NA,
  average = "macro",
  aggregator = NULL,
  obs_loss = NULL,
  properties = character(),
  predict_type = "response",
  predict_sets = "test",
  task_properties = character(),
  packages = character(),
  label = NA_character_,
  man = NA_character_,
  trafo = NULL
)

Arguments

id

(character(1))
Identifier for the new instance.

task_type

(character(1))
Type of task, e.g. "regr" or "classif". Must be an element of mlr_reflections$task_types$type.

param_set

(paradox::ParamSet)
Set of hyperparameters.

range

(numeric(2))
Feasible range for this measure as c(lower_bound, upper_bound). Both bounds may be infinite.

minimize

(logical(1))
Set to TRUE if good predictions correspond to small values, and to FALSE if good predictions correspond to large values. If set to NA (default), tuning this measure is not possible.
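Why an unknown direction rules out tuning can be sketched as follows. `best_config` and the configuration names are hypothetical, and the scores are made up; this is not how mlr3's tuning packages are implemented.

```r
# Made-up aggregated scores for three hypothetical configurations.
scores = c(cfg_a = 0.21, cfg_b = 0.18, cfg_c = 0.25)

# Hypothetical helper: pick the best configuration given the `minimize` flag.
best_config = function(scores, minimize) {
  if (is.na(minimize)) {
    stop("Direction of optimization is unknown, cannot rank scores.")
  }
  names(scores)[if (minimize) which.min(scores) else which.max(scores)]
}

best_error = best_config(scores, minimize = TRUE)      # scores read as an error
best_accuracy = best_config(scores, minimize = FALSE)  # scores read as an accuracy
```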

average

(character(1))
How to average multiple Predictions from a ResampleResult.

The default, "macro", calculates the individual performance scores for each Prediction and then uses the function defined in $aggregator to average them to a single number.

"macro_weighted" is similar to "macro", but uses weighted averages. Weights are taken from the weights_measure column of the resampled Task if present. Note that "macro_weighted" can differ from "macro" even if no weights are present or if $use_weights is set to "ignore", since then aggregation is done using uniform sample weights, which result in non-uniform weights for Predictions if they contain different numbers of samples.

If set to "micro", the individual Prediction objects are first combined into a single new Prediction object which is then used to assess the performance. The function in $aggregator is not used in this case.

aggregator

(function())
Function to aggregate over multiple iterations. The role of this function depends on the value of field "average":

"macro": A numeric vector of scores (one per iteration) is passed. The aggregate function defaults to mean() in this case.
"micro": The aggregator function is not used. Instead, predictions from multiple iterations are first combined and then scored in one go.
"custom": A ResampleResult is passed to the aggregate function.

obs_loss

(function or NULL)
The observation-wise loss function, e.g. zero-one for classification error.

properties

(character())
Properties of the measure. Must be a subset of mlr_reflections$measure_properties. Supported by mlr3:

"requires_task" (requires the complete Task),
"requires_learner" (requires the trained Learner),
"requires_model" (requires the trained Learner, including the fitted model),
"requires_train_set" (requires the training indices from the Resampling),
"na_score" (the measure is expected to occasionally return NA or NaN),
"weights" (supports weighted scoring using sample weights from the task, column role weights_measure),
"primary_iters" (the measure explicitly handles resamplings that only use a subset of their iterations for the point estimate), and
"requires_no_prediction" (no prediction is required; this usually means that the measure extracts some information from the learner state).

predict_type

(character(1))
Required predict type of the Learner. Possible values are stored in mlr_reflections$learner_predict_types.

predict_sets

(character())
Prediction sets to operate on, used in aggregate() to extract the matching predict_sets from the ResampleResult. Multiple predict sets are calculated by the respective Learner during resample()/benchmark(). Must be a non-empty subset of {"train", "test", "internal_valid"}. If multiple sets are provided, these are first combined to a single prediction object. Default is "test".

task_properties

(character())
Required task properties, see Task.

packages

(character())
Set of required packages. The constructor signals a warning if at least one of the packages is not installed. The packages are loaded (not attached) later on demand via requireNamespace().

label

(character(1))
Label for the new instance.

man

(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object. The referenced help page can be opened via method $help().

trafo

(list() or NULL)
An optional list with two elements, containing the transformation "fn" and its derivative "deriv". The transformation function is the function that is applied after aggregating the pointwise losses, i.e. this requires an $obs_loss to be present. An example is sqrt for RMSE.
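The RMSE case can be sketched in base R. The data are made up and the variable names are hypothetical; only the two-element list layout with "fn" and "deriv" is taken from the description above.

```r
# Made-up truth and predictions.
truth = c(3, 5, 2, 7)
response = c(2, 5, 4, 6)

# Observation-wise loss: squared error.
obs_loss = function(truth, response) (truth - response)^2

# Transformation and its derivative, in the two-element layout described above.
trafo = list(
  fn = sqrt,
  deriv = function(x) 0.5 / sqrt(x)
)

mse = mean(obs_loss(truth, response))  # aggregate the point-wise losses
rmse = trafo$fn(mse)                   # apply the transformation afterwards
```

The transformation is applied once to the aggregated loss, not to each point-wise loss, which is why an observation-wise loss has to be available.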

Method `format()`

Helper for print outputs.

Usage

Measure$format(...)

Arguments

...: (ignored).

Method `print()`

Printer.

Usage

Measure$print(...)

Arguments

...: (ignored).

Method `help()`

Opens the corresponding help page referenced by field $man.

Usage

Measure$help()

Method `score()`

Takes a Prediction (or a list of Prediction objects named with valid predict_sets) and calculates a numeric score. If the measure is flagged with the properties "requires_task", "requires_learner", "requires_model" or "requires_train_set", you must additionally pass the respective Task, the (trained) Learner or the training set indices. This is handled internally during resample()/benchmark().

Usage

Measure$score(prediction, task = NULL, learner = NULL, train_set = NULL)

Arguments

prediction: (Prediction | named list of Prediction).
task: (Task).
learner: (Learner).
train_set: (integer()).

Returns

numeric(1).

Method `aggregate()`

Aggregates multiple performance scores into a single score, e.g. by using the aggregator function of the measure.

Usage

Measure$aggregate(rr)

Arguments

rr: ResampleResult.

Returns

numeric(1).

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Measure$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
