1. Tasks wrap a DataBackend, an object that transparently interfaces with different data storage types.

2. Tasks store meta-information, such as the role of the individual columns in the DataBackend. For example, for a classification task a single column must be marked as target column, and others as features.

Predefined (toy) tasks are stored in the dictionary mlr_tasks, e.g. iris or boston_housing.
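As a minimal sketch of the dictionary lookup described above, the following assumes that the mlr3 package is installed and that the key "iris" is available in mlr_tasks. The short-hand tsk() is assumed to be exported by the installed mlr3 version.

```r
library(mlr3)

# Retrieve a predefined task from the dictionary by its key.
task = mlr_tasks$get("iris")

# tsk() is a short-hand for the same dictionary lookup
# (assumption: available in the installed mlr3 version).
task2 = tsk("iris")

# Both objects describe the same data set.
print(task$id)
print(task$task_type)
```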

S3 methods

• as.data.table(t)
Task -> data.table::data.table()
Returns the complete data as data.table::data.table().
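A short sketch of the conversion, assuming the packages mlr3 and data.table are installed and using the predefined iris task as an illustrative input:

```r
library(mlr3)
library(data.table)

task = tsk("iris")

# Convert the task into a plain data.table holding the
# target column and all feature columns.
dt = as.data.table(task)

print(dim(dt))
print(names(dt))
```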

The following methods change the task in-place:

• Any modification to $col_roles and $row_roles. This provides a different "view" on the data without altering the data itself.

• $filter() and $select() subset the set of active rows or features in $row_roles or $col_roles, respectively. This provides a different "view" on the data without altering the data itself.

• $rbind() and $cbind() change the task in-place by binding rows or columns to the data, but without modifying the original DataBackend. Instead, the methods first create a new DataBackendDataTable from the provided new data, and then merge both backends into an abstract DataBackend which merges the results on-demand.

• $rename() wraps the DataBackend of the Task in an additional DataBackend which deals with the renaming. It also updates $col_roles and $col_info.
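Because these methods modify the task itself, it can help to clone the task first when the original view is still needed. The following is only a sketch under the assumption that mlr3 is installed; the variable name snapshot is purely illustrative.

```r
library(mlr3)

task = tsk("iris")

# Take an independent copy before changing anything.
snapshot = task$clone(deep = TRUE)

# Both calls change `task` in-place; they only narrow the "view",
# the underlying DataBackend is left untouched.
task$filter(1:10)
task$select(c("Petal.Length", "Sepal.Length"))

print(task$nrow)           # rows currently in use
print(task$feature_names)  # remaining features
print(snapshot$nrow)       # the clone still sees all rows
```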

Other Task: TaskClassif, TaskRegr, TaskSupervised, mlr_tasks_boston_housing, mlr_tasks_german_credit, mlr_tasks_iris, mlr_tasks_mtcars, mlr_tasks_pima, mlr_tasks_sonar, mlr_tasks_spam, mlr_tasks_wine, mlr_tasks_zoo, mlr_tasks

Public fields

id

(character(1))
Identifier of the object. Used in tables, plot and text output.

task_type

(character(1))
Task type, e.g. "classif" or "regr". For a complete list of possible task types (depending on the loaded packages), see mlr_reflections$task_types$type.

backend

(DataBackend)
Abstract interface to the data of the task.

col_info

(data.table::data.table())
Table with 3 columns:

• "id" (character()) stores the name of the column.

• "type" (character()) holds the storage type of the variable, e.g. integer, numeric or character. See mlr_reflections$task_feature_types for a complete list of allowed types.

• "levels" stores a vector of distinct values (levels) for ordered and unordered factor variables.

man

(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object. Defaults to NA, but can be set by child classes.

Active bindings

hash

(character(1))
Hash (unique identifier) for this object.

row_ids

(integer())
Returns the row ids of the DataBackend for observations with role "use".

row_names

(data.table::data.table())
Returns a table with two columns:

• "row_id" (integer()), and

• "row_name" (character()).

feature_names

(character())
Returns all column names with role "feature".

target_names

(character())
Returns all column names with role "target".

properties

(character())
Set of task properties. Possible properties are stored in mlr_reflections$task_properties. The following properties are currently standardized and understood by tasks in mlr3:

• "strata": The task is resampled using one or more stratification variables (role "stratum").

• "groups": The task comes with grouping/blocking information (role "group").

• "weights": The task comes with observation weights (role "weight").

Note that the properties listed above are calculated from $col_roles and may not be set explicitly.

row_roles

(named list())
Each row (observation) can have an arbitrary number of roles in the learning task:

• "use": Use in train / predict / resampling.

• "validation": Hold the observations back unless explicitly requested. Validation sets are not yet completely integrated into the package.

row_roles keeps track of the roles with a named list; the elements are named by row role and each element is an integer() vector of row ids. To alter the roles, just modify the list, e.g. with R's set functions (intersect(), setdiff(), union(), ...).

col_roles

(named list())
Each column (feature) can have an arbitrary number of the following roles:

• "feature": Regular feature used in the model fitting process.

• "target": Target variable.

• "name": Row names / observation labels. To be used in plots. Can be queried with $row_names.

• "order": Data returned by $data() is ordered by this column (or these columns).

• "group": During resampling, observations with the same value of the variable with role "group" are marked as "belonging together". They will be exclusively assigned to be either in the training set or in the test set for each resampling iteration. Only up to one column may have this role.

• "stratum": Stratification variables. Multiple discrete columns may have this role.

• "weight": Observation weights. Only up to one column (assumed to be discrete) may have this role.

col_roles keeps track of the roles with a named list; the elements are named by column role and each element is a character vector of column names. To alter the roles, just modify the list, e.g. with R's set functions (intersect(), setdiff(), union(), ...).

nrow

(integer(1))
Returns the total number of rows with role "use".

ncol

(integer(1))
Returns the total number of columns with role "target" or "feature".

feature_types

(data.table::data.table())
Returns a table with columns id and type where id are the column names of "active" features of the task and type is the storage type.

data_formats

(character())
Vector of supported data output formats. A specific format can be chosen in the $data() method.
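To illustrate how the col_roles list described above can be altered with R's set functions, here is a small sketch. It assumes mlr3 is installed and uses the predefined iris task; the choice of column to drop is arbitrary.

```r
library(mlr3)

task = tsk("iris")

# Remove one column from the "feature" role using setdiff().
# The data in the DataBackend is not touched, only the view changes.
task$col_roles$feature = setdiff(task$col_roles$feature, "Petal.Width")

print(task$feature_names)
print(task$ncol)  # counts columns with role "target" or "feature"
```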

strata

(data.table::data.table())
If the task has columns designated with role "stratum", returns a table with one subpopulation per row and two columns:

• N (integer()) with the number of observations in the subpopulation, and

• row_id (list of integer()) as list column with the row ids in the respective subpopulation.

Returns NULL if there is no stratification variable. See Resampling for more information on stratification.

groups

(data.table::data.table())
If the task has a column with designated role "group", returns a table with two columns:

• row_id (integer()), and

• grouping variable group (vector()).

Returns NULL if there is no grouping column. See Resampling for more information on grouping.

weights

(data.table::data.table())
If the task has a column with designated role "weight", returns a table with two columns:

• row_id (integer()), and

• observation weights weight (numeric()).

Returns NULL if there is no weight column.
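The following sketch shows how assigning a column role affects the derived bindings. It assumes mlr3 is installed; using the target column Species as stratification variable is just one illustrative choice.

```r
library(mlr3)

task = tsk("iris")

# Without a column in role "group", the binding returns NULL.
print(is.null(task$groups))

# Mark the target column as stratification variable.
task$col_roles$stratum = "Species"

# The property is derived from the column roles ...
print("strata" %in% task$properties)

# ... and $strata lists one subpopulation per row, with the number
# of observations (N) and the row ids (row_id) as list column.
print(task$strata)
```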

Methods

Method new()

Creates a new instance of this R6 class.

Note that this object is typically constructed via a derived class, e.g. TaskClassif or TaskRegr.

Method print()

Printer.

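Method data()

Returns a slice of the data from the DataBackend, restricted to the requested rows and columns, in the format given by data_format.

Arguments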

rows

integer()
Row indices.

cols

character()
Column names.

data_format

(character(1))
Desired data format, e.g. "data.table" or "Matrix".

Returns

Depending on the DataBackend, but usually a data.table::data.table().
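A minimal sketch of querying a slice with $data(), assuming mlr3 is installed. The row ids and column names below are illustrative values from the iris task; data_format is left at its default.

```r
library(mlr3)

task = tsk("iris")

# Request three observations and two columns by row id and column name.
slice = task$data(
  rows = c(1L, 51L, 101L),
  cols = c("Species", "Sepal.Length")
)

print(slice)
```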

Method formula()

Constructs a formula(), e.g. [target] ~ [feature_1] + [feature_2] + ... + [feature_k], using the features provided in argument rhs (defaults to all columns with role "feature", symbolized by ".").
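As a sketch of $formula(), assuming mlr3 is installed: the first call uses the default right-hand side, the second passes a subset of feature names via rhs. The chosen feature names are illustrative.

```r
library(mlr3)

task = tsk("iris")

# Default: all features, symbolized by ".".
f_all = task$formula()

# Restrict the right-hand side to selected features.
f_sub = task$formula(rhs = c("Petal.Length", "Sepal.Width"))

print(f_all)
print(f_sub)
```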

Method head()

Returns the first n rows of the task's data.

Arguments

n

(integer(1)).

Returns

data.table::data.table() with n rows.

Method levels()

Returns the distinct values for columns referenced in cols with storage type "factor" or "ordered". Argument cols defaults to all such columns with role "target" or "feature".

Note that this function ignores the row roles, it returns all levels available in the DataBackend. To update the stored level information, e.g. after subsetting a task with $filter(), call $droplevels().

Arguments

cols

character()
Column names.

Returns

Named list(), with one vector of levels per column.

Method filter()

Subsets the task, keeping only the rows specified via row ids rows.

Method select()

Subsets the task, keeping only the features specified via column names cols. Note that you cannot deselect the target column, for obvious reasons.

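Method rbind()

Binds the rows of data to the task in-place, without modifying the original DataBackend.

Arguments

data

(data.frame()).

Method cbind()

Binds the columns of data to the task in-place, without modifying the original DataBackend.

Arguments

data

(data.frame()).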

Method rename()

Renames columns by mapping column names in old to new column names in new (element-wise).
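A brief sketch of renaming a column, assuming mlr3 is installed. The new name petal_length is an arbitrary illustrative choice.

```r
library(mlr3)

task = tsk("iris")

# Map the old column name to the new one (element-wise).
task$rename(old = "Petal.Length", new = "petal_length")

# The column roles are updated along with the name.
print(task$feature_names)
```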

Method set_row_role()

Adds the roles new_roles to rows referred to by row ids rows. If exclusive is TRUE, the referenced rows will be removed from all other roles.

Arguments

rows

integer()
Row indices.

new_roles

(character()).

exclusive

(logical(1)).

Method droplevels()

Updates the cache of stored factor levels, removing all levels not present in the current set of active rows. cols defaults to all columns with storage type "factor" or "ordered".
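The interplay of $filter(), $levels() and $droplevels() can be sketched as follows, assuming mlr3 is installed. Rows 1 to 100 of the iris data contain only two of the three species, which makes the effect visible.

```r
library(mlr3)

task = tsk("iris")

# Keep only the first 100 rows; the stored levels are not updated yet.
task$filter(1:100)
levels_before = task$levels()$Species

# Drop levels that no longer occur in the active rows.
task$droplevels()
levels_after = task$levels()$Species

print(levels_before)
print(levels_after)
```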

Method clone()

The objects of this class are cloneable with this method.

Arguments

deep

Whether to make a deep clone.

Examples

# we use the inherited class TaskClassif here,
# Class Task is not intended for direct use
task = TaskClassif$new("iris", iris, target = "Species")

task$nrow
#> [1] 150
task$ncol
#> [1] 5
task$feature_names
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width"
task$formula()
#> Species ~ .
#> NULL

# de-select "Petal.Width"
task$select(setdiff(task$feature_names, "Petal.Width"))
task$feature_names
#> [1] "Petal.Length" "Sepal.Length" "Sepal.Width"

task$cbind(data.frame(foo = 1:150))
task$head()
#>    Species Petal.Length Sepal.Length Sepal.Width foo
#> 6:  setosa          1.7          5.4         3.9   6