This is the abstract base class for data backends.
Data backends provide a layer of abstraction for various data storage systems. It is not recommended to work directly with the DataBackend. Instead, all data access is handled transparently via the Task.
This package comes with two implementations for backends:
DataBackendDataTable, which stores the data as data.table::data.table().
DataBackendMatrix, which stores the data as sparse Matrix::sparseMatrix().
To connect to out-of-memory database management systems such as SQL servers, see the extension package mlr3db.
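Since data access is meant to go through a Task rather than the backend itself, a minimal sketch of that workflow may help. The toy data, the task id "toy", and the choice of a regression task are illustrative assumptions, not part of the original documentation:

```r
library(mlr3)

# Toy data with an explicit integer id column that serves as the primary key.
data = data.table::data.table(
  id = 1:5,
  x = runif(5),
  y = factor(sample(letters[1:3], 5, replace = TRUE), levels = letters[1:3])
)

# Wrap the data in a backend, then hand the backend to a Task.
b = as_data_backend(data, primary_key = "id")
task = as_task_regr(b, target = "x", id = "toy")

# Data is requested from the Task; the backend stays in the background.
task$data(rows = 1:2)
task$feature_names
```

The backend remains reachable via task$backend, but in typical use there should be little reason to touch it.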
Details
The required set of fields and methods to implement a custom DataBackend is listed in the respective sections (see DataBackendDataTable or DataBackendMatrix for exemplary implementations of the interface).
See also
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter10/advanced_technical_aspects_of_mlr3.html#sec-backends
Package mlr3db to interface out-of-memory data, e.g. SQL servers or duckdb.
Other DataBackend: DataBackendDataTable, DataBackendMatrix, as_data_backend.Matrix()
Public fields
primary_key
(character(1))
Column name of the primary key column of positive and unique integer row ids.
data_formats
(character())
Set of supported formats, e.g. "data.table" or "Matrix".
Active bindings
hash
(character(1))
Hash (unique identifier) for this object.
col_hashes
(named character)
Hash (unique identifier) for all columns except the primary_key: A character vector, named by the columns that each element refers to.
Columns of different Tasks or DataBackends that have agreeing col_hashes always represent the same data, given that the same rows are selected. The reverse is not necessarily true: There can be columns with the same content that have different col_hashes.
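The two hash bindings can be inspected directly on a backend. The following is a small sketch; the toy data and the names b1 and b2 are assumptions made for illustration. No expected output is shown because, as described above, equal content does not guarantee equal col_hashes:

```r
library(mlr3)

# Two backends built from the same toy data (b2 uses an independent copy).
data = data.table::data.table(
  id = 1:5,
  x = c(0.1, 0.2, 0.3, 0.4, 0.5),
  y = letters[1:5]
)
b1 = as_data_backend(data, primary_key = "id")
b2 = as_data_backend(data.table::copy(data), primary_key = "id")

# Hash of the whole backend object.
b1$hash

# One hash per column, named by column; the primary key is left out.
b1$col_hashes

# Agreeing hashes imply the same data; differing hashes do not imply
# differing data, so this comparison is informative in one direction only.
identical(b1$col_hashes, b2$col_hashes)
```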
Methods
Method new()
Creates a new instance of this R6 class.
Note: This object is typically constructed via a derived class, e.g. DataBackendDataTable or DataBackendMatrix, or via the S3 method as_data_backend().
Usage
DataBackend$new(data, primary_key, data_formats = "data.table")
Arguments
data
(any)
The format of the input data depends on the specialization. E.g., DataBackendDataTable expects a data.table::data.table() and DataBackendMatrix expects a Matrix::Matrix() from Matrix.
primary_key
(character(1))
Each DataBackend needs a way to address rows, which is done via a column of unique integer values, referenced here by primary_key. The use of this variable may differ between backends.
data_formats
(character())
Set of supported data formats which can be processed during $train() and $predict(), e.g. "data.table".
Examples
data = data.table::data.table(id = 1:5, x = runif(5),
y = sample(letters[1:3], 5, replace = TRUE))
b = DataBackendDataTable$new(data, primary_key = "id")
print(b)
#> <DataBackendDataTable> (5x3)
#>     id         x      y
#>  <int>     <num> <char>
#>      1 0.9686412      c
#>      2 0.4884955      c
#>      3 0.4778220      c
#>      4 0.7487929      c
#>      5 0.6676402      b
b$head(2)
#> Key: <id>
#>       id         x      y
#>    <int>     <num> <char>
#> 1:     1 0.9686412      c
#> 2:     2 0.4884955      c
b$data(rows = 1:2, cols = "x")
#>            x
#>        <num>
#> 1: 0.9686412
#> 2: 0.4884955
b$distinct(rows = b$rownames, "y")
#> $y
#> [1] "c" "b"
#>
b$missings(rows = b$rownames, cols = names(data))
#> id  x  y
#>  0  0  0