| Title: | Preprocessing Operators and Pipelines for 'mlr3' |
| Version: | 0.9.0 |
| Description: | Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned. |
| License: | LGPL-3 |
| URL: | https://mlr3pipelines.mlr-org.com, https://github.com/mlr-org/mlr3pipelines |
| BugReports: | https://github.com/mlr-org/mlr3pipelines/issues |
| Depends: | R (≥ 3.3.0) |
| Imports: | backports, checkmate, data.table, digest, lgr, mlr3 (≥ 0.20.0), mlr3misc (≥ 0.17.0), paradox (≥ 1.0.0), R6, withr |
| Suggests: | ggplot2, glmnet, igraph, knitr, lme4, mlbench, bbotk (≥ 0.3.0), mlr3filters (≥ 0.8.1), mlr3learners, mlr3measures, nloptr, quanteda, rmarkdown, rpart, stopwords, testthat, visNetwork, bestNormalize, fastICA, kernlab, smotefamily, evaluate, NMF, MASS, GenSA, methods, vtreat, future, htmlwidgets, ranger, themis |
| ByteCompile: | true |
| Encoding: | UTF-8 |
| Config/testthat/edition: | 3 |
| Config/testthat/parallel: | true |
| NeedsCompilation: | no |
| RoxygenNote: | 7.3.2 |
| VignetteBuilder: | knitr, rmarkdown |
| Collate: | 'CnfAtom.R' 'CnfClause.R' 'CnfFormula.R' 'CnfFormula_simplify.R' 'CnfSymbol.R' 'CnfUniverse.R' 'Graph.R' 'GraphLearner.R' 'mlr_pipeops.R' 'multiplicity.R' 'utils.R' 'PipeOp.R' 'PipeOpEnsemble.R' 'LearnerAvg.R' 'NO_OP.R' 'PipeOpTaskPreproc.R' 'PipeOpADAS.R' 'PipeOpBLSmote.R' 'PipeOpBoxCox.R' 'PipeOpBranch.R' 'PipeOpChunk.R' 'PipeOpClassBalancing.R' 'PipeOpClassWeights.R' 'PipeOpClassifAvg.R' 'PipeOpColApply.R' 'PipeOpColRoles.R' 'PipeOpCollapseFactors.R' 'PipeOpCopy.R' 'PipeOpDateFeatures.R' 'PipeOpDecode.R' 'PipeOpEncode.R' 'PipeOpEncodeImpact.R' 'PipeOpEncodeLmer.R' 'PipeOpEncodePL.R' 'PipeOpFeatureUnion.R' 'PipeOpFilter.R' 'PipeOpFixFactors.R' 'PipeOpHistBin.R' 'PipeOpICA.R' 'PipeOpImpute.R' 'PipeOpImputeConstant.R' 'PipeOpImputeHist.R' 'PipeOpImputeLearner.R' 'PipeOpImputeMean.R' 'PipeOpImputeMedian.R' 'PipeOpImputeMode.R' 'PipeOpImputeOOR.R' 'PipeOpImputeSample.R' 'PipeOpKernelPCA.R' 'PipeOpLearner.R' 'PipeOpLearnerCV.R' 'PipeOpLearnerPICVPlus.R' 'PipeOpLearnerQuantiles.R' 'PipeOpMissingIndicators.R' 'PipeOpModelMatrix.R' 'PipeOpMultiplicity.R' 'PipeOpMutate.R' 'PipeOpNMF.R' 'PipeOpNOP.R' 'PipeOpNearmiss.R' 'PipeOpOVR.R' 'PipeOpPCA.R' 'PipeOpProxy.R' 'PipeOpQuantileBin.R' 'PipeOpRandomProjection.R' 'PipeOpRandomResponse.R' 'PipeOpRegrAvg.R' 'PipeOpRemoveConstants.R' 'PipeOpRenameColumns.R' 'PipeOpRowApply.R' 'PipeOpScale.R' 'PipeOpScaleMaxAbs.R' 'PipeOpScaleRange.R' 'PipeOpSelect.R' 'PipeOpSmote.R' 'PipeOpSmoteNC.R' 'PipeOpSpatialSign.R' 'PipeOpSubsample.R' 'PipeOpTextVectorizer.R' 'PipeOpThreshold.R' 'PipeOpTomek.R' 'PipeOpTrafo.R' 'PipeOpTuneThreshold.R' 'PipeOpUnbranch.R' 'PipeOpVtreat.R' 'PipeOpYeoJohnson.R' 'Selector.R' 'TaskRegr_boston_housing.R' 'assert_graph.R' 'bibentries.R' 'greplicate.R' 'gunion.R' 'mlr_graphs.R' 'operators.R' 'pipeline_bagging.R' 'pipeline_branch.R' 'pipeline_convert_types.R' 'pipeline_greplicate.R' 'pipeline_ovr.R' 'pipeline_robustify.R' 'pipeline_stacking.R' 'pipeline_targettrafo.R' 'po.R' 'ppl.R' 'preproc.R' 
'reexports.R' 'typecheck.R' 'zzz.R' |
| Packaged: | 2025-07-31 14:15:06 UTC; user |
| Author: | Martin Binder [aut, cre],
Florian Pfisterer |
| Maintainer: | Martin Binder <mlr.developer@mb706.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-07-31 23:20:11 UTC |
mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3'
Description
Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.
Author(s)
Maintainer: Martin Binder mlr.developer@mb706.com
Authors:
Florian Pfisterer pfistererf@googlemail.com (ORCID)
Lennart Schneider lennart.sch@web.de (ORCID)
Bernd Bischl bernd_bischl@gmx.net (ORCID)
Michel Lang michellang@gmail.com (ORCID)
Sebastian Fischer sebf.fischer@gmail.com (ORCID)
Susanne Dandl dandl.susanne@googlemail.com
Other contributors:
Keno Mersmann keno.mersmann@gmail.com [contributor]
Maximilian Mücke muecke.maximilian@gmail.com (ORCID) [contributor]
Lona Koers lona.koers@gmail.com [contributor]
See Also
Useful links:
Report bugs at https://github.com/mlr-org/mlr3pipelines/issues
PipeOp Composition Operator
Description
These operators create a connection that "pipes" data from the source g1 into the sink g2.
Both source and sink can either be
a Graph or a PipeOp (or an object that can be automatically converted into a Graph or PipeOp, see as_graph() and as_pipeop()).
%>>% and %>>!% try to automatically match output channels of g1 to input channels of g2; this is only possible if either
- the number of output channels of g1 (as given by g1$output) is equal to the number of input channels of g2 (as given by g2$input), or
- g1 has only one output channel (i.e. g1$output has one line), or
- g2 has only one input channel, which is a vararg channel (i.e. g2$input has one line, with name entry "...").
Connections between channels are created in the
order in which they occur in g1 and g2, respectively: g1's output channel 1 is connected to g2's input
channel 1, channel 2 to 2 etc.
%>>% always creates deep copies of its input arguments, so they cannot be modified by reference afterwards.
To access individual PipeOps after composition, use the resulting Graph's $pipeops list.
%>>!%, on the other hand, tries to avoid cloning its first argument: If it is a Graph, then this Graph
will be modified in-place.
When %>>!% fails, then it leaves g1 in an incompletely modified state. It is therefore usually recommended to use
%>>%, since the very marginal gain of performance from
using %>>!% often does not outweigh the risk of either modifying objects by-reference that should not be modified or getting
graphs that are in an incompletely modified state. However,
when creating long Graphs, chaining with %>>!% instead of %>>% can give noticeable performance benefits
because %>>% makes a number of clone() calls that is quadratic in chain length, whereas %>>!% makes only linearly many.
concat_graphs(g1, g2, in_place = FALSE) is equivalent to g1 %>>% g2. concat_graphs(g1, g2, in_place = TRUE) is equivalent to g1 %>>!% g2.
Both arguments of %>>% are automatically converted to Graphs using as_graph(); this means that objects on either side may be objects
that can be automatically converted to PipeOps (such as Learners or Filters), or that can
be converted to Graphs. This includes, in particular, lists of Graphs, PipeOps, or objects convertible to these, because
as_graph() automatically applies gunion() to lists. See examples. If the first argument of %>>!% is not a Graph, then
it is cloned just as when %>>% is used; %>>!% only avoids clone() if the first argument is a Graph.
Note that if g1 is NULL, g2 converted to a Graph will be returned.
Analogously, if g2 is NULL, g1 converted to a Graph will be returned.
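The channel-matching rules above can be illustrated with a small sketch. It assumes the 'mlr3' package is installed and uses the dictionary shortcuts po() and tsk() instead of the $new() constructors; the variable names are made up for illustration.

```r
library(mlr3)
library(mlr3pipelines)

# g1 has a single output channel, so it is connected to every input of g2.
fan_out = po("scale") %>>% gunion(list(po("pca"), po("nop")))

# g2 has a single vararg ("...") input channel, so it accepts both outputs of g1.
g = fan_out %>>% po("featureunion")

# One row per connection: scale -> pca, scale -> nop,
# pca -> featureunion, nop -> featureunion.
g$edges

# Training returns one Task that holds the principal components
# together with the scaled original features.
out = g$train(tsk("iris"))[[1]]
out$feature_names
```

When neither the channel counts match nor one of the two single-channel cases applies, the connection has to be made explicitly with $add_edge().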
Usage
g1 %>>% g2
concat_graphs(g1, g2, in_place = FALSE)
g1 %>>!% g2
Arguments
g1 |
( |
g2 |
( |
in_place |
( |
Value
See Also
Other Graph operators:
as_graph(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
greplicate(),
gunion(),
mlr_graphs_greplicate
Examples
o1 = PipeOpScale$new()
o2 = PipeOpPCA$new()
o3 = PipeOpFeatureUnion$new(2)
# The following two are equivalent:
pipe1 = o1 %>>% o2
pipe2 = Graph$new()$
add_pipeop(o1)$
add_pipeop(o2)$
add_edge(o1$id, o2$id)
# Note the automatic gunion() of lists.
# The following three are equivalent:
graph1 = list(o1, o2) %>>% o3
graph2 = gunion(list(o1, o2)) %>>% o3
graph3 = Graph$new()$
add_pipeop(o1)$
add_pipeop(o2)$
add_pipeop(o3)$
add_edge(o1$id, o3$id, dst_channel = 1)$
add_edge(o2$id, o3$id, dst_channel = 2)
pipe1 %>>!% o3 # modify pipe1 in-place
pipe1 # contains o1, o2, and o3 now.
o1 %>>!% o2
o1 # not changed, because not a Graph.
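As a further sketch of the cloning behaviour described for %>>%, the following compares the parameter values held by the Graph before and after the original PipeOp is modified. The helper variables (before, unchanged, same_ids) are illustrative only.

```r
library(mlr3pipelines)

op_scale = po("scale")
op_pca = po("pca")

# Both PipeOps are cloned into the new Graph.
g = op_scale %>>% op_pca

# Changing the original PipeOp afterwards does not reach the Graph's copy.
before = g$pipeops$scale$param_set$values
op_scale$param_set$values$center = FALSE
unchanged = identical(before, g$pipeops$scale$param_set$values)

# To change the PipeOp inside the Graph, go through $pipeops.
g$pipeops$scale$param_set$values$center = FALSE

# concat_graphs() with in_place = FALSE builds the same structure as %>>%.
g2 = concat_graphs(po("scale"), po("pca"), in_place = FALSE)
same_ids = identical(g2$ids(), c("scale", "pca"))

# A NULL on one side returns the other side, converted to a Graph.
g3 = NULL %>>% po("pca")
```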
Atoms for CNF Formulas
Description
CnfAtom objects represent a single statement that is used to build up CNF formulae.
They are mostly intermediate, created using the %among% operator or CnfAtom()
directly, and combined into CnfClause and CnfFormula objects.
CnfClause and CnfFormula do not, however, contain CnfAtom objects directly.
CnfAtoms contain an indirect reference to a CnfSymbol by referencing its name
and its CnfUniverse. They furthermore contain a set of values. A CnfAtom
represents a statement asserting that the given symbol takes up one of the
given values.
If the set of values is empty, the CnfAtom represents a contradiction (FALSE).
If it is the full domain of the symbol, the CnfAtom represents a tautology (TRUE).
These values can be converted to, and from, logical(1) values using as.logical()
and as.CnfAtom().
CnfAtom objects can be negated using the ! operator, which will return the CnfAtom
representing set membership in the complement of the value set with respect to the symbol's domain.
CnfAtoms can furthermore be combined using the | operator to form a CnfClause,
and using the & operator to form a CnfFormula. This happens even if the
resulting statement could be represented as a single CnfAtom.
This is part of the CNF representation tooling, which is currently considered experimental; it is for internal use.
Usage
CnfAtom(symbol, values)
e1 %among% e2
as.CnfAtom(x)
Arguments
symbol |
( |
values |
( |
e1 |
( |
e2 |
( |
x |
(any) |
Details
We would have preferred to overload the %in% operator, but this is currently
not easily possible in R. We therefore created the %among% operator.
The internal representation of a CnfAtom may change in the future.
Value
A new CnfAtom object.
See Also
Other CNF representation objects:
CnfClause(),
CnfFormula(),
CnfSymbol(),
CnfUniverse()
Examples
u = CnfUniverse()
X = CnfSymbol(u, "X", c("a", "b", "c"))
CnfAtom(X, c("a", "b"))
X %among% "a"
X %among% character(0)
X %among% c("a", "b", "c")
as.logical(X %among% character(0))
as.CnfAtom(TRUE)
!(X %among% "a")
X %among% "a" | X %among% "b" # creates a CnfClause
X %among% "a" & X %among% c("a", "b") # creates a CnfFormula
Clauses in CNF Formulas
Description
A CnfClause is a disjunction of CnfAtom objects. It represents a statement
that is true if at least one of the atoms is true. These are for example of the form
X %among% c("a", "b", "c") | Y %among% c("d", "e", "f") | ...
CnfClause objects can be constructed explicitly, using the CnfClause() constructor,
or implicitly, by using the | operator on CnfAtoms or other CnfClause objects.
CnfClause objects which are not tautologies or contradictions are named lists;
the value ranges of each symbol can be accessed using [[, and these clauses
can be subset using [ to get clauses containing only the indicated symbols.
However, to get a list of CnfAtom objects, use as.list().
Note that the simplified form of a clause containing a contradiction is the empty list.
Upon construction, the CnfClause is simplified by (1) removing contradictions, (2) unifying
atoms that refer to the same symbol, and (3) evaluating to TRUE if any atom is TRUE.
Note that the order of atoms in a clause is not preserved.
Using CnfClause() on lists that contain other CnfClause objects will create
a clause that is the disjunction of all atoms in all clauses.
If a CnfClause contains no atoms, or only FALSE atoms, it evaluates to FALSE.
If it contains at least one atom that is always true, the clause evaluates to TRUE.
These values can be converted to, and from, logical(1) values using as.logical()
and as.CnfClause().
CnfClause objects can be negated using the ! operator, and combined using the
& operator. Both of these operations return a CnfFormula, even if the result
could in principle be represented as a single CnfClause.
This is part of the CNF representation tooling, which is currently considered experimental; it is for internal use.
Usage
CnfClause(atoms)
as.CnfClause(x)
Arguments
atoms |
( |
x |
(any) |
Details
We are undecided whether it is a better idea to have as.list() return a named list
or an unnamed one. Calling as.list() on a CnfClause with a tautology returns
a tautology-atom, which does not have a name. We currently return a named list
for other clauses, as this makes subsetting by name commute with as.list().
However, this behaviour may change in the future.
Value
A new CnfClause object.
See Also
Other CNF representation objects:
CnfAtom(),
CnfFormula(),
CnfSymbol(),
CnfUniverse()
Examples
u = CnfUniverse()
X = CnfSymbol(u, "X", c("a", "b", "c"))
Y = CnfSymbol(u, "Y", c("d", "e", "f"))
CnfClause(list(X %among% c("a", "b"), Y %among% c("d", "e")))
cls = X %among% c("a", "b") | Y %among% c("d", "e")
cls
as.list(cls)
as.CnfClause(X %among% c("a", "b"))
# The same symbols are unified
X %among% "a" | Y %among% "d" | X %among% "b"
# tautology evaluates to TRUE
X %among% "a" | X %among% "b" | X %among% "c"
# contradictions are removed
X %among% "a" | Y %among% character(0)
# create CnfFormula:
!(X %among% "a" | Y %among% "d")
# also a CnfFormula, even if it contains a single clause:
!CnfClause(list(X %among% "a"))
(X %among% c("a", "c") | Y %among% "d") &
(X %among% c("a", "b") | Y %among% "d")
CNF Formulas
Description
A CnfFormula is a conjunction of CnfClause objects. It represents a statement
that is true if all of the clauses are true. These are for example of the form
(X %among% "a" | Y %among% "d") & Z %among% "g"
CnfFormula objects can be constructed explicitly, using the CnfFormula() constructor,
or implicitly, by using the & operator on CnfAtoms, CnfClauses, or other CnfFormula objects.
To get individual clauses from a formula, [[ should not be used; instead, use as.list().
Note that the simplified form of a formula containing a tautology is the empty list.
Upon construction, the CnfFormula is simplified by using various heuristics.
This includes unit propagation, subsumption elimination, self/hidden subsumption elimination,
hidden tautology elimination, and resolution subsumption elimination (see examples).
Note that the order of clauses in a formula is not preserved.
Using CnfFormula() on lists that contain other CnfFormula objects will create
a formula that is the conjunction of all clauses in all formulas.
This may be somewhat more efficient than applying & many times in a row.
If a CnfFormula contains no clauses, or only TRUE clauses, it evaluates to TRUE.
If it contains at least one clause that is, by itself, always false, the formula evaluates to FALSE.
Not all contradictions between clauses are recognized, however.
These values can be converted to, and from, logical(1) values using as.logical()
and as.CnfFormula().
CnfFormula objects can be negated using the ! operator. Beware that this
may lead to an exponential blow-up in the number of clauses.
This is part of the CNF representation tooling, which is currently considered experimental; it is for internal use.
Usage
CnfFormula(clauses)
as.CnfFormula(x)
Arguments
clauses |
( |
x |
(any) |
Value
A new CnfFormula object.
See Also
Other CNF representation objects:
CnfAtom(),
CnfClause(),
CnfSymbol(),
CnfUniverse()
Examples
u = CnfUniverse()
X = CnfSymbol(u, "X", c("a", "b", "c"))
Y = CnfSymbol(u, "Y", c("d", "e", "f"))
Z = CnfSymbol(u, "Z", c("g", "h", "i"))
frm = (X %among% c("a", "b") | Y %among% c("d", "e")) &
Z %among% c("g", "h")
frm
# retrieve individual clauses
as.list(frm)
# Negation of a formula
# Note the parentheses, otherwise `!` would be applied to the first clause only.
!((X %among% c("a", "b") | Y %among% c("d", "e")) &
Z %among% c("g", "h"))
## unit propagation
# The second clause can not be satisfied when X is "b", so "b" can be
# removed from the possibilities in the first clause.
(X %among% c("a", "b") | Y %among% c("d", "e")) &
X %among% c("a", "c")
## subsumption elimination
# The first clause is a subset of the second clause; whenever the
# first clause is satisfied, the second clause is satisfied as well, so the
# second clause can be removed.
(X %among% "a" | Y %among% c("d", "e")) &
(X %among% c("a", "b") | Y %among% c("d", "e") | Z %among% "g")
## self subsumption elimination
# If the second clause is satisfied but X is not "a", then Y must be "e".
# The `Y %among% "d"` part of the first clause can therefore be removed.
(X %among% c("a", "b") | Y %among% "d") &
(X %among% "a" | Y %among% "e")
## resolution subsumption elimination
# The first two statements can only be satisfied if Y is either "d" or "e",
# since when X is "a" then Y must be "e", and when X is "b" then Y must be "d".
# The third statement is therefore implied by the first two, and can be
# removed.
(X %among% "a" | Y %among% "d") &
(X %among% "b" | Y %among% "e") &
(Y %among% c("d", "e"))
## hidden tautology elimination / hidden subsumption elimination
# When considering the first two clauses only, adding another atom
# `Z %among% "i"` to the first clause would not change the formula, since
# whenever Z is "i", the second clause would need to be satisfied in a way
# that would also satisfy the first clause, making this atom redundant
# ("hidden literal addition"). Considering the pairs of clause 1 and 3, and
# clauses 1 and 4, one could likewise add `Z %among% "g"` and
# `Z %among% "h"`, respectively. This would reveal the first clause to be
# a "hidden" tautology: it is equivalent to a clause containing the
# atom `Z %among% c("g", "h", "i")` == TRUE.
# Alternatively, one could perform "hidden" resolution subsumption using
# clause 4 after having added the atom `Z %among% c("g", "i")` to the first
# clause by using clauses 2 and 3.
(X %among% c("a", "b") | Y %among% c("d", "e")) &
(X %among% "a" | Z %among% c("g", "h")) &
(X %among% "b" | Z %among% c("h", "i")) &
(Y %among% c("d", "e") | Z %among% c("g", "i"))
## Simple contradictions are recognized:
(X %among% "a") & (X %among% "b")
# Tautologies are preserved
(X %among% c("a", "b", "c")) & (Y %among% c("d", "e", "f"))
# But not all contradictions are recognized.
# Builtin heuristic CnfFormula preprocessing is not a SAT solver.
contradiction = (X %among% "a" | Y %among% "d") &
(X %among% "b" | Y %among% "e") &
(X %among% "c" | Y %among% "f")
contradiction
# Negation of a contradiction results in a tautology, which is recognized
# and simplified to TRUE. However, note that this operation (1) generally has
# exponential complexity in the number of terms and (2) is currently also not
# particularly well optimized
!contradiction
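The relationship between the three CNF object types and their logical(1) conversions can be summarised in a short sketch. It only relies on behaviour stated on these help pages; the variable names are illustrative, and the result of as.logical() on objects that are neither tautologies nor contradictions is deliberately not shown.

```r
library(mlr3pipelines)

u = CnfUniverse()
X = CnfSymbol(u, "X", c("a", "b", "c"))
Y = CnfSymbol(u, "Y", c("d", "e", "f"))

# Each operator lifts its operands to the next level of the hierarchy.
atom = X %among% "a"                         # CnfAtom
clause = atom | Y %among% "d"                # CnfClause
formula = clause & Y %among% c("d", "e")     # CnfFormula

# Empty value set: contradiction. Full domain: tautology.
atom_false = as.logical(X %among% character(0))
atom_true = as.logical(X %among% c("a", "b", "c"))

# A clause covering the full domain of X is a tautology.
clause_true = as.logical(X %among% "a" | X %among% "b" | X %among% "c")

# A simple contradiction between two unit clauses is recognized.
formula_false = as.logical((X %among% "a") & (X %among% "b"))
```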
Symbols for CNF Formulas
Description
Representation of Symbols used in CNF formulas. Symbols have a name and a
domain (a set of possible values), and are stored in a CnfUniverse.
Once created, symbols are currently not intended to be modified or deleted.
Symbols can be used in CNF formulas by creating CnfAtom objects, either
by using the %among% operator or by using the CnfAtom() constructor
explicitly.
This is part of the CNF representation tooling, which is currently considered experimental; it is for internal use.
Usage
CnfSymbol(universe, name, domain)
Arguments
universe |
( |
name |
( |
domain |
( |
Value
A new CnfSymbol object.
See Also
Other CNF representation objects:
CnfAtom(),
CnfClause(),
CnfFormula(),
CnfUniverse()
Examples
u = CnfUniverse()
X = CnfSymbol(u, "X", c("a", "b", "c"))
# Use symbols to create CnfAtom objects
X %among% c("a", "b")
X %among% "a"
X %among% character(0)
X %among% c("a", "b", "c")
Symbol Table for CNF Formulas
Description
A symbol table for CNF formulas. The CnfUniverse is a by-reference object
that stores the domain of each symbol. Symbols are created with CnfSymbol()
and can be retrieved with $.
Using [[ retrieves a given symbol's domain.
It is only possible to combine symbols from the same (identical) universe.
This is part of the CNF representation tooling, which is currently considered experimental; it is for internal use.
Usage
CnfUniverse()
Value
A new CnfUniverse object.
See Also
Other CNF representation objects:
CnfAtom(),
CnfClause(),
CnfFormula(),
CnfSymbol()
Examples
u = CnfUniverse()
X = CnfSymbol(u, "X", c("a", "b", "c"))
Y = CnfSymbol(u, "Y", c("d", "e", "f"))
u$X
u[["Y"]]
X %among% c("a", "c")
u$X %among% c("a", "c")
Y %among% c("d", "e", "f")
Y %among% character(0)
u$X %among% "a" | u$Y %among% "d"
Graph Base Class
Description
A Graph is a representation of a machine learning pipeline graph. It can be trained, and subsequently used for prediction.
A Graph is most useful when used together with Learner objects encapsulated as PipeOpLearner. In this case,
the Graph produces Prediction data during its $predict() phase and can be used as a Learner
itself (using the GraphLearner wrapper). However, the Graph can also be used without Learner objects to simply
perform preprocessing of data, and, in principle, does not even need to handle data at all but can be used for general processes with
dependency structure (although the PipeOps for this would need to be written).
Format
Construction
Graph$new()
Internals
A Graph is made up of a list of PipeOps, and a data.table of edges. Both for training and prediction, the Graph
performs topological sorting of the PipeOps and executes their respective $train() or $predict() functions in order, moving
the PipeOp results along the edges as input to other PipeOps.
Fields
- pipeops :: named list of PipeOp
  Contains all PipeOps in the Graph, named by the PipeOp's $ids.
- edges :: data.table with columns src_id (character), src_channel (character), dst_id (character), dst_channel (character)
  Table of connections between the PipeOps. A data.table. src_id and dst_id are $ids of PipeOps that must be present in the $pipeops list. src_channel and dst_channel must respectively be $output and $input channel names of the respective PipeOps.
- is_trained :: logical(1)
  Is the Graph, i.e. are all of its PipeOps, trained, and can the Graph be used for prediction?
- lhs :: character
  Ids of the 'left-hand-side' PipeOps that have some unconnected input channels and therefore act as Graph input layer.
- rhs :: character
  Ids of the 'right-hand-side' PipeOps that have some unconnected output channels and therefore act as Graph output layer.
- input :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
  Input channels of the Graph. For each channel lists the name, input type during training, input type during prediction, PipeOp $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.
- output :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
  Output channels of the Graph. For each channel lists the name, output type during training, output type during prediction, PipeOp $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.
- packages :: character
  Set of all required packages for the various methods in the Graph, a set union of all required packages of all contained PipeOp objects.
- state :: named list
  Get / Set the $state of each of the members of PipeOp.
- param_set :: ParamSet
  Parameters and parameter constraints. Parameter values are in $param_set$values. These are the union of $param_sets of all PipeOps in the Graph. Parameter names as seen by the Graph have the naming scheme <PipeOp$id>.<PipeOp original parameter name>. Changing $param_set$values also propagates the changes directly to the contained PipeOps and is an alternative to changing a PipeOp's $param_set$values directly.
- hash :: character(1)
  Stores a checksum calculated on the Graph configuration, which includes all PipeOp hashes (and therefore their $param_set$values) and a hash of $edges.
- phash :: character(1)
  Stores a checksum calculated on the Graph configuration, which includes all PipeOp hashes except their $param_set$values, and a hash of $edges.
- keep_results :: logical(1)
  Whether to store intermediate results in the PipeOp's $.result slot, mostly for debugging purposes. Default FALSE.
- man :: character(1)
  Identifying string of the help page that shows with help().
Methods
- ids(sorted = FALSE)
  (logical(1)) -> character
  Get IDs of all PipeOps. These are in the order in which the PipeOps were added if sorted is FALSE, and topologically sorted if sorted is TRUE.
- add_pipeop(op, clone = TRUE)
  (PipeOp | Learner | Filter | ..., logical(1)) -> self
  Mutates Graph by adding a PipeOp to the Graph. This does not add any edges, so the new PipeOp will not be connected within the Graph at first.
  Instead of supplying a PipeOp directly, an object that can naturally be converted to a PipeOp can also be supplied, e.g. a Learner or a Filter; see as_pipeop(). The argument given as op is cloned if clone is TRUE (default); to access a Graph's PipeOps by-reference, use $pipeops.
  Note that $add_pipeop() is a relatively low-level operation; it is recommended to build graphs using %>>%.
- add_edge(src_id, dst_id, src_channel = NULL, dst_channel = NULL)
  (character(1), character(1), character(1) | numeric(1) | NULL, character(1) | numeric(1) | NULL) -> self
  Add an edge from PipeOp src_id, and its channel src_channel (identified by its name or number as listed in the PipeOp's $output), to PipeOp dst_id's channel dst_channel (identified by its name or number as listed in the PipeOp's $input). If source or destination PipeOp have only one input / output channel and src_channel / dst_channel are therefore unambiguous, they can be omitted (i.e. left as NULL).
- chain(gs, clone = TRUE)
  (list of Graphs, logical(1)) -> self
  Takes a list of Graphs or PipeOps (or objects that can be automatically converted into Graphs or PipeOps, see as_graph() and as_pipeop()) as inputs and joins them in a serial Graph coming after self, as if connecting them using %>>%.
- plot(html = FALSE, horizontal = FALSE)
  (logical(1), logical(1)) -> NULL
  Plot the Graph, using either the igraph package (for html = FALSE, default) or the visNetwork package for html = TRUE, producing an htmlWidget. The htmlWidget can be rescaled using visOptions. For html = FALSE, the orientation of the plotted graph can be controlled through horizontal.
- print(dot = FALSE, dotname = "dot", fontsize = 24L)
  (logical(1), character(1), integer(1)) -> NULL
  Print a representation of the Graph on the console. If dot is FALSE, output is a table with one row for each contained PipeOp and columns ID ($id of PipeOp), State (short representation of $state of PipeOp), sccssors (PipeOps that take their input directly from the PipeOp on this line), and prdcssors (the PipeOps that produce the data that is read as input by the PipeOp on this line). If dot is TRUE, print a DOT representation of the Graph on the console. The DOT output can be named via the argument dotname and the fontsize can also be specified.
- set_names(old, new)
  (character, character) -> self
  Rename PipeOps: Change ID of each PipeOp as identified by old to the corresponding item in new. This should be used instead of changing a PipeOp's $id value directly!
- update_ids(prefix = "", postfix = "")
  (character, character) -> self
  Pre- or postfix PipeOp's existing ids. Both prefix and postfix default to "", i.e. no changes.
- train(input, single_input = TRUE)
  (any, logical(1)) -> named list
  Train Graph by traversing the Graph's edges and calling all the PipeOp's $train methods in turn. Return a named list of outputs for each unconnected PipeOp out-channel, named according to the Graph's $output name column. During training, the $state member of each PipeOp will be set and the $is_trained slot of the Graph (and each individual PipeOp) will consequently be set to TRUE.
  If single_input is TRUE, the input value will be sent to each unconnected PipeOp's input channel (as listed in the Graph's $input). Typically, input should be a Task, although this is dependent on the PipeOps in the Graph. If single_input is FALSE, then input should be a list with the same length as the Graph's $input table has rows; each list item will be sent to a corresponding input channel of the Graph. If input is a named list, names must correspond to input channel names ($input$name) and inputs will be sent to the channels by name; otherwise they will be sent to the channels in the order in which they are listed in $input.
- predict(input, single_input = TRUE)
  (any, logical(1)) -> list of any
  Predict with the Graph by calling all the PipeOp's $predict methods. Input and output, as well as the function of the single_input argument, are analogous to $train().
- help(help_type)
  (character(1)) -> help file
  Displays the help file of the concrete PipeOp instance. help_type is one of "text", "html", "pdf" and behaves as the help_type argument of R's help().
See Also
Other mlr3pipelines backend related:
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops,
mlr_pipeops_updatetarget
Examples
library("mlr3")
g = Graph$new()$
add_pipeop(PipeOpScale$new(id = "scale"))$
add_pipeop(PipeOpPCA$new(id = "pca"))$
add_edge("scale", "pca")
g$input
g$output
task = tsk("iris")
trained = g$train(task)
trained[[1]]$data()
task$filter(1:10)
predicted = g$predict(task)
predicted[[1]]$data()
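The following sketch complements the example above by showing the <PipeOp$id>.<parameter name> naming scheme of the Graph's $param_set and the use of a named input list with single_input = FALSE. It assumes that the 'rank.' parameter of the PCA PipeOp is passed on to stats::prcomp() and limits the number of components.

```r
library(mlr3)
library(mlr3pipelines)

g = po("scale") %>>% po("pca")

# Graph-level parameter names are prefixed with the PipeOp id.
g$param_set$values$pca.rank. = 2L
g$pipeops$pca$param_set$values$rank.   # the contained PipeOp sees the same value

g$ids(sorted = TRUE)
trained_before = g$is_trained

# With single_input = FALSE, a named list is matched against $input$name.
task = tsk("iris")
out = g$train(list(scale.input = task), single_input = FALSE)

names(out)                # named according to $output$name
out[[1]]$feature_names    # two principal components
names(g$state)            # one state entry per PipeOp
g$is_trained
```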
Multiplicity
Description
A Multiplicity class S3 object.
The function of multiplicities is to indicate that PipeOps should be executed
multiple times with multiple values.
A Multiplicity is a container, like a
list(), that contains multiple values. If the message that is passed along the
edge of a Graph is a Multiplicity-object, then the PipeOp that receives
this object will usually be called once for each contained value. The result of
each of these calls is then, again, packed in a Multiplicity and sent along the
outgoing edge(s) of that PipeOp. This means that a Multiplicity can cause
multiple PipeOps in a row to be run multiple times, where the run for each element
of the Multiplicity is independent from the others.
Most PipeOps only return a Multiplicity if their input was a Multiplicity
(and after having run their code multiple times, once for each entry). However,
there are a few special PipeOps that are "aware" of Multiplicity objects. These
may either create a Multiplicity even though not having a Multiplicity input
(e.g. PipeOpReplicate or PipeOpOVRSplit) – causing the subsequent PipeOps
to be run multiple times – or collect a Multiplicity, being called only once
even though their input is a Multiplicity (e.g. PipeOpOVRUnite or PipeOpFeatureUnion
if constructed with the collect_multiplicity argument set to TRUE). The combination
of these mechanisms makes it possible for parts of a Graph to be called variably
many times if "sandwiched" between Multiplicity creating and collecting PipeOps.
Whether a PipeOp creates or collects a Multiplicity is indicated by the $input
or $output slot (which indicate names and types of in/out channels). If the train and
predict types of an input or output are surrounded by square brackets ("[", "]"), then
this channel handles a Multiplicity explicitly. Depending on the function of the PipeOp,
it will usually collect (input channel) or create (output channel) a Multiplicity.
PipeOps without this indicator are Multiplicity agnostic and blindly execute their
function multiple times when given a Multiplicity.
If a PipeOp is trained on a Multiplicity, the $state slot is set to a Multiplicity
as well; this Multiplicity contains the "original" $state resulting from each individual
call of the PipeOp with the input Multiplicity's content. If a PipeOp was trained
with a Multiplicity, then the predict() argument must be a Multiplicity with the same
number of elements.
Usage
Multiplicity(...)
Arguments
... |
|
Value
See Also
Other Special Graph Messages:
NO_OP
Other Experimental Features:
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_replicate
Other Multiplicity PipeOps:
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
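A minimal sketch of the behaviour described above: a Multiplicity-agnostic PipeOp that is handed a Multiplicity runs once per element, and both its output and its $state become Multiplicity objects. The two half-tasks are an arbitrary illustration, and the class is checked with inherits() rather than a dedicated helper.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
first_half = task$clone()$filter(1:75)
second_half = task$clone()$filter(76:150)

op = po("pca")

# PCA is fitted separately on each element of the Multiplicity.
out = op$train(list(Multiplicity(first_half, second_half)))

inherits(out[[1]], "Multiplicity")   # the output is packed again
length(out[[1]])                     # one result per element
inherits(op$state, "Multiplicity")   # one state per element

# Prediction needs a Multiplicity with the same number of elements.
pred = op$predict(list(Multiplicity(first_half, second_half)))
length(pred[[1]])
```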
No-Op Sentinel Used for Alternative Branching
Description
Special data type for no-ops. Distinct from NULL for easier debugging
and distinction from unintentional NULL returns.
Usage
NO_OP
Format
R6 object.
See Also
Other Path Branching:
filter_noop(),
is_noop(),
mlr_pipeops_branch,
mlr_pipeops_unbranch
Other Special Graph Messages:
Multiplicity()
PipeOp Base Class
Description
A PipeOp represents a transformation of a given "input" into a given "output", with two stages: "training"
and "prediction". It can be understood as a generalized function that not only has multiple inputs, but
also multiple outputs (as well as two stages). The "training" stage is used when training a machine learning pipeline or
fitting a statistical model, and the "predicting" stage is then used for making predictions on new data.
To perform training, the $train() function is called which takes inputs and transforms them, while simultaneously storing information
in its $state slot. For prediction, the $predict() function is called, where the $state information can be used to influence the transformation
of the new data.
A PipeOp is usually used in a Graph object, a representation of a computational graph. It can have
multiple input channels (think of these as multiple arguments to a function, for example when averaging
different models) and multiple output channels (a transformation may
return different objects, for example different subsets of a Task). The purpose of the Graph is to
connect different outputs of some PipeOps to inputs of other PipeOps.
Input and output channel information of a PipeOp is defined in the $input and $output slots; each channel has a name, a required
type during training, and a required type during prediction. The $train() and $predict() functions are called with a list argument
that has one entry for each declared channel (with one exception, see next paragraph). The list is automatically type-checked
for each channel against $input and then passed on to the private$.train() or private$.predict() functions. There the data is processed and
a result list is created. This list is again type-checked for declared output types of each channel. The length and types of the result
list are as declared in $output.
A special input channel name is "...", which creates a vararg channel that takes arbitrarily many arguments, all of the same type. If the $input
table contains an "..."-entry, then the input given to $train() and $predict() may be longer than the number of declared input channels.
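The channel tables and the vararg channel can be inspected on a concrete PipeOp. The sketch below assumes mlr3 and mlr3pipelines are installed and uses po("featureunion") with its default construction, which declares a single "..." input channel; the split of the iris features into two tasks is only for illustration.

```r
library(mlr3)
library(mlr3pipelines)

fu = po("featureunion")
# Channel declarations: name plus required train / predict types.
fu$input
fu$output

# Two tasks over the same rows but with disjoint feature sets.
task_a = tsk("iris")$select(c("Petal.Length", "Petal.Width"))
task_b = tsk("iris")$select(c("Sepal.Length", "Sepal.Width"))

# The vararg channel accepts any number of inputs of the declared type,
# so the input list may be longer than nrow(fu$input).
out = fu$train(list(task_a, task_b))
out$output$feature_names
```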
This class is an abstract base class that all PipeOps being used in a Graph should inherit from, and
is not intended to be instantiated.
Format
Abstract R6Class.
Construction
PipeOp$new(id, param_set = ps(), param_vals = list(), input, output, packages = character(0), tags = character(0))
- id :: character(1)
  Identifier of resulting object. See $id slot.
- param_set :: ParamSet | list of expression
  Parameter space description. This should be created by the subclass and given to super$initialize(). If this is a ParamSet, it is used as the PipeOp's ParamSet directly. Otherwise it must be a list of expressions, e.g. created by alist(), that evaluate to ParamSets. These ParamSets are combined using a ParamSetCollection.
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
- input :: data.table with columns name (character), train (character), predict (character)
  Sets the $input slot of the resulting object; see description there.
- output :: data.table with columns name (character), train (character), predict (character)
  Sets the $output slot of the resulting object; see description there.
- packages :: character
  Set of all required packages for the PipeOp's $train and $predict methods. See $packages slot. Default is character(0).
- tags :: character
  A set of tags associated with the PipeOp. Tags describe a PipeOp's purpose. Can be used to filter as.data.table(mlr_pipeops). Default is "abstract", indicating an abstract PipeOp.
Internals
PipeOp is an abstract class with abstract functions private$.train() and private$.predict(). To create a functional
PipeOp class, these two methods must be implemented. Each of these functions receives a named list according to
the PipeOp's input channels, and must return a list (names are ignored) with values in the order of output
channels in $output. The private$.train() and private$.predict() functions should not be called by the user;
instead, $train() and $predict() should be used. The most convenient usage is to add the PipeOp
to a Graph (possibly as the only PipeOp in that Graph) and use the Graph's $train() / $predict() methods.
private$.train() and private$.predict() should treat their inputs as read-only. If they are R6 objects,
they should be cloned before being manipulated in-place. Objects, or parts of objects, that are not changed, do
not need to be cloned, and it is legal to return the same identical-by-reference objects to multiple outputs.
Fields
- id :: character
  ID of the PipeOp. IDs are user-configurable, and IDs of PipeOps must be unique within a Graph. IDs of PipeOps must not be changed once they are part of a Graph; instead, the Graph's $set_names() method should be used.
- packages :: character
  Packages required for the PipeOp. Functions that are not in base R should still be called using :: (or explicitly attached using require()) in private$.train() and private$.predict(), but packages declared here are checked before any (possibly expensive) processing has started within a Graph.
- param_set :: ParamSet
  Parameters and parameter constraints. Parameter values that influence the functioning of $train and / or $predict are in the $param_set$values slot; these are automatically checked against parameter constraints in $param_set.
- state :: any | NULL
  Method-dependent state obtained during training step, and usually required for the prediction step. This is NULL if and only if the PipeOp has not been trained. The $state is the only slot that can be reliably modified during $train(), because private$.train() may theoretically be executed in a different R-session (e.g. for parallelization). $state should furthermore always be set to something with copy-semantics, since it is never cloned. This is a limitation not of PipeOp or mlr3pipelines, but of the way the system as a whole works, together with GraphLearner and mlr3.
- input :: data.table with columns name (character), train (character), predict (character)
  Input channels of PipeOp. Column name gives the names (and order) of values in the list given to $train() and $predict(). Column train is the (S3) class that an input object must conform to during training, column predict is the (S3) class that an input object must conform to during prediction. Types are checked by the PipeOp itself and do not need to be checked by private$.train() / private$.predict() code.
  A special name is "...", which creates a vararg input channel that accepts a variable number of inputs.
  If a row has both train and predict values enclosed by square brackets ("[", "]"), then this channel is Multiplicity-aware. If the PipeOp receives a Multiplicity value on these channels, this Multiplicity is given to the .train() and .predict() functions directly. Otherwise, the Multiplicity is transparently unpacked and the .train() and .predict() functions are called multiple times, once for each Multiplicity element. The type enclosed by square brackets indicates that only a Multiplicity containing values of this type is accepted. See Multiplicity for more information.
- output :: data.table with columns name (character), train (character), predict (character)
  Output channels of PipeOp, in the order in which they will be given in the list returned by $train and $predict functions. Column train is the (S3) class that an output object must conform to during training, column predict is the (S3) class that an output object must conform to during prediction. The PipeOp checks values returned by private$.train() and private$.predict() against these type specifications.
  If a row has both train and predict values enclosed by square brackets ("[", "]"), then this signals that the channel emits a Multiplicity of the indicated type. See Multiplicity for more information.
- innum :: numeric(1)
  Number of input channels. This equals nrow($input).
- outnum :: numeric(1)
  Number of output channels. This equals nrow($output).
- is_trained :: logical(1)
  Indicates whether the PipeOp was already trained and can therefore be used for prediction.
- tags :: character
  A set of tags associated with the PipeOp. Tags describe a PipeOp's purpose. Can be used to filter as.data.table(mlr_pipeops). PipeOp tags are inherited and child classes can introduce additional tags.
- hash :: character(1)
  Checksum calculated on the PipeOp, depending on the PipeOp's class and the slots $id and $param_set$values. If a PipeOp's functionality may change depending on more than these values, it should inherit the $hash active binding and calculate the hash as digest(list(super$hash, <OTHER THINGS>), algo = "xxhash64").
- phash :: character(1)
  Checksum calculated on the PipeOp, depending on the PipeOp's class and the slot $id but ignoring $param_set$values. If a PipeOp's functionality may change depending on more than these values, it should inherit the $hash active binding and calculate the hash as digest(list(super$hash, <OTHER THINGS>), algo = "xxhash64").
- .result :: list
  If the Graph's $keep_results flag is set to TRUE, then the intermediate results of $train() and $predict() are saved to this slot, exactly as they are returned by these functions. This is mainly for debugging purposes and done, if requested, by the Graph backend itself; it should not be done explicitly by private$.train() or private$.predict().
- man :: character(1)
  Identifying string of the help page that shows with help().
- label :: character(1)
  Description of the PipeOp's functionality. Derived from the title of its help page.
- properties :: character()
  The properties of the PipeOp. Currently supported values are:
  - "validation": the PipeOp can make use of the $internal_valid_task of an mlr3::Task. This is for example used for PipeOpLearners that wrap a Learner with this property, see mlr3::Learner. PipeOps that have this property also have a $validate field, which controls whether to use the validation task, as well as an $internal_valid_scores field, which gives access to the internal validation scores after training.
  - "internal_tuning": the PipeOp is able to internally optimize hyperparameters. This works analogously to the internal tuning implementation for mlr3::Learner. PipeOps with that property also implement the standardized accessor $internal_tuned_values and have at least one parameter tagged with "internal_tuning". An example of such a PipeOp is a PipeOpLearner that wraps a Learner with the "internal_tuning" property.
  Programmatic access to all available properties is possible via mlr_reflections$pipeops$properties.
Methods
- print()
  () -> NULL
  Prints the PipeOp's most salient information: $id, $is_trained, $param_set$values, $input and $output.
- help(help_type)
  (character(1)) -> help file
  Displays the help file of the concrete PipeOp instance. help_type is one of "text", "html", "pdf" and behaves as the help_type argument of R's help().
The following public $train() and $predict() methods are the primary user-facing functions intended for direct use:
- train(input)
  (list) -> named list
  Train PipeOp on inputs, transform them to output and store the learned $state. If the PipeOp is already trained, the already present $state is overwritten. The input list is typechecked against the $input train column. The return value is a list with as many entries as $output has rows, with each entry named after the $output name column and class according to the $output train column. The workhorse function for training each PipeOp is the private$.train() function.
- predict(input)
  (list) -> named list
  Predict on new data in input, possibly using the stored $state. Input and output are specified by $input and $output in the same way as for $train(), except that the predict column is used for type checking. The workhorse function for predicting in each PipeOp is the private$.predict() function.
To implement a PipeOp, the following abstract private functions should be overloaded in the inheriting PipeOp.
Note that these should not be called by a user; instead, the public $train() and $predict() methods should be used.
- .train(input)
  (named list) -> list
  Abstract function that must be implemented by concrete subclasses. private$.train() is called by $train() after typechecking. It must change the $state value to something non-NULL and return a list of transformed data according to the $output train column. Names of the returned list are ignored.
- .predict(input)
  (named list) -> list
  Abstract function that must be implemented by concrete subclasses. private$.predict() is called by $predict() after typechecking and works analogously to private$.train(). Unlike private$.train(), private$.predict() should not modify the PipeOp in any way.
Inheriting
To create your own PipeOp, you need to overload the private$.train() and private$.predict() functions.
It is most likely also necessary to overload the $initialize() function to do additional initialization.
The $initialize() method should have at least the arguments id and param_vals, which should be passed on to super$initialize() unchanged.
id should have a useful default value, and param_vals should have the default value list(), meaning no initialization of hyperparameters.
If the $initialize() method has more arguments, then it is necessary to also overload the private$.additional_phash_input() function.
This function should return either all objects, or a hash of all objects, that can change the function or behavior of the PipeOp and are independent
of the class, the id, the $state, and the $param_set$values. The last point is particularly important: changing the $param_set$values should
not change the return value of private$.additional_phash_input().
When you are implementing a PipeOp that operates on a task (and is not a PipeOpTaskPreproc), you also need to handle the
$internal_valid_task field of the input task, if there is one.
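The requirement on private$.additional_phash_input() can be sketched as follows, assuming mlr3pipelines is installed. PipeOpScaleBy, its factor argument, and the private .factor field are hypothetical names invented for this illustration and are not part of the package.

```r
library(mlr3pipelines)

# Hypothetical PipeOp with one extra construction argument ('factor')
# besides 'id' and 'param_vals'.
PipeOpScaleBy = R6::R6Class("PipeOpScaleBy",
  inherit = PipeOp,
  public = list(
    initialize = function(factor = 2, id = "scaleby", param_vals = list()) {
      private$.factor = factor
      super$initialize(id, param_vals = param_vals,
        input = data.table::data.table(name = "input",
          train = "numeric", predict = "numeric"),
        output = data.table::data.table(name = "output",
          train = "numeric", predict = "numeric")
      )
    }
  ),
  private = list(
    .factor = NULL,
    .train = function(input) {
      self$state = list()  # must be non-NULL after training
      list(input[[1]] * private$.factor)
    },
    .predict = function(input) {
      list(input[[1]] * private$.factor)
    },
    # 'factor' changes behavior but is neither part of the class, the id,
    # the $state, nor $param_set$values, so it is reported here.
    .additional_phash_input = function() private$.factor
  )
)

op2 = PipeOpScaleBy$new(factor = 2)
op3 = PipeOpScaleBy$new(factor = 3)
op2$train(list(3))$output
op2$hash != op3$hash
```

Because the extra argument feeds into the hash, two instances that differ only in factor are not mistaken for the same operation.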
See Also
https://mlr-org.com/pipeops.html
Other mlr3pipelines backend related:
Graph,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops,
mlr_pipeops_updatetarget
Other PipeOps:
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
# example (bogus) PipeOp that returns the sum of two numbers during $train()
# as well as a letter of the alphabet corresponding to that sum during $predict().
PipeOpSumLetter = R6::R6Class("sumletter",
  inherit = PipeOp,  # inherit from PipeOp
  public = list(
    initialize = function(id = "posum", param_vals = list()) {
      super$initialize(id, param_vals = param_vals,
        # declare "input" and "output" during construction here
        # training takes two 'numeric' and returns a 'numeric';
        # prediction takes 'NULL' and returns a 'character'.
        input = data.table::data.table(name = c("input1", "input2"),
          train = "numeric", predict = "NULL"),
        output = data.table::data.table(name = "output",
          train = "numeric", predict = "character")
      )
    }
  ),
  private = list(
    # PipeOp deriving classes must implement .train and
    # .predict; each taking an input list and returning
    # a list as output.
    .train = function(input) {
      sum = input[[1]] + input[[2]]
      self$state = sum
      list(sum)
    },
    .predict = function(input) {
      list(letters[self$state])
    }
  )
)
posum = PipeOpSumLetter$new()
print(posum)
posum$train(list(1, 2))
# note the name 'output' is the name of the output channel specified
# in the $output data.table.
posum$predict(list(NULL, NULL))
Piecewise Linear Encoding Base Class
Description
Abstract base class for piecewise linear encoding.
Piecewise linear encoding works by splitting values of features into distinct bins, through an algorithm implemented
in private$.get_bins(), and then creating new feature columns through a continuous alternative to one-hot encoding.
Here, one new feature per bin is constructed, with values being either
- 0, if the original value was below the lower bin boundary,
- 1, if the original value was above or equal to the upper bin boundary, or
- a scaled value between 0 and 1, if the original value was inside the bin boundaries. Scaling is done by offsetting the original value by the lower bin boundary and dividing by the bin width.
PipeOps inheriting from this encode columns of type numeric and integer. Use the PipeOpTaskPreproc
$affect_columns functionality to only encode a subset of columns, or only encode columns of a certain type, etc.
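The encoding rule described above can be written down in a few lines of base R. This is a minimal sketch that follows the textual description only; encode_pl and its arguments are hypothetical names, and the package's own implementation may treat edge cases differently.

```r
# Piecewise linear encoding of a numeric vector 'x' given sorted bin
# boundaries 'bins' (length >= 2). Returns one column per bin.
encode_pl = function(x, bins) {
  lower = head(bins, -1)
  upper = tail(bins, -1)
  cols = vapply(seq_along(lower), function(i) {
    # offset by the lower boundary, divide by the bin width,
    # then clamp to [0, 1]
    scaled = (x - lower[i]) / (upper[i] - lower[i])
    pmin(pmax(scaled, 0), 1)
  }, numeric(length(x)))
  matrix(cols, nrow = length(x),
    dimnames = list(NULL, paste0("bin", seq_along(lower))))
}

encoded = encode_pl(c(0, 1.5, 3), bins = c(0, 1, 2, 3))
encoded
# rows: (0, 0, 0), (1, 0.5, 0), (1, 1, 1)
```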
Format
Abstract R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpEncodePL$new(id = "encodepl", param_set = ps(), param_vals = list(), packages = character(0), task_type = "Task")
- id :: character(1)
  Identifier of resulting object. See $id slot of PipeOp.
- param_set :: ParamSet
  Parameter space description. This should be created by the subclass and given to super$initialize().
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
- packages :: character
  Set of all required packages for the PipeOp's private$.train() and private$.predict() methods. See $packages slot. Default is character(0).
- task_type :: character(1)
  The class of Task that should be accepted as input and will be returned as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task".
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric and integer columns encoded using piecewise linear encoding.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- bins :: named list
  Named list of numeric vectors. Each element corresponds to and is named after one of the affected feature columns and contains the bin boundaries derived through private$.get_bins().
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc.
Internals
PipeOpEncodePL is an abstract class inheriting from PipeOpTaskPreprocSimple that allows easier implementation
of different binning algorithms for piecewise linear encoding. The respective binning algorithm should be implemented
as private$.get_bins().
Fields
Only fields inherited from PipeOp.
Methods
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp as well as
- .get_bins(task, cols)
  (Task, character) -> named list
  Abstract method for splitting the value range of a feature column into distinct bins. The argument cols should give the names of the feature columns of the task for which bins should be derived. Returns a named list of numeric vectors containing the bin boundaries for each affected feature column, named by that corresponding feature column.
References
Gorishniy Y, Rubachev I, Babenko A (2022). “On Embeddings for Numerical Features in Tabular Deep Learning.” In Advances in Neural Information Processing Systems, volume 35, 24991–25004. https://proceedings.neurips.cc/paper_files/paper/2022/hash/9e9f0ffc3d836836ca96cbf8fe14b105-Abstract-Conference.html.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Piecewise Linear Encoding PipeOps:
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree
Ensembling Base Class
Description
Parent class for PipeOps that aggregate predictions. Implements the private$.train() and private$.predict() methods necessary
for a PipeOp and requires deriving classes to create the private$weighted_avg_predictions() function.
Format
Abstract R6Class inheriting from PipeOp.
Construction
Note: This object is typically constructed via a derived class, e.g. PipeOpClassifAvg or PipeOpRegrAvg.
PipeOpEnsemble$new(innum = 0, collect_multiplicity = FALSE, id, param_set = ps(), param_vals = list(), packages = character(0), prediction_type = "Prediction")
- innum :: numeric(1)
  Determines the number of input channels. If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs.
- collect_multiplicity :: logical(1)
  If TRUE, the input is a Multiplicity collecting channel. This means that a Multiplicity input, instead of multiple normal inputs, is accepted and the members are aggregated. This requires innum to be 0. Default is FALSE.
- id :: character(1)
  Identifier of the resulting object.
- param_set :: ParamSet
  ("Hyper"-)Parameters in form of a ParamSet for the resulting PipeOp.
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
- packages :: character
  Set of packages required for this PipeOp. These packages are loaded during $train() and $predict(), but not attached. Default character(0).
- prediction_type :: character(1)
  The predict entry of the $input and $output type specifications. Should be "Prediction" (default) or one of its subclasses, e.g. "PredictionClassif", and correspond to the type accepted by private$.train() and private$.predict().
Input and Output Channels
PipeOpEnsemble has multiple input channels depending on the innum construction argument, named "input1", "input2", ...
if innum is nonzero; if innum is 0, there is only one vararg input channel named "...".
All input channels take only NULL during training and take a Prediction during prediction.
PipeOpEnsemble has one output channel named "output", producing NULL during training and a Prediction during prediction.
The output during prediction is in some way a weighted averaged representation of the input.
State
The $state is left empty (list()).
Parameters
- weights :: numeric
  Relative weights of input predictions. If this has length 1, it is ignored and weighs all inputs equally. Otherwise it must have length equal to the number of connected inputs. Initialized to 1 (equal weights).
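The handling of the weights parameter can be pictured with a small base R sketch. It operates on plain numeric vectors rather than Prediction objects, and the helper name weighted_avg as well as the normalisation of weights to sum to 1 are assumptions made for illustration, not the package's implementation.

```r
# Weighted average of a list of equally long numeric vectors.
weighted_avg = function(preds, weights = 1) {
  # a length-1 weight is ignored: all inputs are weighted equally
  if (length(weights) == 1) weights = rep(1, length(preds))
  stopifnot(length(weights) == length(preds))
  weights = weights / sum(weights)  # treat weights as relative
  Reduce(`+`, Map(`*`, preds, weights))
}

avg = weighted_avg(list(c(1, 2), c(3, 4)), weights = c(3, 1))
avg  # 1.5 2.5
```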
Internals
The commonality of ensemble methods using PipeOpEnsemble is that they take a NULL-input during training and save an empty $state. They can be
used following a set of PipeOpLearner PipeOps to perform (possibly weighted) prediction averaging. See e.g.
PipeOpClassifAvg and PipeOpRegrAvg which both inherit from this class.
Should it be necessary to use the output of preceding Learners
during the "training" phase, then PipeOpEnsemble should not be used. In fact, if training time behaviour of a Learner is important, then
one should use a PipeOpLearnerCV instead of a PipeOpLearner, and the ensemble can be created with a Learner encapsulated by a PipeOpLearner.
See LearnerClassifAvg and LearnerRegrAvg for examples.
Fields
Only fields inherited from PipeOp.
Methods
Methods inherited from PipeOp as well as:
- weighted_avg_predictions(inputs, weights, row_ids, truth)
  (list of Prediction, numeric, integer | character, list) -> NULL
  Create Predictions that correspond to the weighted average of incoming Predictions. This is called by private$.predict() with cleaned and sanity-checked values: inputs are guaranteed to fit together, row_ids and truth are guaranteed to be the same as each one in inputs, and weights is guaranteed to have the same length as inputs.
  This method is abstract; it must be implemented by deriving classes.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Ensembles:
mlr_learners_avg,
mlr_pipeops_classifavg,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg
Imputation Base Class
Description
Abstract base class for feature imputation.
Format
Abstract R6Class object inheriting from PipeOp.
Construction
PipeOpImpute$new(id, param_set = ps(), param_vals = list(), whole_task_dependent = FALSE, empty_level_control = FALSE, packages = character(0), task_type = "Task")
- id :: character(1)
  Identifier of resulting object. See $id slot of PipeOp.
- param_set :: ParamSet
  Parameter space description. This should be created by the subclass and given to super$initialize().
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
- whole_task_dependent :: logical(1)
  Whether the context_columns parameter should be added which lets the user limit the columns that are used for imputation inference. This should generally be FALSE if imputation depends only on individual features (e.g. mode imputation), and TRUE if imputation depends on other features as well (e.g. kNN-imputation).
- empty_level_control :: logical(1)
  Control how to handle edge cases where NAs occur in factor or ordered features only during prediction but not during training. Can be one of "never", "always", or "param":
  - If set to "never", no empty level is introduced during training, but columns that have missing values only during prediction will not be imputed.
  - If set to "always", an unseen level is added to the feature during training and missing values are imputed as that value during prediction.
  - Finally, if set to "param", the hyperparameter create_empty_level is added and control over this behavior is left to the user.
  For implementation details, see Internals below. Default is "never".
- packages :: character
  Set of all required packages for the PipeOp's private$.train and private$.predict methods. See $packages slot. Default is character(0).
- task_type :: character(1)
  The class of Task that should be accepted as input and will be returned as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task".
- feature_types :: character
  Feature types affected by the PipeOp. See private$.select_cols() for more information.
Input and Output Channels
PipeOpImpute has one input channel named "input", taking a Task, or a subclass of
Task if the task_type construction argument is given as such; both during training and prediction.
PipeOpImpute has one output channel named "output", producing a Task, or a subclass;
the Task type is the same as for input; both during training and prediction.
The output Task is the modified input Task with features imputed according to the private$.impute() function.
State
The $state is a named list; besides members added by inheriting classes, the members are:
-
affected_cols::character
Names of features being selected by the affect_columns parameter. -
context_cols::character
Names of features being selected by the context_columns parameter. -
intasklayout::data.table
Copy of the training Task's $feature_types slot. This is used during prediction to ensure that the prediction Task has the same features, feature layout, and feature types as during training. -
outtasklayout::data.table
Copy of the trained Task's $feature_types slot. This is used during prediction to ensure that the Task resulting from the prediction operation has the same features, feature layout, and feature types as after training. -
model:: named list
Model used for imputation. This is a list named by Task features, containing the result of the private$.train_imputer() or private$.train_nullmodel() function for each one. -
imputed_train::character
Names of features that were imputed during training. This is used to ensure that factor levels that were added during training are also added during prediction. Note that features that are imputed during prediction but not during training will still have inconsistent factor levels.
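As a minimal sketch of what these members look like in practice, the following trains the concrete subclass registered as "imputemean" on the "pima" task that ships with mlr3 and then inspects its $state. The choice of operator and task is illustrative only.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("pima")            # several numeric features contain missing values
po_imp = po("imputemean")     # a concrete PipeOpImpute subclass

imputed = po_imp$train(list(task))[[1]]

names(po_imp$state)           # state members, including those described above
po_imp$state$model$glucose    # per-feature model; here the training mean of "glucose"
po_imp$state$imputed_train    # features that were imputed during training
```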
Parameters
-
affect_columns::function|Selector|NULL
What columns the PipeOpImpute should operate on. The parameter must be a Selector function, which takes a Task as argument and returns a character of features to use.
See Selector for example functions. Defaults to NULL, which selects all features. -
context_columns::function|Selector|NULL
What columns the PipeOpImpute imputation may depend on. This parameter is only present if the constructor is called with the whole_task_dependent argument set to TRUE.
The parameter must be a Selector function, which takes a Task as argument and returns a character of features to use.
See Selector for example functions. Defaults to NULL, which selects all features. -
create_empty_level::logical(1)
Whether an empty level should always be created for factor or ordered columns during training. If FALSE, columns that had no NAs during training but have NAs during prediction will not be imputed. This parameter is only present if the constructor is called with the empty_level_control argument set to "param". Initialized to FALSE.
Internals
PipeOpImpute is an abstract class inheriting from PipeOp that makes implementing imputer PipeOps simple.
Internally, the construction argument empty_level_control and the hyperparameter create_empty_level (should it
exist) modify the private$.create_empty_level field. Behavior then depends on whether this field is set to TRUE
or FALSE and works by controlling for which cases imputation is performed on factor or ordered columns. Its
setting has no impact on columns of other types.
If private$.create_empty_level is set to TRUE, private$.impute() is called for all factor or ordered
columns during training, regardless of whether they have any missing values. For this to lead to the creation of an
empty level for columns with no missing values, inheriting PipeOps must implement private$.train_imputer() in
such a way that it returns the name of the level to be created for the feature types factor and ordered.
If private$.create_empty_level is set to FALSE, private$.impute() is not called during prediction for factor
or ordered columns which were not modified during training. This means that NAs will not be imputed for these
columns.
See PipeOpImputeOOR for a detailed explanation of why these controls are necessary.
Fields
Fields inherited from PipeOp.
Methods
Methods inherited from PipeOp, as well as:
-
.select_cols(task)
(Task) -> character
Selects which columns the PipeOp operates on. In contrast to the affect_columns parameter, private$.select_cols() is for the inheriting class to determine which columns the operator should function on, e.g. based on feature type, while affect_columns is a way for the user to limit the columns that a PipeOpImpute should operate on. This method can optionally be overloaded when inheriting PipeOpImpute; if this method is not overloaded, it defaults to selecting the columns of the type indicated by the feature_types construction argument. -
.train_imputer(feature, type, context)
(atomic, character(1), data.table) -> any
Abstract function that must be overloaded when inheriting. Called once for each feature selected by affect_columns to create the model entry to be used for private$.impute(). This function is only called for features with at least one non-missing value. -
.train_nullmodel(feature, type, context)
(atomic, character(1), data.table) -> any
Like .train_imputer(), but only called for each feature that only contains missing values. This is not an abstract function and, if not overloaded, gives a default response of 0 (integer, numeric), c(TRUE, FALSE) (logical), all available levels (factor/ordered), or the empty string (character). -
.impute(feature, type, model, context)
(atomic, character(1), any, data.table) -> atomic
Imputes the features. model is the model created by private$.train_imputer(). Default behaviour is to assume model is an atomic vector from which values are sampled to impute missing values of feature. model may have an attribute probabilities for non-uniform sampling. If model has length zero, feature is returned unchanged.
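The methods above can be illustrated with a small subclass. This is a sketch under stated assumptions, not an operator that exists in the package: the class name PipeOpImputeMax and the id "imputemax" are hypothetical. Only private$.train_imputer() is overloaded, so the default private$.impute() fills missing entries from the single value stored as the model.

```r
library(mlr3)
library(mlr3pipelines)
library(R6)

# Hypothetical imputer: replaces missing numeric values with the training maximum.
PipeOpImputeMax = R6Class("PipeOpImputeMax",
  inherit = PipeOpImpute,
  public = list(
    initialize = function(id = "imputemax", param_vals = list()) {
      super$initialize(id, param_vals = param_vals,
        feature_types = c("numeric", "integer"))
    }
  ),
  private = list(
    # Called once per affected feature that has at least one non-missing value.
    .train_imputer = function(feature, type, context) {
      max(feature, na.rm = TRUE)
    }
  )
)

task = tsk("pima")
po_max = PipeOpImputeMax$new()
imputed = po_max$train(list(task))[[1]]
imputed$missings()   # number of missing values per column after imputation
```

Restricting feature_types to numeric and integer columns keeps the default private$.select_cols() from passing factor or character features to max().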
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
Target Transformation Base Class
Description
Base class for handling target transformation operations. Target transformations are different
from feature transformation because they have to be "inverted" after prediction. The
target is transformed during the training phase and information to invert this transformation
is sent along to PipeOpTargetInvert which then inverts this transformation during the
prediction phase. This inversion may need info about both the training and the prediction data.
Users can overload up to four private$-functions: .get_state() (optional), .transform() (mandatory),
.train_invert() (optional), and .invert() (mandatory).
Format
Abstract R6Class inheriting from PipeOp.
Construction
PipeOpTargetTrafo$new(id, param_set = ps(), param_vals = list(), packages = character(0), task_type_in = "Task", task_type_out = task_type_in, tags = NULL)
-
id::character(1)
Identifier of resulting object. See $id slot of PipeOp. -
param_set::ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize(). -
param_vals:: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list(). -
task_type_in::character(1)
The class of Task that should be accepted as input. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task". -
task_type_out::character(1)
The class of Task that is produced as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is the value of task_type_in. -
packages::character
Set of all required packages for the PipeOp's methods. See $packages slot. Default is character(0). -
tags::character|NULL
Tags of the resulting PipeOp. This is added to the tag "target transform". Default NULL.
Input and Output Channels
PipeOpTargetTrafo has one input channel named "input" taking a Task (or whatever class
was specified by the task_type_in argument during construction) both during training and prediction.
PipeOpTargetTrafo has two output channels named "fun" and "output". During training,
"fun" returns NULL and during prediction, "fun" returns a function that can later be used
to invert the transformation done during training according to the overloaded .train_invert()
and .invert() functions. "output" returns the modified input Task (of the class given by task_type_out)
according to the overloaded .transform() function both during training and prediction.
State
The $state is a named list and should be returned explicitly by the user in the overloaded
.get_state() function.
Internals
PipeOpTargetTrafo is an abstract class inheriting from PipeOp. It implements the
private$.train() and private$.predict() functions. These functions perform checks and go on
to call .get_state(), .transform(), and .train_invert(); .invert() is packaged and sent along
the "fun" output to be applied to a Prediction by PipeOpTargetInvert.
A subclass of PipeOpTargetTrafo should implement these functions and be used in combination
with PipeOpTargetInvert.
Fields
Fields inherited from PipeOp.
Methods
Methods inherited from PipeOp, as well as:
-
.get_state(task)
(Task) -> list
Called by PipeOpTargetTrafo's implementation of private$.train(). Takes a single Task as input and returns a list to set the $state. .get_state() will be called a single time during training right before .transform() is called. The return value (i.e. the $state) should contain info needed in .transform() as well as in .invert().
The base implementation returns list() and should be overloaded if setting the state is desired. -
.transform(task, phase)
(Task, character(1)) -> Task
Called by PipeOpTargetTrafo's implementation of private$.train() and private$.predict(). Takes a single Task as input and modifies it. This should typically consist of calculating a new target and modifying the Task by using the convert_task function. .transform() will be called during training and prediction because the target (and if needed also type) of the input Task must be transformed both times. Note that unlike private$.train(), the argument is not a list but a singular Task, and the return object is also not a list but a singular Task. The phase argument is "train" during training phase and "predict" during prediction phase and can be used to enable different behaviour during training and prediction. When phase is "train", the $state slot (as previously set by .get_state()) may also be modified, alternatively or in addition to overloading .get_state().
The input should not be cloned and if possible should be changed in-place.
This function is abstract and should be overloaded by inheriting classes. -
.train_invert(task)
(Task) -> any
Called by PipeOpTargetTrafo's implementation of private$.predict(). Takes a single Task as input and returns an arbitrary value that will be given as predict_phase_state to .invert(). This should not modify the input Task.
The base implementation returns a list with a single element, the $truth column of the Task, and should be overloaded if a more training-phase-dependent state is desired. -
.invert(prediction, predict_phase_state)
(Prediction, any) -> Prediction
Takes a Prediction and a predict_phase_state object as input and inverts the prediction. This function is sent as "fun" to PipeOpTargetInvert.
This function is abstract and should be overloaded by inheriting classes. Care should be taken that the predict_type of the Prediction being inverted is handled well. -
.invert_help(predict_phase_state)
(predict_phase_state object) -> function
Helper function that packages .invert() into a function that can later be used for the inversion.
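The two mandatory methods can be sketched with a log transformation of a regression target. The class name PipeOpTargetTrafoLog and the id "targettrafolog" are hypothetical and not part of the package; the example also assumes a strictly positive target and a "response" predict type, and it relies on the base implementation of .train_invert() to supply the truth values. The graph is assembled with ppl("targettrafo"), which connects the operator to PipeOpTargetInvert.

```r
library(mlr3)
library(mlr3pipelines)
library(R6)

# Hypothetical target transformation: natural log of a strictly positive target.
PipeOpTargetTrafoLog = R6Class("PipeOpTargetTrafoLog",
  inherit = PipeOpTargetTrafo,
  public = list(
    initialize = function(id = "targettrafolog", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals, task_type_in = "TaskRegr")
    }
  ),
  private = list(
    # Runs during training and prediction; replaces the target by its log.
    .transform = function(task, phase) {
      new_name = paste0(task$target_names, ".log")
      new_target = data.table::data.table(log(task$truth()))
      data.table::setnames(new_target, new_name)
      task$cbind(new_target)
      convert_task(task, target = new_name, drop_original_target = TRUE)
    },
    # Maps predictions back to the original scale; truth comes from .train_invert().
    .invert = function(prediction, predict_phase_state) {
      PredictionRegr$new(
        row_ids = prediction$row_ids,
        truth = predict_phase_state$truth,
        response = exp(prediction$response)
      )
    }
  )
)

graph = ppl("targettrafo",
  graph = lrn("regr.featureless"),
  trafo_pipeop = PipeOpTargetTrafoLog$new())

task = tsk("mtcars")
graph$train(task)
prediction = graph$predict(task)[[1]]
head(prediction$response)
```

Because the featureless learner predicts the mean of the log-transformed target, the inverted response is the geometric mean of the original target, which makes the effect of .invert() easy to check by hand.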
See Also
https://mlr-org.com/pipeops.html
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops,
mlr_pipeops_updatetarget
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Task Preprocessing Base Class
Description
Base class for handling most "preprocessing" operations. These
are operations that have exactly one Task input and one Task output,
and expect the column layout of these Tasks during input and output
to be the same.
Prediction-behavior of preprocessing operations should always be independent for each row in the input-Task.
This means that the prediction-operation of preprocessing-PipeOps should commute with rbind(): Running prediction
on an n-row Task should result in the same result as rbind()-ing the prediction-result from n
1-row Tasks with the same content. In the large majority of cases, the number and order of rows
should also not be changed during prediction.
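The rbind() property can be checked empirically for a given operator. The following sketch uses po("scale") on the "iris" task as an arbitrary example and compares prediction on one 5-row Task with the row-bound predictions on five 1-row Tasks.

```r
library(mlr3)
library(mlr3pipelines)
library(data.table)

task = tsk("iris")
po_scale = po("scale")
po_scale$train(list(task))

rows = 1:5
# Prediction on one 5-row Task ...
batch = po_scale$predict(list(task$clone()$filter(rows)))[[1]]$data()
# ... versus rbind()-ing the predictions on five 1-row Tasks.
single = rbindlist(lapply(rows, function(i) {
  po_scale$predict(list(task$clone()$filter(i)))[[1]]$data()
}))

all.equal(as.data.frame(batch), as.data.frame(single))
```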
Users must implement private$.train_task() and private$.predict_task(), which have a Task
input and should return that Task. The Task should, if possible, be
manipulated in-place, and should not be cloned.
Alternatively, the private$.train_dt() and private$.predict_dt() functions can be implemented, which operate on
data.table objects instead. This should generally only be done if all
data is in some way altered (e.g. PCA changing all columns to principal components) and not if only
a few columns are added or removed (e.g. feature selection) because this should be done at the Task-level
with private$.train_task(). The private$.select_cols() function can be overloaded for private$.train_dt() and private$.predict_dt()
to operate only on subsets of the Task's data, e.g. only on numerical columns.
If the can_subset_cols argument of the constructor is TRUE (the default), then the hyperparameter affect_columns
is added, which can limit the columns of the Task that are modified by the PipeOpTaskPreproc
using a Selector function. Note that this functionality is entirely independent of the private$.select_cols() functionality.
PipeOpTaskPreproc is useful for operations that behave differently during training and prediction. For operations
that perform essentially the same operation and only need to perform extra work to build a $state during training,
the PipeOpTaskPreprocSimple class can be used instead.
Format
Abstract R6Class inheriting from PipeOp.
Construction
PipeOpTaskPreproc$new(id, param_set = ps(), param_vals = list(), can_subset_cols = TRUE, packages = character(0), task_type = "Task", tags = NULL, feature_types = mlr_reflections$task_feature_types)
-
id::character(1)
Identifier of resulting object. See $id slot of PipeOp. -
param_set::ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize(). -
param_vals:: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list(). -
can_subset_cols::logical(1)
Whether the affect_columns parameter should be added which lets the user limit the columns that are modified by the PipeOpTaskPreproc. This should generally be FALSE if the operation adds or removes rows from the Task, and TRUE otherwise. Default is TRUE. -
packages::character
Set of all required packages for the PipeOp's private$.train() and private$.predict() methods. See $packages slot. Default is character(0). -
task_type::character(1)
The class of Task that should be accepted as input and will be returned as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task". -
tags::character|NULL
Tags of the resulting PipeOp. This is added to the tag "data transform". Default NULL. -
feature_types::character
Feature types affected by the PipeOp. See private$.select_cols() for more information. Defaults to all available feature types.
Input and Output Channels
PipeOpTaskPreproc has one input channel named "input", taking a Task, or a subclass of
Task if the task_type construction argument is given as such; both during training and prediction.
PipeOpTaskPreproc has one output channel named "output", producing a Task, or a subclass;
the Task type is the same as for input; both during training and prediction.
The output Task is the modified input Task according to the overloaded
private$.train_task()/private$.predict_task() or private$.train_dt()/private$.predict_dt() functions.
State
The $state is a named list; besides members added by inheriting classes, the members are:
-
affect_cols::character
Names of features being selected by the affect_columns parameter, if present; names of all present features otherwise. -
intasklayout::data.table
Copy of the training Task's $feature_types slot. This is used during prediction to ensure that the prediction Task has the same features, feature layout, and feature types as during training. -
outtasklayout::data.table
Copy of the trained Task's $feature_types slot. This is used during prediction to ensure that the Task resulting from the prediction operation has the same features, feature layout, and feature types as after training. -
dt_columns::character
Names of features selected by the private$.select_cols() call during training. This is only present if the private$.train_dt() functionality is used, and not present if the private$.train_task() function is overloaded instead. -
feature_types::character
Feature types affected by the PipeOp. See private$.select_cols() for more information.
Parameters
-
affect_columns::function|Selector|NULL
What columns the PipeOpTaskPreproc should operate on. This parameter is only present if the constructor is called with the can_subset_cols argument set to TRUE (the default).
The parameter must be a Selector function, which takes a Task as argument and returns a character of features to use.
See Selector for example functions. Defaults to NULL, which selects all features.
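As a brief illustration of affect_columns, the following sketch restricts po("scale") to the features of the "iris" task whose names start with "Sepal", using selector_grep(). The operator and task are chosen for illustration only; any PipeOpTaskPreproc constructed with can_subset_cols = TRUE should accept the parameter in the same way.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")

# Scale only the features whose names match "^Sepal".
po_scale = po("scale", affect_columns = selector_grep("^Sepal"))
scaled = po_scale$train(list(task))[[1]]

# Sepal.* columns are centered and scaled, Petal.* columns are left untouched.
colMeans(as.matrix(scaled$data(cols = scaled$feature_names)))
```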
Internals
PipeOpTaskPreproc is an abstract class inheriting from PipeOp. It implements the private$.train() and
private$.predict() functions. These functions perform checks and go on to call private$.train_task() and private$.predict_task().
A subclass of PipeOpTaskPreproc may implement these functions, or implement private$.train_dt() and private$.predict_dt() instead.
This works by having the default implementations of private$.train_task() and private$.predict_task() call private$.train_dt() and private$.predict_dt(),
respectively.
The affect_columns functionality works by unsetting columns by removing their "col_role" before
processing, and adding them afterwards by setting the col_role to "feature".
Fields
Fields inherited from PipeOp.
Methods
Methods inherited from PipeOp, as well as:
-
.train_task(task)
(Task) -> Task
Called by the PipeOpTaskPreproc's implementation of private$.train(). Takes a single Task as input and modifies it (ideally in-place without cloning) while storing information in the $state slot. Note that unlike private$.train(), the argument is not a list but a singular Task, and the return object is also not a list but a singular Task. Also, contrary to private$.train(), the $state being generated must be a list, which the PipeOpTaskPreproc will add additional slots to (see Section State). Care should be taken to avoid name collisions between $state elements added by private$.train_task() and PipeOpTaskPreproc.
By default this function calls the private$.train_dt() function, but it can be overloaded to perform operations on the Task directly. -
.predict_task(task)
(Task) -> Task
Called by the PipeOpTaskPreproc's implementation of private$.predict(). Takes a single Task as input and modifies it (ideally in-place without cloning) while using information in the $state slot. Works analogously to private$.train_task(). private$.predict_task() should only be overloaded if private$.train_task() is overloaded (i.e. private$.train_dt() is not used). -
.train_dt(dt, levels, target)
(data.table, named list, any) -> data.table|data.frame|matrix
Train PipeOpTaskPreproc on dt, transform it and store a state in $state. A transformed object must be returned that can be converted to a data.table using as.data.table. dt does not need to be copied deliberately, it is possible and encouraged to change it in-place.
The levels argument is a named list of factor levels for factorial or character features. If the input Task inherits from TaskSupervised, the target argument contains the $truth() information of the training Task; its type depends on the Task type being trained on.
This method can be overloaded when inheriting from PipeOpTaskPreproc, together with private$.predict_dt() and optionally private$.select_cols(); alternatively, private$.train_task() and private$.predict_task() can be overloaded. -
.predict_dt(dt, levels)
(data.table, named list) -> data.table|data.frame|matrix
Predict on new data in dt, possibly using the stored $state. A transformed object must be returned that can be converted to a data.table using as.data.table. dt does not need to be copied deliberately, it is possible and encouraged to change it in-place.
The levels argument is a named list of factor levels for factorial or character features.
This method can be overloaded when inheriting PipeOpTaskPreproc, together with private$.train_dt() and optionally private$.select_cols(); alternatively, private$.train_task() and private$.predict_task() can be overloaded. -
.select_cols(task)
(Task) -> character
Selects which columns the PipeOp operates on, if private$.train_dt() and private$.predict_dt() are overloaded. This function is not called if private$.train_task() and private$.predict_task() are overloaded. In contrast to the affect_columns parameter, private$.select_cols() is for the inheriting class to determine which columns the operator should function on, e.g. based on feature type, while affect_columns is a way for the user to limit the columns that a PipeOpTaskPreproc should operate on.
This method can optionally be overloaded when inheriting PipeOpTaskPreproc, together with private$.train_dt() and private$.predict_dt(); alternatively, private$.train_task() and private$.predict_task() can be overloaded.
If this method is not overloaded, it defaults to selecting the columns of the type indicated by the feature_types construction argument.
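The data.table route can be sketched with a small centering operator. The class name PipeOpCenter and the id "center" are hypothetical and not part of the package. private$.train_dt() stores the column means in $state and returns the centered data; private$.predict_dt() reuses the stored means. Both return a matrix, which is one of the permitted return types.

```r
library(mlr3)
library(mlr3pipelines)
library(R6)

# Hypothetical operator: subtracts the training mean from every numeric feature.
PipeOpCenter = R6Class("PipeOpCenter",
  inherit = PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "center", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals, feature_types = "numeric")
    }
  ),
  private = list(
    .train_dt = function(dt, levels, target) {
      means = vapply(dt, mean, numeric(1), na.rm = TRUE)
      self$state = list(means = means)          # must be a list
      sweep(as.matrix(dt), 2, means)            # subtract column means
    },
    .predict_dt = function(dt, levels) {
      sweep(as.matrix(dt), 2, self$state$means[colnames(dt)])
    }
  )
)

task = tsk("iris")
po_center = PipeOpCenter$new()
train_out = po_center$train(list(task))[[1]]
predict_out = po_center$predict(list(task$clone()$filter(1:10)))[[1]]
```

Since feature_types is set to "numeric", the default private$.select_cols() hands only numeric columns to the two methods, so no explicit type checks are needed inside them.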
See Also
https://mlr-org.com/pipeops.html
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops,
mlr_pipeops_updatetarget
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Simple Task Preprocessing Base Class
Description
Base class for handling many "preprocessing" operations
that perform essentially the same operation during training and prediction.
Instead of implementing a private$.train_task() and a private$.predict_task() operation, only
a private$.get_state() and a private$.transform() operation need to be defined,
both of which take one argument: a Task.
Alternatively, analogously to the PipeOpTaskPreproc approach of offering private$.train_dt()/private$.predict_dt(),
the private$.get_state_dt() and private$.transform_dt() functions may be implemented.
private$.get_state() must not change its input value in-place and must return
something that will be written into $state
(which must not be NULL); private$.transform() should modify its argument in-place
and is called both during training and prediction.
This inherits from PipeOpTaskPreproc and behaves essentially the same.
Format
Abstract R6Class inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpTaskPreprocSimple$new(id, param_set = ps(), param_vals = list(), can_subset_cols = TRUE, packages = character(0), task_type = "Task", tags = NULL, feature_types = mlr_reflections$task_feature_types)
(Construction is identical to PipeOpTaskPreproc.)
-
id::character(1)
Identifier of resulting object. See $id slot of PipeOp. -
param_set::ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize(). -
param_vals:: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list(). -
can_subset_cols::logical(1)
Whether the affect_columns parameter should be added which lets the user limit the columns that are modified by the PipeOpTaskPreprocSimple. This should generally be FALSE if the operation adds or removes rows from the Task, and TRUE otherwise. Default is TRUE. -
packages::character
Set of all required packages for the PipeOp's private$.train() and private$.predict() methods. See $packages slot. Default is character(0). -
task_type::character(1)
The class of Task that should be accepted as input and will be returned as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task". -
tags::character|NULL
Tags of the resulting PipeOp. This is added to the tag "data transform". Default NULL. -
feature_types::character
Feature types affected by the PipeOp. See private$.select_cols() for more information. Defaults to all available feature types.
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output during training and prediction is the Task, modified by private$.transform() or private$.transform_dt().
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc.
Internals
PipeOpTaskPreprocSimple is an abstract class inheriting from PipeOpTaskPreproc and implementing the
private$.train_task() and private$.predict_task() functions. A subclass of PipeOpTaskPreprocSimple may implement the
functions private$.get_state() and private$.transform(), or alternatively the functions private$.get_state_dt() and private$.transform_dt()
(as well as private$.select_cols(), in the latter case). This works by having the default implementations of
private$.get_state() and private$.transform() call private$.get_state_dt() and private$.transform_dt().
Fields
Fields inherited from PipeOp.
Methods
Methods inherited from PipeOpTaskPreproc, as well as:
-
.get_state(task)
(Task) -> named list
Create something that will be stored in $state during the training phase of PipeOpTaskPreprocSimple. The state can then influence the private$.transform() function. Note that private$.get_state() must return the state, and should not store it in $state. It is not strictly necessary to implement either private$.get_state() or private$.get_state_dt(); if they are not implemented, the state will be stored as list().
This method can optionally be overloaded when inheriting from PipeOpTaskPreprocSimple, together with private$.transform(); alternatively, private$.get_state_dt() (optional) and private$.transform_dt() (and possibly private$.select_cols(), from PipeOpTaskPreproc) can be overloaded.
-
.transform(task)
(Task) -> Task
Predict on new data in task, possibly using the stored $state. task should not be cloned; instead, it should be changed in-place. This method is called during both the training and the prediction phase, and should essentially behave the same independently of phase. (If this is incongruent with the functionality to be implemented, then the class should inherit from PipeOpTaskPreproc, not from PipeOpTaskPreprocSimple.)
This method can be overloaded when inheriting from PipeOpTaskPreprocSimple, optionally with private$.get_state(); alternatively, private$.get_state_dt() (optional) and private$.transform_dt() (and possibly private$.select_cols(), from PipeOpTaskPreproc) can be overloaded.
-
.get_state_dt(dt)
(data.table) -> named list
Create something that will be stored in $state during the training phase of PipeOpTaskPreprocSimple. The state can then influence the private$.transform_dt() function. Note that private$.get_state_dt() must return the state, and should not store it in $state. If neither private$.get_state() nor private$.get_state_dt() is overloaded, the state will be stored as list().
This method can optionally be overloaded when inheriting from PipeOpTaskPreprocSimple, together with private$.transform_dt() (and optionally private$.select_cols(), from PipeOpTaskPreproc); alternatively, private$.get_state() (optional) and private$.transform() can be overloaded.
-
.transform_dt(dt)
(data.table) -> data.table | data.frame | matrix
Predict on new data in dt, possibly using the stored $state. A transformed object must be returned that can be converted to a data.table using as.data.table. dt does not need to be copied deliberately; it is possible and encouraged to change it in-place. This method is called during both the training and the prediction phase, and should essentially behave the same independently of phase. (If this is incongruent with the functionality to be implemented, then the class should inherit from PipeOpTaskPreproc, not from PipeOpTaskPreprocSimple.)
This method can be overloaded when inheriting from PipeOpTaskPreprocSimple, optionally together with private$.get_state_dt() (and optionally private$.select_cols(), from PipeOpTaskPreproc); alternatively, private$.get_state() (optional) and private$.transform() can be overloaded.
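The following is a minimal sketch of a subclass that overloads private$.get_state_dt() and private$.transform_dt(). The class PipeOpCenterSketch and its id are hypothetical and not part of mlr3pipelines. The `...` in both signatures is an assumption made here so that any additional arguments passed by the base class are absorbed.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical PipeOp (illustration only): centers numeric features using
# the column means observed during training.
PipeOpCenterSketch = R6::R6Class("PipeOpCenterSketch",
  inherit = PipeOpTaskPreprocSimple,
  public = list(
    initialize = function(id = "center_sketch", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals,
        feature_types = "numeric")
    }
  ),
  private = list(
    # Called once during training; the returned list is stored in $state.
    .get_state_dt = function(dt, ...) {
      list(means = vapply(dt, mean, numeric(1), na.rm = TRUE))
    },
    # Called during training and prediction with identical logic.
    .transform_dt = function(dt, ...) {
      means = self$state$means
      for (col in names(dt)) {
        data.table::set(dt, j = col, value = dt[[col]] - means[[col]])
      }
      dt
    }
  )
)

task = tsk("iris")
op = PipeOpCenterSketch$new()

# Training stores the means and returns the centered task.
centered = op$train(list(task))[[1]]
centered_means = colMeans(centered$data(cols = centered$feature_names))

# Prediction reuses the stored means on new rows.
predicted = op$predict(list(task$clone()$filter(1:5)))[[1]]
```

Because the state is returned rather than assigned, the same transformation code runs in both phases, which is the situation PipeOpTaskPreprocSimple is intended for.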
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
mlr_graphs,
mlr_pipeops,
mlr_pipeops_updatetarget
Selector Functions
Description
A Selector function is used by different PipeOps, most prominently PipeOpSelect and many PipeOps inheriting
from PipeOpTaskPreproc, to determine the subset of a Task's features to operate on.
Even though a Selector is an ordinary function that can be written by hand, it is preferable to use the Selector constructors
shown here. Each of these can be called with its arguments to create a Selector, which can then be given to the PipeOpSelect
selector parameter, or to the affect_columns parameter of many PipeOpTaskPreprocs. See there for examples of this usage.
Usage
selector_all()
selector_none()
selector_type(types)
selector_grep(pattern, ignore.case = FALSE, perl = FALSE, fixed = FALSE)
selector_name(feature_names, assert_present = FALSE)
selector_invert(selector)
selector_intersect(selector_x, selector_y)
selector_union(selector_x, selector_y)
selector_setdiff(selector_x, selector_y)
selector_missing()
selector_cardinality_greater_than(min_cardinality)
Arguments
types |
( |
pattern |
( |
ignore.case |
( |
perl |
( |
fixed |
( |
feature_names |
( |
assert_present |
( |
selector |
|
selector_x |
|
selector_y |
|
min_cardinality |
( |
Value
function: A Selector function that takes a Task and returns the feature names to be processed.
Functions
-
selector_all(): selector_all selects all features.
-
selector_none(): selector_none selects none of the features.
-
selector_type(): selector_type selects features according to type. Legal types are listed in mlr_reflections$task_feature_types.
-
selector_grep(): selector_grep selects features with names matching the grep() pattern.
-
selector_name(): selector_name selects features with names matching exactly the names listed.
-
selector_invert(): selector_invert inverts a given Selector: It always selects the features that would be dropped by the other Selector, and drops the features that would be kept.
-
selector_intersect(): selector_intersect selects the intersection of two Selectors: Only features selected by both Selectors are selected in the end.
-
selector_union(): selector_union selects the union of two Selectors: Features selected by either Selector are selected in the end.
-
selector_setdiff(): selector_setdiff selects the setdiff of two Selectors: Features selected by selector_x are selected, unless they are also selected by selector_y.
-
selector_missing(): selector_missing selects features with missing values.
-
selector_cardinality_greater_than(): selector_cardinality_greater_than selects categorical features with cardinality greater than a given threshold.
Details
A Selector is a function
that has one input argument (commonly named task). The function is called with the Task that a PipeOp
is operating on. The return value of the function must be a character vector that is a subset of the feature names present
in the Task.
For example, a Selector that selects all columns is
function(task) {
task$feature_names
}
(this is the selector_all()-Selector.) A Selector that selects
all columns that have names shorter than four letters would be:
function(task) {
task$feature_names[
nchar(task$feature_names) < 4
]
}
A Selector that selects only the column "Sepal.Length" (as in the iris task), if present, is
function(task) {
intersect(task$feature_names, "Sepal.Length")
}
It is preferable to use the Selector construction functions like selector_type(), selector_grep() etc. if possible, instead of writing custom Selectors.
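As a short sketch of how a hand-written Selector compares with a constructed one, the following uses the iris task. The function name petal_only is chosen here for illustration only.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")

# Hand-written Selector: keep features whose names start with "Petal".
petal_only = function(task) {
  grep("^Petal", task$feature_names, value = TRUE)
}

# The same selection expressed with a Selector constructor.
petal_grep = selector_grep("^Petal")

custom_names = petal_only(task)
constructed_names = petal_grep(task)

# Either function can be given as the `selector` parameter of PipeOpSelect.
selected = po("select", selector = petal_grep)$train(list(task))[[1]]
selected$feature_names
```

Both forms return a character vector of feature names. The constructed Selector is usually preferable because it is shorter and easier to recognize when inspecting a PipeOp's parameter values.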
See Also
Other Selectors:
mlr_pipeops_select
Examples
library("mlr3")
iris_task = tsk("iris")
bh_task = tsk("boston_housing")
sela = selector_all()
sela(iris_task)
sela(bh_task)
self = selector_type("factor")
self(iris_task)
self(bh_task)
selg = selector_grep("a.*i")
selg(iris_task)
selg(bh_task)
selgi = selector_invert(selg)
selgi(iris_task)
selgi(bh_task)
selgf = selector_union(selg, self)
selgf(iris_task)
selgf(bh_task)
Add a Class Hierarchy to the Cache
Description
Add a class hierarchy to the class hierarchy cache. This is necessary whenever an S3 class's class hierarchy is important when inferring compatibility between types.
Usage
add_class_hierarchy_cache(hierarchy)
Arguments
hierarchy |
|
Value
NULL
See Also
Other class hierarchy operations:
register_autoconvert_function(),
reset_autoconvert_register(),
reset_class_hierarchy_cache()
Examples
# This lets mlr3pipelines handle "data.table" as "data.frame".
# This is an example and not necessary, because mlr3pipelines adds it by default.
add_class_hierarchy_cache(c("data.table", "data.frame"))
Convert an object to a Multiplicity
Description
Convert an object to a Multiplicity.
Usage
as.Multiplicity(x)
Arguments
x |
( |
Value
Conversion to mlr3pipelines Graph
Description
The argument is turned into a Graph if possible.
If clone is TRUE, a deep copy is made
if the incoming object is a Graph to ensure the resulting
object is a different reference from the incoming object.
as_graph() is an S3 method and can therefore be implemented
by other packages that may add objects that can naturally be converted to Graphs.
By default, as_graph() tries to
-
apply gunion() to x if it is a list, which recursively applies as_graph() to all list elements first
-
create a Graph with only one element if x is a PipeOp or can be converted to one using as_pipeop().
Usage
as_graph(x, clone = FALSE)
Arguments
x |
( |
clone |
( |
Value
Graph x or a deep clone of it.
See Also
Other Graph operators:
%>>%(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
greplicate(),
gunion(),
mlr_graphs_greplicate
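Examples
A minimal sketch of the conversions described above, using PipeOps that ship with mlr3pipelines. The variable names are illustrative.

```r
library(mlr3)
library(mlr3pipelines)

# A single PipeOp becomes a Graph with one node.
g_single = as_graph(po("pca"))

# A list is combined via gunion(), giving two unconnected nodes.
g_union = as_graph(list(po("scale"), po("pca")))
g_union$ids()
nrow(g_union$edges)

# An existing Graph is passed through by reference unless clone = TRUE.
same_ref = identical(as_graph(g_single), g_single)
new_ref = identical(as_graph(g_single, clone = TRUE), g_single)
```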
Conversion to mlr3pipelines PipeOp
Description
The argument is turned into a PipeOp
if possible.
If clone is TRUE, a deep copy is made
if the incoming object is a PipeOp to ensure the resulting
object is a different reference from the incoming object.
as_pipeop() is an S3 method and can therefore be implemented by other packages
that may add objects that can naturally be converted to PipeOps. Objects that
can be converted are for example Learner (using PipeOpLearner) or
Filter (using PipeOpFilter).
Usage
as_pipeop(x, clone = FALSE)
Arguments
x |
( |
clone |
( |
Value
PipeOp x or a deep clone of it.
See Also
Other Graph operators:
%>>%(),
as_graph(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
greplicate(),
gunion(),
mlr_graphs_greplicate
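Examples
A minimal sketch of converting a Learner into a PipeOp, and of the effect of clone on an object that already is a PipeOp. The variable names are illustrative.

```r
library(mlr3)
library(mlr3pipelines)

# A Learner is wrapped in a PipeOpLearner.
learner = lrn("classif.rpart")
op = as_pipeop(learner)
class(op)[1]
op$id

# A PipeOp is passed through by reference unless clone = TRUE.
pca = po("pca")
same_ref = identical(as_pipeop(pca), pca)
new_ref = identical(as_pipeop(pca, clone = TRUE), pca)
```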
Assertion for mlr3pipelines Graph
Description
Function that checks that a given object is a Graph and
throws an error if not.
Usage
assert_graph(x)
Arguments
x |
( |
Value
Graph invisible(x)
See Also
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_pipeop(),
chain_graphs(),
greplicate(),
gunion(),
mlr_graphs_greplicate
Assertion for mlr3pipelines PipeOp
Description
Function that checks that a given object is a PipeOp and
throws an error if not.
Usage
assert_pipeop(x)
Arguments
x |
( |
Value
PipeOp invisible(x)
See Also
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_graph(),
chain_graphs(),
greplicate(),
gunion(),
mlr_graphs_greplicate
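Examples
A minimal sketch of both assertion functions, assert_graph() and assert_pipeop(). The error is caught with tryCatch() so that the example runs to completion.

```r
library(mlr3)
library(mlr3pipelines)

op = po("pca")
g = as_graph(op)

# Successful assertions return their argument invisibly.
checked_graph = assert_graph(g)
checked_op = assert_pipeop(op)

# A PipeOp is not a Graph, so assert_graph() throws an error.
err = tryCatch(assert_graph(op), error = function(e) e)
inherits(err, "error")
```

If automatic conversion is wanted instead of an error, as_graph() and as_pipeop() can be used.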
Chain a Series of Graphs
Description
Takes an arbitrary number of Graphs or PipeOps (or objects that can be automatically
converted into Graphs or PipeOps, see as_graph() and as_pipeop()) as inputs and joins
them in a serial Graph, as if connecting them using %>>%.
Care is taken to avoid unnecessary cloning of components. A call of
chain_graphs(list(g1, g2, g3, g4, ...), in_place = FALSE) is equivalent to
g1 %>>% g2 %>>!% g3 %>>!% g4 %>>!% ....
A call of chain_graphs(list(g1, g2, g3, g4, ...), in_place = TRUE)
is equivalent to g1 %>>!% g2 %>>!% g3 %>>!% g4 %>>!% ... (differing in the
first operator being %>>!% as well).
Usage
chain_graphs(graphs, in_place = FALSE)
Arguments
graphs |
|
in_place |
( |
Value
Graph the resulting Graph, or NULL if there are no non-null values in graphs.
See Also
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
greplicate(),
gunion(),
mlr_graphs_greplicate
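Examples
A minimal sketch comparing chain_graphs() with the equivalent %>>% chain, using the default in_place = FALSE. The variable names are illustrative.

```r
library(mlr3)
library(mlr3pipelines)

ops = list(
  po("scale"),
  po("pca"),
  po("learner", lrn("classif.rpart"))
)

# Join the elements into one serial Graph.
g_chain = chain_graphs(ops)

# The same structure written with the %>>% operator.
g_pipe = ops[[1]] %>>% ops[[2]] %>>% ops[[3]]

g_chain$ids(sorted = TRUE)
nrow(g_chain$edges)

# A list without any entries yields NULL.
empty = chain_graphs(list())
```

chain_graphs() is mainly convenient when the sequence of steps is built programmatically and its length is not known in advance.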
Remove NO_OPs from a List
Description
Remove all NO_OP elements from a list.
Usage
filter_noop(x)
Arguments
x |
|
Value
list: The input list, with all NO_OP elements removed.
See Also
Other Path Branching:
NO_OP,
is_noop(),
mlr_pipeops_branch,
mlr_pipeops_unbranch
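Examples
A minimal sketch that tests list elements for NO_OP with is_noop() and removes them with filter_noop(). The list contents are arbitrary illustration values.

```r
library(mlr3pipelines)

x = list(a = 1, b = NO_OP, c = "text")

# Test each element individually.
noop_flags = vapply(x, is_noop, logical(1))

# Drop every NO_OP element; the other elements are kept.
kept = filter_noop(x)
names(kept)
```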
Create Disjoint Graph Union of Copies of a Graph
Description
Create a new Graph containing n copies of the input Graph / PipeOp.
To avoid ID collisions, PipeOp IDs are suffixed with _i
where i ranges from 1 to n.
This function is deprecated and will be removed in the next version in favor of using pipeline_greplicate / ppl("greplicate").
Usage
greplicate(graph, n)
Arguments
graph |
|
n |
|
Value
Graph containing n copies of input graph.
See Also
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
gunion(),
mlr_graphs_greplicate
Disjoint Union of Graphs
Description
Takes an arbitrary number of Graphs or PipeOps (or objects that can be automatically
converted into Graphs or PipeOps, see as_graph() and as_pipeop()) as inputs and joins
them in a new Graph.
The PipeOps of the input Graphs are not joined with new edges across
Graphs, so if length(graphs) > 1, the resulting Graph will be disconnected.
By default (in_place = FALSE), this operation creates deep copies of its input arguments, so they cannot be modified by reference afterwards.
To access individual PipeOps after composition, use the resulting Graph's $pipeops list.
Usage
gunion(graphs, in_place = FALSE)
Arguments
graphs |
|
in_place |
( |
Value
See Also
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
greplicate(),
mlr_graphs_greplicate
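Examples
A minimal sketch of a disjoint union that is afterwards joined by a PipeOpFeatureUnion, trained on the iris task. The variable names are illustrative.

```r
library(mlr3)
library(mlr3pipelines)

# Two PipeOps side by side, without edges between them.
g = gunion(list(po("pca"), po("scale")))
g$ids()
nrow(g$edges)

# Both outputs are connected to a feature union to obtain one result.
g_joined = g %>>% po("featureunion")
nrow(g_joined$edges)

# Training yields a task with the PCA components and the scaled features.
out = g_joined$train(tsk("iris"))[[1]]
out$feature_names
```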
Check if an object is a Multiplicity
Description
Check if an object is a Multiplicity.
Usage
is.Multiplicity(x)
Arguments
x |
( |
Value
logical(1)
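Examples
A minimal sketch showing is.Multiplicity() together with as.Multiplicity(). The list contents are arbitrary illustration values.

```r
library(mlr3pipelines)

plain = list(a = 1, b = 2)

# A plain list is not a Multiplicity.
before = is.Multiplicity(plain)

# After conversion, the same elements are wrapped in a Multiplicity.
m = as.Multiplicity(plain)
after = is.Multiplicity(m)
length(m)
```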
Test for NO_OP
Description
Test whether a given object is a NO_OP.
Usage
is_noop(x)
Arguments
x |
|
Value
logical(1): Whether x is a NO_OP.
See Also
Other Path Branching:
NO_OP,
filter_noop(),
mlr_pipeops_branch,
mlr_pipeops_unbranch
Dictionary of (sub-)graphs
Description
A simple Dictionary storing objects of class Graph.
The dictionary contains a collection of often-used graph structures, and its aim
is solely to make often-used functions more accessible.
Each Graph has an associated help page, which can be accessed via ?mlr_graphs_<key>, e.g.
?mlr_graphs_bagging.
Format
R6Class object inheriting from mlr3misc::Dictionary.
Methods
Methods inherited from Dictionary, as well as:
-
add(key, value)
(character(1), function)
Adds constructor value to the dictionary with key key, potentially overwriting a previously stored item.
S3 methods
-
as.data.table(dict)
Dictionary -> data.table::data.table
Returns a data.table with column key (character).
See Also
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_updatetarget
Other Dictionaries:
mlr_pipeops
Examples
library(mlr3)
lrn = lrn("regr.rpart")
task = mlr_tasks$get("boston_housing")
# Robustify the learner for the task.
gr = pipeline_robustify(task, lrn) %>>% po("learner", lrn)
# or equivalently
gr = mlr_graphs$get("robustify", task = task, learner = lrn) %>>% po(lrn)
# or equivalently
gr = ppl("robustify", task, lrn) %>>% po("learner", lrn)
# all Graphs currently in the dictionary:
as.data.table(mlr_graphs)
Create a bagging learner
Description
Creates a Graph that performs bagging for a supplied graph.
This is done as follows:
-
Subsample the data in each step using PipeOpSubsample, afterwards apply graph
-
Replicate this step iterations times (in parallel via multiplicities)
-
Average the predictions of the replicated graphs using the averager (note that setting collect_multiplicity = TRUE is required)
All input arguments are cloned and have no references in common with the returned Graph.
Usage
pipeline_bagging(
graph,
iterations = 10,
frac = 0.7,
averager = NULL,
replace = FALSE
)
Arguments
graph |
|
iterations |
|
frac |
|
averager |
|
replace |
|
Value
Examples
library(mlr3)
lrn_po = po("learner", lrn("regr.rpart"))
task = mlr_tasks$get("boston_housing")
gr = pipeline_bagging(lrn_po, 3, averager = po("regravg", collect_multiplicity = TRUE))
resample(task, GraphLearner$new(gr), rsmp("holdout"))$aggregate()
# The original bagging method uses bootstrapping, i.e. sampling with replacement.
gr = ppl("bagging", lrn_po, frac = 1, replace = TRUE,
averager = po("regravg", collect_multiplicity = TRUE))
resample(task, GraphLearner$new(gr), rsmp("holdout"))$aggregate()
Branch Between Alternative Paths
Description
Create a multiplexed graph.
All input arguments are cloned and have no references in common with the returned Graph.
Usage
pipeline_branch(graphs, prefix_branchops = "", prefix_paths = FALSE)
Arguments
graphs |
|
prefix_branchops |
|
prefix_paths |
|
Value
Examples
library("mlr3")
po_pca = po("pca")
po_nop = po("nop")
branches = pipeline_branch(list(pca = po_pca, nothing = po_nop))
# gives the same as
branches = c("pca", "nothing")
po("branch", branches) %>>%
gunion(list(po_pca, po_nop)) %>>%
po("unbranch", branches)
pipeline_branch(list(pca = po_pca, nothing = po_nop),
prefix_branchops = "br_", prefix_paths = "xy_")
# gives the same as
po("branch", branches, id = "br_branch") %>>%
gunion(list(xy_pca = po_pca, xy_nothing = po_nop)) %>>%
po("unbranch", branches, id = "br_unbranch")
Convert Column Types
Description
Converts all columns of type type_from to type_to, using the corresponding R function (e.g. as.numeric(), as.factor()).
It is possible to further subset the columns that should be affected using the affect_columns argument.
The resulting Graph contains a PipeOpColApply, followed, if appropriate, by a PipeOpFixFactors.
Unlike R's as.factor() function, ppl("convert_types") will convert ordered types into (unordered) factor vectors.
Usage
pipeline_convert_types(
type_from,
type_to,
affect_columns = NULL,
id = NULL,
fixfactors = NULL,
more_args = list()
)
Arguments
type_from |
|
type_to |
|
affect_columns |
|
id |
|
fixfactors |
|
more_args |
|
Value
Examples
library("mlr3")
data_chr = data.table::data.table(
x = factor(letters[1:3]),
y = letters[1:3],
z = letters[1:3]
)
task_chr = TaskClassif$new("task_chr", data_chr, "x")
str(task_chr$data())
graph = ppl("convert_types", "character", "factor")
str(graph$train(task_chr)[[1]]$data())
graph_z = ppl("convert_types", "character", "factor",
affect_columns = selector_name("z"))
graph_z$train(task_chr)[[1]]$data()
# `affect_columns` and `type_from` are both applied. The following
# looks for a 'numeric' column with name 'z', which is not present;
# the task is therefore unchanged.
graph_z = ppl("convert_types", "numeric", "factor",
affect_columns = selector_name("z"))
graph_z$train(task_chr)[[1]]$data()
Create Disjoint Graph Union of Copies of a Graph
Description
Create a new Graph containing n copies of the input Graph / PipeOp. To avoid ID
collisions, PipeOp IDs are suffixed with _i where i ranges from 1 to n.
All input arguments are cloned and have no references in common with the returned Graph.
Usage
pipeline_greplicate(graph, n)
Arguments
graph |
|
n |
|
Value
Graph containing n copies of input graph.
See Also
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
greplicate(),
gunion()
Examples
library("mlr3")
po_pca = po("pca")
pipeline_greplicate(po_pca, n = 2)
Create A Graph to Perform "One vs. Rest" classification.
Description
Create a new Graph for a classification Task to
perform "One vs. Rest" classification.
All input arguments are cloned and have no references in common with the returned Graph.
Usage
pipeline_ovr(graph)
Arguments
graph |
|
Value
Examples
library("mlr3")
task = tsk("wine")
learner = lrn("classif.rpart")
learner$predict_type = "prob"
# Simple OVR
g1 = pipeline_ovr(learner)
g1$train(task)
g1$predict(task)
# Bagged Learners
gr = po("replicate", reps = 3) %>>%
po("subsample") %>>%
learner %>>%
po("classifavg", collect_multiplicity = TRUE)
g2 = pipeline_ovr(gr)
g2$train(task)
g2$predict(task)
# Bagging outside OVR
g3 = po("replicate", reps = 3) %>>%
pipeline_ovr(po("subsample") %>>% learner) %>>%
po("classifavg", collect_multiplicity = TRUE)
g3$train(task)
g3$predict(task)
Robustify a learner
Description
Creates a Graph that can be used to robustify any subsequent learner.
Performs the following steps:
-
Drops empty factor levels using PipeOpFixFactors
-
Imputes numeric features using PipeOpImputeHist and PipeOpMissInd
-
Imputes factor features using PipeOpImputeOOR
-
Encodes factors using one-hot-encoding. Factors with a cardinality > max_cardinality are collapsed using PipeOpCollapseFactors
The graph is built conservatively, i.e. the function always tries to ensure that everything works. If a learner is provided, some steps can be left out, e.g. if the learner can deal with factor variables, no encoding is performed.
All input arguments are cloned and have no references in common with the returned Graph.
Usage
pipeline_robustify(
task = NULL,
learner = NULL,
impute_missings = NULL,
factors_to_numeric = NULL,
max_cardinality = 1000,
ordered_action = "factor",
character_action = "factor",
POSIXct_action = "numeric"
)
Arguments
task |
|
learner |
|
impute_missings |
|
factors_to_numeric |
|
max_cardinality |
|
ordered_action |
|
character_action |
|
POSIXct_action |
|
Value
Examples
library(mlr3)
lrn = lrn("regr.rpart")
task = mlr_tasks$get("boston_housing")
gr = pipeline_robustify(task, lrn) %>>% po("learner", lrn)
resample(task, GraphLearner$new(gr), rsmp("holdout"))
Create A Graph to Perform Stacking.
Description
Create a new Graph for stacking. A stacked learner uses predictions of
several base learners and fits a super learner using these predictions as
features in order to predict the outcome.
All input arguments are cloned and have no references in common with the returned Graph.
Usage
pipeline_stacking(
base_learners,
super_learner,
method = "cv",
folds = 3,
use_features = TRUE
)
Arguments
base_learners |
|
super_learner |
|
method |
|
folds |
|
use_features |
|
Value
Examples
library(mlr3)
library(mlr3learners)
base_learners = list(
lrn("classif.rpart", predict_type = "prob"),
lrn("classif.nnet", predict_type = "prob")
)
super_learner = lrn("classif.log_reg")
graph_stack = pipeline_stacking(base_learners, super_learner)
graph_learner = as_learner(graph_stack)
graph_learner$train(tsk("german_credit"))
Transform and Re-Transform the Target Variable
Description
Wraps a Graph that transforms a target during training and inverts the transformation
during prediction. This is done as follows:
-
Specify a transformation and inversion function using any subclass of PipeOpTargetTrafo, defaults to PipeOpTargetMutate, afterwards apply graph.
-
At the very end, during prediction the transformation is inverted using PipeOpTargetInvert.
-
To set a transformation and inversion function for PipeOpTargetMutate see the parameters trafo and inverter of the param_set of the resulting Graph.
-
Note that the input graph is not explicitly checked to actually return a Prediction during prediction.
All input arguments are cloned and have no references in common with the returned Graph.
Usage
pipeline_targettrafo(
graph,
trafo_pipeop = PipeOpTargetMutate$new(),
id_prefix = ""
)
Arguments
graph |
|
trafo_pipeop |
|
id_prefix |
|
Value
Examples
library("mlr3")
tt = pipeline_targettrafo(PipeOpLearner$new(LearnerRegrRpart$new()))
tt$param_set$values$targetmutate.trafo = function(x) log(x, base = 2)
tt$param_set$values$targetmutate.inverter = function(x) list(response = 2 ^ x$response)
# gives the same as
g = Graph$new()
g$add_pipeop(PipeOpTargetMutate$new(param_vals = list(
trafo = function(x) log(x, base = 2),
inverter = function(x) list(response = 2 ^ x$response))
)
)
g$add_pipeop(LearnerRegrRpart$new())
g$add_pipeop(PipeOpTargetInvert$new())
g$add_edge(src_id = "targetmutate", dst_id = "targetinvert",
src_channel = 1, dst_channel = 1)
g$add_edge(src_id = "targetmutate", dst_id = "regr.rpart",
src_channel = 2, dst_channel = 1)
g$add_edge(src_id = "regr.rpart", dst_id = "targetinvert",
src_channel = 1, dst_channel = 2)
Optimized Weighted Average of Features for Classification and Regression
Description
Computes a weighted average of inputs. Used in the context of computing weighted averages of predictions.
Predictions are averaged using weights (in order of appearance in the data) which are optimized using
nonlinear optimization from the package nloptr for a measure provided in
measure (defaults to classif.ce for LearnerClassifAvg and regr.mse for LearnerRegrAvg).
Learned weights can be obtained from $model.
This Learner implements and generalizes an approach proposed in LeDell (2015) that uses non-linear
optimization in order to learn base-learner weights that optimize a given performance metric (e.g. AUC).
The approach is similar to, but not exactly the same as, the one implemented as AUC in the SuperLearner
R package (when metric is "classif.auc").
For a more detailed analysis and the general idea, the reader is referred to LeDell (2015).
Note that weights always sum to 1, which is achieved by dividing by sum(weights) before weighting
incoming features.
Usage
mlr_learners_classif.avg
mlr_learners_regr.avg
Format
R6Class object inheriting from mlr3::LearnerClassif/mlr3::Learner.
Parameters
The parameters are the parameters inherited from LearnerClassif, as well as:
-
measure :: Measure | character
Measure to optimize for. Will be converted to a Measure in case it is character. Initialized to "classif.ce", i.e. misclassification error for classification and "regr.mse", i.e. mean squared error for regression.
-
optimizer :: Optimizer | character(1)
Optimizer used to find optimal weights. If character, converts to Optimizer via opt. Initialized to OptimizerNLoptr. Nloptr hyperparameters are initialized to xtol_rel = 1e-8, algorithm = "NLOPT_LN_COBYLA" and equal initial weights for each learner. For more fine-grained control, it is recommended to supply an instantiated Optimizer.
-
log_level :: character(1) | integer(1)
Set a temporary log-level for lgr::get_logger("mlr3/bbotk"). Initialized to "warn".
Methods
-
LearnerClassifAvg$new(id = "classif.avg")
(chr) -> self
Constructor.
-
LearnerRegrAvg$new(id = "regr.avg")
(chr) -> self
Constructor.
References
LeDell, Erin (2015). Scalable Ensemble Learning and Computationally Efficient Variance Estimation. Ph.D. thesis, UC Berkeley.
See Also
Other Learners:
mlr_learners_graph
Other Ensembles:
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg
Encapsulate a Graph as a Learner
Description
A Learner that encapsulates a Graph to be used in
mlr3 resampling and benchmarks.
The Graph must return a single Prediction on its $predict()
call. The result of the $train() call is discarded, only the
internal state changes during training are used.
The predict_type of a GraphLearner can be obtained or set via its predict_type active binding.
Setting a new predict type will try to set the predict_type in all relevant
PipeOps / Learners encapsulated within the Graph.
Similarly, the predict_type of a Graph will always be the lowest common denominator in the Graph.
A GraphLearner is always constructed in an untrained state. When the graph argument has a
non-NULL $state, it is ignored.
Format
R6Class object inheriting from mlr3::Learner.
Construction
GraphLearner$new(graph, id = NULL, param_vals = list(), task_type = NULL, predict_type = NULL, clone_graph = TRUE)
-
graph :: Graph | PipeOp
Graph to wrap. Can be a PipeOp, which is automatically converted to a Graph. This argument is usually cloned, unless clone_graph is FALSE; to access the Graph inside GraphLearner by-reference, use $graph.
-
id :: character(1)
Identifier of the resulting Learner.
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
-
task_type :: character(1)
What task_type the GraphLearner should have; usually automatically inferred for Graphs that are simple enough.
-
predict_type :: character(1)
What predict_type the GraphLearner should have; usually automatically inferred for Graphs that are simple enough.
-
clone_graph :: logical(1)
Whether to clone graph upon construction. Unintentionally changing graph by reference can lead to unexpected behaviour, so TRUE (default) is recommended. In particular, note that the $state of $graph is set to NULL by reference on construction of GraphLearner, during $train(), and during $predict() when clone_graph is FALSE.
Fields
Fields inherited from Learner, as well as:
-
graph :: Graph
Graph that is being wrapped. This field contains the prototype of the Graph that is being trained, but does not contain the model. Use graph_model to access the trained Graph after $train(). Read-only.
-
graph_model :: Graph
Graph that is being wrapped. This Graph contains a trained state after $train(). Read-only.
-
pipeops :: named list of PipeOp
Contains all PipeOps in the underlying Graph, named by the PipeOp's $ids. Shortcut for $graph_model$pipeops. See Graph for details.
-
edges :: data.table with columns src_id (character), src_channel (character), dst_id (character), dst_channel (character)
Table of connections between the PipeOps in the underlying Graph. Shortcut for $graph$edges. See Graph for details.
-
param_set :: ParamSet
Parameters of the underlying Graph. Shortcut for $graph$param_set. See Graph for details.
-
pipeops_param_set :: named list()
Named list containing the ParamSets of all PipeOps in the Graph. See there for details.
-
pipeops_param_set_values :: named list()
Named list containing the set parameter values of all PipeOps in the Graph. See there for details.
-
internal_tuned_values :: named list() or NULL
The internal tuned parameter values collected from all PipeOps. NULL is returned if the learner is not trained or none of the wrapped learners supports internal tuning.
-
internal_valid_scores :: named list() or NULL
The internal validation scores as retrieved from the PipeOps. The names are prefixed with the respective IDs of the PipeOps. NULL is returned if the learner is not trained or none of the wrapped learners supports internal validation.
-
validate :: numeric(1), "predefined", "test" or NULL
How to construct the validation data. This also has to be configured for the individual PipeOps such as PipeOpLearner, see set_validate.GraphLearner. For more details on the possible values, see mlr3::Learner.
-
marshaled :: logical(1)
Whether the learner is marshaled.
-
impute_selected_features :: logical(1)
Whether to heuristically determine $selected_features() as all $selected_features() of all "base learner" Learners, even if they do not have the "selected_features" property / do not implement $selected_features(). If impute_selected_features is TRUE and the base learners do not implement $selected_features(), the GraphLearner's $selected_features() method will return all features seen by the base learners. This is useful in cases where feature selection is performed inside the Graph: The $selected_features() will then be the set of features that were selected by the Graph. If impute_selected_features is FALSE, the $selected_features() method will throw an error if $selected_features() is not implemented by the base learners.
This is a heuristic and may report more features than actually used by the base learners, in cases where the base learners do not implement $selected_features(). The default is FALSE.
Methods
Methods inherited from Learner, as well as:
-
ids(sorted = FALSE)
(logical(1)) -> character
Get IDs of all PipeOps. This is in the order in which PipeOps were added if sorted is FALSE, and topologically sorted if sorted is TRUE.
-
plot(html = FALSE, horizontal = FALSE)
(logical(1), logical(1)) -> NULL
Plot the Graph, using either the igraph package (for html = FALSE, default) or the visNetwork package for html = TRUE producing a htmlWidget. The htmlWidget can be rescaled using visOptions. For html = FALSE, the orientation of the plotted graph can be controlled through horizontal.
-
marshal
(any) -> self
Marshal the model.
-
unmarshal
(any) -> self
Unmarshal the model.
-
base_learner(recursive = Inf, return_po = FALSE, return_all = FALSE, resolve_branching = TRUE)
(numeric(1), logical(1), logical(1), logical(1)) -> Learner | PipeOp | list of Learner | list of PipeOp
Return the base learner of the GraphLearner. If recursive is 0, the GraphLearner itself is returned. Otherwise, the Graph is traversed backwards to find the first PipeOp containing a $learner_model field. If recursive is 1, that $learner_model (or containing PipeOp, if return_po is TRUE) is returned. If recursive is greater than 1, the discovered base learner's base_learner() method is called with recursive - 1. recursive must be set to 1 if return_po is TRUE, and must be set to at most 1 if return_all is TRUE.
If return_po is TRUE, the container-PipeOp is returned instead of the Learner. This will typically be a PipeOpLearner or a PipeOpLearnerCV.
If return_all is TRUE, a list of Learners or PipeOps is returned. If return_po is FALSE, this list may contain Multiplicity objects, which are not unwrapped. If return_all is FALSE and there are multiple possible base learners, an error is thrown. This may also happen if only a single PipeOpLearner is present that was trained with a Multiplicity.
If resolve_branching is TRUE, and when a PipeOpUnbranch is encountered, the corresponding PipeOpBranch is searched, and its hyperparameter configuration is used to select the base learner. There may be multiple corresponding PipeOpBranchs, which are all considered. If resolve_branching is FALSE, PipeOpUnbranch is treated as any other PipeOp with multiple inputs; all possible branch paths are considered equally.
The following standard extractors as defined by the Learner class are available.
Note that these typically only extract information from the $base_learner().
This works well for simple Graphs that do not modify features too much, but may give unexpected results for Graphs that
add new features or move information between features.
As an example, consider a feature A with missing values, and a feature B that is used for imputation, using a po("imputelearner").
In a case where the following Learner performs embedded feature selection and only selects feature A,
the $selected_features() method could return only feature A, and $importance() may even report 0 for feature B.
This would not be entirely accurate when considering the entire GraphLearner, as feature B is used for imputation and would therefore have an impact on predictions.
The following should therefore only be used if the Graph is known to not have an impact on the relevant properties.
-
importance()
() -> numeric
The $importance() returned by the base learner, if it has the "importance" property. Throws an error otherwise.
-
selected_features()
() -> character
The $selected_features() returned by the base learner, if it has the "selected_features" property. If the base learner does not have the "selected_features" property and impute_selected_features is TRUE, all features seen by the base learners are returned. Throws an error otherwise.
-
oob_error()
() -> numeric(1)
The $oob_error() returned by the base learner, if it has the "oob_error" property. Throws an error otherwise.
-
loglik()
() -> numeric(1)
The $loglik() returned by the base learner, if it has the "loglik" property. Throws an error otherwise.
Internals
as_graph() is called on the graph argument, so it can technically also be a list of things, which is
automatically converted to a Graph via gunion(); however, this will usually not result in a valid Graph that can
work as a Learner. graph can furthermore be a Learner, which is then automatically
wrapped in a Graph, which is then again wrapped in a GraphLearner object; this usually only adds overhead and is not
recommended.
See Also
Other Learners:
mlr_learners_avg
Examples
library("mlr3")
graph = po("pca") %>>% lrn("classif.rpart")
lr = GraphLearner$new(graph)
lr = as_learner(graph) # equivalent
lr$train(tsk("iris"))
lr$graph$state # untrained version!
# The following is therefore NULL:
lr$graph$pipeops$classif.rpart$learner_model$model
# To access the trained model from the PipeOpLearner's Learner, use:
lr$graph_model$pipeops$classif.rpart$learner_model$model
# Feature importance (of principal components):
lr$graph_model$pipeops$classif.rpart$learner_model$importance()
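The extractor methods described above offer a shorter route to the same information. The following is a minimal sketch, assuming the rpart package is installed; the variable names bl and bl_po are illustrative only.

```r
library("mlr3")
library("mlr3pipelines")

graph = po("pca") %>>% lrn("classif.rpart")
lr = as_learner(graph)
lr$train(tsk("iris"))

# base_learner() traverses the Graph backwards and returns the wrapped Learner
bl = lr$base_learner()
class(bl)[1]

# with return_po = TRUE (which requires recursive = 1), the containing PipeOp is returned
bl_po = lr$base_learner(recursive = 1, return_po = TRUE)
bl_po$id

# importance() is taken from the base learner, so the names refer to
# principal components and not to the original iris features
lr$importance()
```

Because the importance values refer to the features seen by the base learner, they should be read with the caveats listed for the standard extractors.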
Dictionary of PipeOps
Description
A simple Dictionary storing objects of class PipeOp.
Each PipeOp has an associated help page, see mlr_pipeops_[id].
Format
R6Class object inheriting from mlr3misc::Dictionary.
Fields
Fields inherited from Dictionary, as well as:
-
metainf :: environment
Environment that stores the metainf argument of the $add() method. Only for internal use.
Methods
Methods inherited from Dictionary, as well as:
-
add(key, value, metainf = NULL)
(character(1), R6ClassGenerator, NULL | list)
Adds constructor value to the dictionary with key key, potentially overwriting a previously stored item. If metainf is not NULL (the default), it must be a list of arguments that will be given to the value constructor (i.e. value$new()) when it needs to be constructed for as.data.table PipeOp listing.
S3 methods
-
as.data.table(dict)
Dictionary -> data.table::data.table
Returns a data.table with the following columns:
-
key :: (character)
Key with which the PipeOp was registered to the Dictionary using the $add() method.
-
label :: (character)
Description of the PipeOp's functionality.
-
packages :: (character)
Set of all required packages for the PipeOp's train and predict methods.
-
tags :: (character)
A set of tags associated with the PipeOp describing its purpose.
-
feature_types :: (character)
Feature types the PipeOp operates on. Is NA for PipeOps that do not directly operate on a Task.
-
input.num, output.num :: (integer)
Number of the PipeOp's input and output channels. Is NA for PipeOps which accept a varying number of input and/or output channels depending on a construction argument. See input and output fields of PipeOp.
-
input.type.train, input.type.predict, output.type.train, output.type.predict :: (character)
Types that are allowed as input to or returned as output of the PipeOp's $train() and $predict() methods.
A value of NULL means that a null object, e.g. no data, is taken as input or being returned as output. A value of "*" means that any type is possible.
If both input.type.train and output.type.train or both input.type.predict and output.type.predict contain values enclosed by square brackets ("[", "]"), then the respective input or output channel is Multiplicity-aware. For more information, see Multiplicity.
See Also
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops_updatetarget
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Dictionaries:
mlr_graphs
Examples
library("mlr3")
mlr_pipeops$get("learner", lrn("classif.rpart"))
# equivalent:
po("learner", learner = lrn("classif.rpart"))
# all PipeOps currently in the dictionary:
as.data.table(mlr_pipeops)[, c("key", "input.num", "output.num", "packages")]
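The columns of the as.data.table() listing can also be used to filter the dictionary. The following sketch assumes that data.table is attached for the subsetting syntax; the names dt and single_io are illustrative only.

```r
library("mlr3")
library("mlr3pipelines")
library("data.table")

dt = as.data.table(mlr_pipeops)

# PipeOps with exactly one input and one output channel
single_io = dt[input.num == 1 & output.num == 1, key]
head(single_io)

# PipeOps whose channel count depends on a construction argument are listed with NA
dt[is.na(input.num) | is.na(output.num), key]

# keys can also be queried directly through the method inherited from Dictionary
"pca" %in% mlr_pipeops$keys()
```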
ADAS Balancing
Description
Generates a more balanced data set by creating synthetic instances of the minority classes using the ADASYN algorithm.
The algorithm generates for each minority instance new data points based on its K nearest neighbors and the difficulty of learning for that data point.
It can only be applied to tasks with numeric features that have no missing values.
See smotefamily::ADAS for details.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpADAS$new(id = "adas", param_vals = list())
-
id :: character(1)
Identifier of resulting object, default "adas".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with added synthetic rows for the minority class.
The output during prediction is the unchanged input.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
-
K :: numeric(1)
The number of nearest neighbors used for sampling new values. Default is 5. See ADAS().
Internals
If a target level is unobserved during training, no synthetic data points will be generated for that class. No error is raised; the unobserved class is simply ignored.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
References
He H, Bai Y, Garcia EA, Li S (2008). “ADASYN: Adaptive synthetic sampling approach for imbalanced learning.” In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 1322-1328. doi:10.1109/IJCNN.2008.4633969.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
# Create example task
data = data.frame(
  target = factor(sample(c("c1", "c2"), size = 300, replace = TRUE, prob = c(0.1, 0.9))),
  x1 = rnorm(300),
  x2 = rnorm(300)
)
task = TaskClassif$new(id = "example", backend = data, target = "target")
task$head()
table(task$data(cols = "target"))
# Generate synthetic data for minority class
pop = po("adas")
adas_result = pop$train(list(task))[[1]]$data()
nrow(adas_result)
table(adas_result$target)
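The difference between the training and prediction output described under "Input and Output Channels" can be checked directly. This is a minimal sketch, assuming the smotefamily package is installed; the seed and the value K = 3 are arbitrary choices for illustration.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)  # arbitrary seed, only to make the illustration repeatable
data = data.frame(
  target = factor(sample(c("c1", "c2"), size = 300, replace = TRUE, prob = c(0.1, 0.9))),
  x1 = rnorm(300),
  x2 = rnorm(300)
)
task = TaskClassif$new(id = "example", backend = data, target = "target")

# set the K hyperparameter at construction
pop = po("adas", K = 3)
out_train = pop$train(list(task))[[1]]
out_predict = pop$predict(list(task))[[1]]

out_train$nrow    # larger than task$nrow: synthetic minority rows were added
out_predict$nrow  # equal to task$nrow: prediction passes the task through unchanged
```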
BLSMOTE Balancing
Description
Adds new data points by generating synthetic instances for the minority class using the Borderline-SMOTE algorithm.
This can only be applied to classification tasks with numeric features that have no missing values.
See smotefamily::BLSMOTE for details.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpBLSmote$new(id = "blsmote", param_vals = list())
-
id :: character(1)
Identifier of resulting object, default "blsmote".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with added synthetic rows for the minority class.
The output during prediction is the unchanged input.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
-
K :: numeric(1)
The number of nearest neighbors used for sampling from the minority class. Default is 5. See BLSMOTE().
-
C :: numeric(1)
The number of nearest neighbors used for classifying sample points as SAFE/DANGER/NOISE. Default is 5. See BLSMOTE().
-
dup_size :: numeric(1)
Desired times of synthetic minority instances over the original number of majority instances. 0 leads to balancing minority and majority class. Default is 0. See BLSMOTE().
-
method :: character(1)
The type of Borderline-SMOTE algorithm to use. Default is "type1". See BLSMOTE().
-
quiet :: logical(1)
Whether to suppress printing status during training. Initialized to TRUE.
Internals
If a target level is unobserved during training, no synthetic data points will be generated for that class. No error is raised; the unobserved class is simply ignored.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
References
Han H, Wang W, Mao B (2005). “Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning.” In Huang D, Zhang X, Huang G (eds.), Advances in Intelligent Computing, 878–887. ISBN 978-3-540-31902-3, doi:10.1007/11538059_91.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
# Create example task
data = smotefamily::sample_generator(500, 0.8)
data$result = factor(data$result)
task = TaskClassif$new(id = "example", backend = data, target = "result")
task$head()
table(task$data(cols = "result"))
# Generate synthetic data for minority class
pop = po("blsmote")
bls_result = pop$train(list(task))[[1]]$data()
nrow(bls_result)
table(bls_result$result)
Box-Cox Transformation of Numeric Features
Description
Conducts a Box-Cox transformation on numeric features. The lambda parameter
of the transformation is estimated during training and used for both training
and prediction transformation.
See bestNormalize::boxcox() for details.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpBoxCox$new(id = "boxcox", param_vals = list())
-
id :: character(1)
Identifier of resulting object, default "boxcox".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their transformed versions.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as a list of class boxcox for each column that is transformed.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
-
standardize :: logical(1)
Whether to center and scale the transformed values to attempt a standard normal distribution. For details see boxcox().
-
eps :: numeric(1)
Tolerance parameter to identify if lambda parameter is equal to zero. For details see boxcox().
-
lower :: numeric(1)
Lower value for estimation of lambda parameter. For details see boxcox().
-
upper :: numeric(1)
Upper value for estimation of lambda parameter. For details see boxcox().
Internals
Uses the bestNormalize::boxcox function.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("boxcox")
task$data()
pop$train(list(task))[[1]]$data()
pop$state
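Since lambda is estimated during training and reused afterwards, the PipeOp can be trained on one part of the data and applied to another. The following is a minimal sketch, assuming the bestNormalize package is installed; the split into rows 1 to 100 and 101 to 150 is arbitrary and only for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
train_task = task$clone()$filter(1:100)
new_task = task$clone()$filter(101:150)

pop = po("boxcox", standardize = FALSE)

# training estimates one lambda per numeric feature and stores it in the state
train_out = pop$train(list(train_task))[[1]]
str(pop$state, max.level = 1)

# prediction applies the stored transformation to unseen rows without re-estimating
pred_out = pop$predict(list(new_task))[[1]]
head(pred_out$data())
```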
Path Branching
Description
Perform alternative path branching: PipeOpBranch has multiple output channels
that connect to different paths in a Graph. At any time, only one of these
paths will be taken for execution. At the end of the different paths, the
PipeOpUnbranch PipeOp must be used to indicate the end of alternative paths.
Not to be confused with PipeOpCopy; the naming scheme is a bit unfortunate.
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpBranch$new(options, id = "branch", param_vals = list())
-
options :: numeric(1) | character
If options is an integer number, it determines the number of output channels / options that are created, named output1...output<n>. The $selection parameter will then be an integer. If options is a character, it determines the names of channels directly. The $selection parameter will then be a factor.
-
id :: character(1)
Identifier of resulting object, default "branch".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpBranch has one input channel named "input", taking any input ("*") both during training and prediction.
PipeOpBranch has multiple output channels depending on the options construction argument, named "output1", "output2", ...
if options is numeric, and named after each options value if options is a character.
All output channels produce the object given as input ("*") or NO_OP, both during training and prediction.
State
The $state is left empty (list()).
Parameters
-
selection :: numeric(1) | character(1)
Selection of branching path to take. Is a ParamInt if the options parameter during construction was a numeric(1), and ranges from 1 to options. Is a ParamFct if the options parameter was a character and its possible values are the options values. Initialized to either 1 (if the options construction argument is numeric(1)) or the first element of options (if it is character).
Internals
Alternative path branching is handled by the PipeOp backend. To indicate that
a path should not be taken, PipeOpBranch returns the NO_OP object on its
output channel. The PipeOp handles each NO_OP input by automatically
returning a NO_OP output without calling private$.train() or private$.predict(),
until PipeOpUnbranch is reached. PipeOpUnbranch will then take multiple inputs,
all except one of which must be a NO_OP, and forward the only non-NO_OP
object on its output.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Path Branching:
NO_OP,
filter_noop(),
is_noop(),
mlr_pipeops_unbranch
Examples
library("mlr3")
pca = po("pca")
nop = po("nop")
choices = c("pca", "nothing")
gr = po("branch", choices) %>>%
  gunion(list(pca, nop)) %>>%
  po("unbranch", choices)
gr$param_set$values$branch.selection = "pca"
gr$train(tsk("iris"))
gr$param_set$values$branch.selection = "nothing"
gr$train(tsk("iris"))
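The numeric form of the options argument works the same way, with channels named output1, output2 and an integer selection. This is a minimal sketch of that variant; the variable names are illustrative only.

```r
library("mlr3")
library("mlr3pipelines")

# numeric `options`: two output channels are created and `selection` is an integer
gr = po("branch", 2) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", 2)

gr$pipeops$branch$output$name

# take the second path (nop); the PCA path only receives NO_OP and is skipped
gr$param_set$values$branch.selection = 2
out = gr$train(tsk("iris"))[[1]]
out$feature_names
```

Because the PCA path is skipped, the resulting task still carries the original iris feature names.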
Chunk Input into Multiple Outputs
Description
Chunks its input into outnum chunks.
Creates outnum Tasks during training, and
simply passes on the input outnum times during prediction.
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpChunk$new(outnum, id = "chunk", param_vals = list())
-
outnum :: numeric(1)
Number of output channels, and therefore number of chunks created.
-
id :: character(1)
Identifier of resulting object, default "chunk".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output
PipeOpChunk has one input channel named "input", taking a Task both during training and prediction.
PipeOpChunk has multiple output channels depending on the outnum construction argument, named "output1", "output2", ...
All output channels produce (respectively disjoint, random) subsets of the input Task during training, and
pass on the original Task during prediction.
State
The $state is left empty (list()).
Parameters
-
shuffle :: logical(1)
Should the data be shuffled before chunking? Initialized to TRUE.
Internals
Uses the mlr3misc::chunk_vector() function.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("wine")
opc = mlr_pipeops$get("chunk", 2)
# watch the row number: 89 during training (task is chunked)...
opc$train(list(task))
# ... 178 during predict (task is copied)
opc$predict(list(task))
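That the training chunks are disjoint and together cover the whole input can be verified by collecting their row IDs. The following is a minimal sketch; the choice of three chunks and shuffle = FALSE is arbitrary, and the helper variables are illustrative only.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("wine")
opc = po("chunk", outnum = 3, shuffle = FALSE)

# training: each output channel receives a different subset of the rows
chunks = opc$train(list(task))
sizes = vapply(chunks, function(t) t$nrow, numeric(1))
sizes

# the chunks are disjoint and together contain every row exactly once
ids = unlist(lapply(chunks, function(t) t$row_ids))
anyDuplicated(ids) == 0
sum(sizes) == task$nrow

# prediction: every output channel receives the full task
preds = opc$predict(list(task))
pred_sizes = vapply(preds, function(t) t$nrow, numeric(1))
pred_sizes
```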
Class Balancing
Description
Both undersamples a Task to keep only a fraction of the rows of the majority class,
as well as oversamples (repeats data points) rows of the minority class.
Sampling happens only during training phase. Class-balancing a Task by sampling may be
beneficial for classification with imbalanced training data.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpClassBalancing$new(id = "classbalancing", param_vals = list())
-
id :: character(1)
Identifier of the resulting object, default "classbalancing".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with added or removed rows to balance target classes.
The output during prediction is the unchanged input.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc; however, the affect_columns parameter is not present. Further parameters are:
-
ratio :: numeric(1)
Ratio of number of rows of classes to keep, relative to the $reference value. Initialized to 1.
-
reference :: character(1)
What the $ratio value is measured against. Can be "all" (mean instance count of all classes), "major" (instance count of class with most instances), "minor" (instance count of class with fewest instances), "nonmajor" (average instance count of all classes except the major one), "nonminor" (average instance count of all classes except the minor one), and "one" ($ratio determines the number of instances to have, per class). Initialized to "all".
-
adjust :: character(1)
Which classes to up / downsample. Can be "all" (up and downsample all to match required instance count), "major", "minor", "nonmajor", "nonminor" (see respective values for $reference), "upsample" (only upsample), and "downsample". Initialized to "all".
-
shuffle :: logical(1)
Whether to shuffle the rows of the resulting task. In case the data is upsampled and shuffle = FALSE, the resulting task will have the original rows (which were not removed in downsampling) in the original order, followed by all newly added rows ordered by target class. Initialized to TRUE.
Internals
Up / downsampling happens as follows: At first, a "target class count" is calculated, by taking the mean
class count of all classes indicated by the reference parameter (e.g. if reference is "nonmajor":
the mean class count of all classes that are not the "major" class, i.e. the class with the most samples)
and multiplying this with the value of the ratio parameter. If reference is "one", then the "target
class count" is just the value of ratio (i.e. 1 * ratio).
Then for each class that is referenced by the adjust parameter (e.g. if adjust is "nonminor":
each class that is not the class with the fewest samples), PipeOpClassBalancing either throws out
samples (downsampling), or adds additional rows that are equal to randomly chosen samples (upsampling),
until the number of samples for these classes equals the "target class count".
No upsampling is performed for classes that were not observed during training (i.e. empty factor levels in the target column).
Uses task$filter() to remove rows. When identical rows are added during upsampling, then the task$row_roles$use can not be used
to duplicate rows because of [inaudible]; instead the task$rbind() function is used, and
a new data.table is attached that contains all rows that are being duplicated exactly as many times as they are being added.
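The "target class count" computation described above can be sketched in base R. This is only an illustration under stated assumptions: target_class_count() is a hypothetical helper, not a function of the package, and the actual implementation may treat ties between class counts and rounding differently.

```r
# Hypothetical helper (not part of mlr3pipelines): computes the
# "target class count" from a named vector of class counts.
target_class_count = function(counts, reference = "all", ratio = 1) {
  counts = counts[counts > 0]  # ignore empty factor levels
  ref_counts = switch(reference,
    all      = counts,
    major    = counts[counts == max(counts)],
    minor    = counts[counts == min(counts)],
    nonmajor = counts[counts != max(counts)],
    nonminor = counts[counts != min(counts)],
    one      = 1,
    stop("unknown reference")
  )
  # mean class count of the referenced classes, times ratio
  mean(ref_counts) * ratio
}

counts = c(a = 10, b = 30, c = 60)
target_class_count(counts, "all", 1)       # mean of 10, 30, 60: about 33.3
target_class_count(counts, "nonmajor", 2)  # mean of 10, 30 times 2: 40
target_class_count(counts, "one", 20)      # 1 * 20: 20
```

Classes selected by adjust would then be down- or upsampled until their count matches this value.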
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("spam")
opb = po("classbalancing")
# target class counts
table(task$truth())
# double the instances in the minority class (spam)
opb$param_set$values = list(ratio = 2, reference = "minor",
  adjust = "minor", shuffle = FALSE)
result = opb$train(list(task))[[1L]]
table(result$truth())
# up or downsample all classes until exactly 20 per class remain
opb$param_set$values = list(ratio = 20, reference = "one",
  adjust = "all", shuffle = FALSE)
result = opb$train(list(task))[[1]]
table(result$truth())
Majority Vote Prediction
Description
Perform (weighted) majority vote prediction from classification Predictions by connecting
PipeOpClassifAvg to multiple PipeOpLearner outputs.
Always returns a "prob" prediction, regardless of the incoming Learner's
$predict_type. The label of the class with the highest predicted probability is selected as the
"response" prediction. If the Learner's $predict_type is set to "prob",
the prediction obtained is also a "prob" type prediction with the probability predicted to be a
weighted average of incoming predictions.
The $predict_type of all incoming Learners must agree.
Weights can be set as a parameter; if none are provided, each prediction receives equal weight.
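The weighted averaging of "prob" predictions can be illustrated with plain matrices in base R. This is a sketch only: the matrices p1 and p2 and the weights are made-up values, the weights are assumed to sum to 1, and the code does not use the PipeOp itself.

```r
# Probability matrices of two hypothetical learners
# (rows: observations, columns: classes)
p1 = matrix(c(0.9, 0.1,
              0.4, 0.6), nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("neg", "pos")))
p2 = matrix(c(0.5, 0.5,
              0.2, 0.8), nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("neg", "pos")))
weights = c(0.75, 0.25)  # assumed to sum to 1

# weighted average of the incoming probabilities
prob = weights[1] * p1 + weights[2] * p2

# the class with the highest averaged probability becomes the response
response = colnames(prob)[max.col(prob, ties.method = "first")]
prob
response
```

With equal weights (0.5 each), the result is the plain mean of the two matrices.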
Format
R6Class inheriting from PipeOpEnsemble/PipeOp.
Construction
PipeOpClassifAvg$new(innum = 0, collect_multiplicity = FALSE, id = "classifavg", param_vals = list())
-
innum :: numeric(1)
Determines the number of input channels. If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs.
-
collect_multiplicity :: logical(1)
If TRUE, the input is a Multiplicity collecting channel. This means, a Multiplicity input, instead of multiple normal inputs, is accepted and the members are aggregated. This requires innum to be 0. Default is FALSE.
-
id :: character(1)
Identifier of the resulting object, default "classifavg".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpEnsemble. Instead of a Prediction, a PredictionClassif
is used as input and output during prediction.
State
The $state is left empty (list()).
Parameters
The parameters are the parameters inherited from PipeOpEnsemble.
Internals
Inherits from PipeOpEnsemble by implementing the private$weighted_avg_predictions() method.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpEnsemble/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Ensembles:
PipeOpEnsemble,
mlr_learners_avg,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg
Examples
library("mlr3")
# Simple Bagging
gr = ppl("greplicate",
  po("subsample") %>>%
    po("learner", lrn("classif.rpart")),
  n = 3
) %>>%
  po("classifavg")
resample(tsk("iris"), GraphLearner$new(gr), rsmp("holdout"))
Class Weights for Sample Weighting
Description
Adds a class weight column to the Task that different Learners may be
able to use for sample weighting. Sample weights are added to each sample according to the target class.
Only binary classification tasks are supported.
Caution: when constructed naively without parameters, the weights are all set to 1. The minor_weight parameter
must be adjusted for this PipeOp to be useful.
Note this only sets the "weights_learner" column.
It therefore influences the behaviour of subsequent Learners, but does not influence resampling or evaluation metric weights.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpClassWeights$new(id = "classweights", param_vals = list())
-
id :: character(1)
Identifier of the resulting object, default "classweights".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with added weights column according to target class.
The output during prediction is the unchanged input.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc; however, the affect_columns parameter is not present. Further parameters are:
-
minor_weight :: numeric(1)
Weight given to samples of the minor class. Major class samples have weight 1. Initialized to 1.
Internals
Introduces, or overwrites, the "weights" column in the Task. However, the Learner method needs to
respect weights for this to have an effect.
The newly introduced column is named .WEIGHTS; there will be a naming conflict if this column already exists and is not a
weight column itself.
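The per-sample weights described here can be sketched in base R for a binary target. The vector truth and the value of minor_weight below are made up for illustration; the code mirrors the description (minor class gets minor_weight, major class gets 1) and is not the package's implementation.

```r
# Made-up binary target; "spam" is the minority class
truth = factor(c("spam", "nonspam", "nonspam", "spam", "nonspam"))
minor_weight = 2

counts = table(truth)
minor_class = names(counts)[which.min(counts)]

# minor class samples get minor_weight, major class samples get 1
weights = ifelse(truth == minor_class, minor_weight, 1)
weights
```

A Learner only benefits from these values if its fitting method actually uses observation weights.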
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("spam")
opb = po("classweights")
# task weights
if ("weights_learner" %in% names(task)) {
task$weights_learner # recent mlr3-versions
} else {
task$weights # old mlr3-versions
}
# give samples of the minority class (spam) a weight of 2
opb$param_set$values$minor_weight = 2
result = opb$train(list(task))[[1L]]
if ("weights_learner" %in% names(result)) {
result$weights_learner # recent mlr3-versions
} else {
result$weights # old mlr3-versions
}
Apply a Function to each Column of a Task
Description
Applies a function to each column of a task. Use the affect_columns parameter inherited from
PipeOpTaskPreprocSimple to limit the columns this function should be applied to. This can be used
for simple parameter transformations or type conversions (e.g. as.numeric).
The same function is applied during training and prediction. One important relationship for
machine learning preprocessing is that during the prediction phase, the preprocessing on each
data row should be independent of other rows. Therefore, the applicator function should always
return a vector / list where each result component only depends on the corresponding input component and
not on other components. As a rule of thumb, if the function f generates output different
from Vectorize(f), it is not a function that should be used for applicator.
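The rule of thumb can be checked directly in base R. In the sketch below, center() and halve() are hypothetical example functions: the first uses information from all rows and therefore behaves differently from its Vectorize() version, the second works element by element.

```r
x = c(1, 2, 3)

# depends on the other elements through mean(x): not suitable as applicator
center = function(x) x - mean(x)
center(x)             # -1 0 1
Vectorize(center)(x)  # 0 0 0, because each element is centered on itself

# purely elementwise: fine as applicator
halve = function(x) x / 2
identical(halve(x), Vectorize(halve)(x))  # TRUE
```

A function like center() would give different results for the same data row depending on which other rows are present at prediction time.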
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpColApply$new(id = "colapply", param_vals = list())
-
id :: character(1)
Identifier of resulting object, default "colapply".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with features changed according to the applicator parameter.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
-
applicator :: function
Function to apply to each column of the task. The return value should be a vector of the same length as the input, i.e., the function vectorizes over the input. A typical example would be as.numeric.
The return value can also be a matrix, data.frame, or data.table. In this case, the length of the input must match the number of returned rows. The names of the resulting features of the output Task are based on the (column) name(s) of the return value of the applicator function, prefixed with the original feature name separated by a dot (.). Use Vectorize to create a vectorizing function from any function that ordinarily only takes one element input.
Internals
Calls map on the data, using the value of applicator as f, and coerces the output via as.data.table.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
poca = po("colapply", applicator = as.character)
poca$train(list(task))[[1]] # types are converted
# function that does not vectorize
f1 = function(x) {
  # we could use `ifelse` here, but that is not the point
  if (x > 1) {
    "a"
  } else {
    "b"
  }
}
poca$param_set$values$applicator = Vectorize(f1)
poca$train(list(task))[[1]]$data()
# only affect Petal.* columns
poca$param_set$values$affect_columns = selector_grep("^Petal")
poca$train(list(task))[[1]]$data()
# function returning multiple columns
f2 = function(x) {
  cbind(floor = floor(x), ceiling = ceiling(x))
}
poca$param_set$values$applicator = f2
poca$param_set$values$affect_columns = selector_all()
poca$train(list(task))[[1]]$data()
Collapse Factors
Description
Collapses levels of factor and ordered features: the rarest levels in the training samples are collapsed until target_level_count
levels remain. Levels that have prevalence strictly above no_collapse_above_prevalence or absolute count strictly above no_collapse_above_absolute
are retained, however. For factor variables, rare levels are collapsed to the next larger level; for ordered variables, rare levels
are collapsed to the neighbouring level, whichever has fewer samples.
In case both no_collapse_above_prevalence and no_collapse_above_absolute are given, the less strict threshold of the two will be used, i.e. if
no_collapse_above_prevalence is 1 and no_collapse_above_absolute is 10 for a task with 100 samples, levels that are seen more than 10 times
will not be collapsed.
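The interaction of the two thresholds can be traced in base R. The level counts below are made up (100 samples in total), and the code only reproduces the retention rule stated above; it is a sketch, not the package's implementation.

```r
# Made-up level counts of one factor feature, 100 samples in total
counts = c(A = 25, B = 30, C = 5, D = 15, E = 5, F = 20)
no_collapse_above_prevalence = 1
no_collapse_above_absolute = 10

prevalence = counts / sum(counts)

# a level is retained if it exceeds either threshold (strictly)
protected = prevalence > no_collapse_above_prevalence |
  counts > no_collapse_above_absolute
names(counts)[protected]   # "A" "B" "D" "F"
names(counts)[!protected]  # "C" "E" are candidates for collapsing
```

No prevalence can exceed 1, so here only the absolute threshold of 10 decides which levels are protected.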
Levels not seen during training are not touched during prediction; therefore it is useful to combine this with
PipeOpFixFactors.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpCollapseFactors$new(id = "collapsefactors", param_vals = list())
-
id :: character(1)
Identifier of resulting object, default "collapsefactors".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with rare affected factor and ordered feature levels collapsed.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
-
collapse_map :: named list of named list of character
List of factor level maps. For each factor, collapse_map contains a named list that indicates what levels of the input task get mapped to what levels of the output task. If collapse_map has an entry feat_1 with an entry a = c("x", "y"), it means that levels "x" and "y" get collapsed to level "a" in feature "feat_1".
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
-
no_collapse_above_prevalence :: numeric(1)
Fraction of samples below which factor levels get collapsed. Default is 1, which causes all levels to be collapsed until target_level_count remain.
-
no_collapse_above_absolute :: integer(1)
Number of samples below which factor levels get collapsed. Default is Inf, which causes all levels to be collapsed until target_level_count remain.
-
target_level_count :: integer(1)
Number of levels to retain. Default is 2.
Internals
Makes use of the fact that levels(fact_var) = list(target1 = c("source1", "source2"), target2 = "source3") causes
renaming of levels "source1" and "source2" both to "target1", and of "source3" to "target2".
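The base R behaviour this relies on can be tried on a small factor. The level names below are placeholders chosen for the illustration, with each source level listed exactly once.

```r
fact_var = factor(c("source1", "source2", "source3", "source1"))

# a named list on the right-hand side maps old levels (values)
# to new levels (names)
levels(fact_var) = list(
  target1 = c("source1", "source2"),
  target2 = "source3"
)

as.character(fact_var)  # "target1" "target1" "target2" "target1"
levels(fact_var)        # "target1" "target2"
```

Old levels that do not appear anywhere in the list would turn into NA, so a complete map should mention every level.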
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
op = PipeOpCollapseFactors$new()
# Create example training task
df = data.frame(
  target = runif(100),
  fct = factor(rep(LETTERS[1:6], times = c(25, 30, 5, 15, 5, 20))),
  ord = factor(rep(1:6, times = c(20, 25, 30, 5, 5, 15)), ordered = TRUE)
)
task = TaskRegr$new(df, target = "target", id = "example_train")
# Training
train_task_collapsed = op$train(list(task))[[1]]
train_task_collapsed$levels(c("fct", "ord"))
# Create example prediction task
df_pred = data.frame(
  target = runif(7),
  fct = factor(LETTERS[1:7]),
  ord = factor(1:7, ordered = TRUE)
)
pred_task = TaskRegr$new(df_pred, target = "target", id = "example_pred")
# Prediction
pred_task_collapsed = op$predict(list(pred_task))[[1]]
pred_task_collapsed$levels(c("fct", "ord"))
Change Column Roles of a Task
Description
Changes the column roles of the input Task according to new_role or its inverse new_role_direct.
Setting a new target variable or changing the role of an existing target variable is not supported.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpColRoles$new(id = "colroles", param_vals = list())
-
id :: character(1)
Identifier of resulting object, default "colroles".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with transformed column roles according to new_role or its inverse new_role_direct.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
-
new_role :: named list
Named list of new column roles by column. The names must match the column names of the input task that will later be trained/predicted on. Each entry of the list must contain a character vector with possible values of mlr_reflections$task_col_roles. If the value is given as character() or NULL, the column will be dropped from the input task. Changing the role of a column results in this column losing its previous role(s).
-
new_role_direct :: named list
Named list of new column roles by role. The names must match the possible column roles, i.e. values of mlr_reflections$task_col_roles. Each entry of the list must contain a character vector with column names of the input task that will later be trained/predicted on. If the value is given as character() or NULL, all columns will be dropped from the role given in the element name. The value given for a role overwrites the previous entry in task$col_roles for that role, completely.
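The two parameters describe the same kind of mapping from opposite directions: new_role is keyed by column, new_role_direct is keyed by role. The base R sketch below regroups a made-up by-column specification by role to show the two shapes side by side. It is an illustration only and not how PipeOpColRoles is implemented; the variable by_role is hypothetical.

```r
# by column: names are columns, values are roles
new_role = list(body_mass = c("order", "feature"), island = "group")

# regroup by role: names are roles, values are columns
roles = unique(unlist(new_role))
by_role = lapply(setNames(roles, roles), function(r) {
  names(new_role)[vapply(new_role, function(v) r %in% v, logical(1))]
})
by_role
```

The two are not interchangeable in effect: a new_role_direct entry replaces the complete set of columns of that role, whereas new_role only changes the roles of the named columns.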
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("penguins")
pop = po("colroles", param_vals = list(
  new_role = list(body_mass = c("order", "feature"))
))
train_out1 = pop$train(list(task))[[1L]]
train_out1$col_roles
pop$param_set$set_values(
  new_role = NULL,
  new_role_direct = list(order = character(), group = "island")
)
# extract the Task from the output list before inspecting it
train_out2 = pop$train(list(train_out1))[[1L]]
train_out2$col_roles
Copy Input Multiple Times
Description
Copies its input outnum times. This PipeOp is usually not needed, because copying happens automatically when one
PipeOp is followed by multiple different PipeOps. However, when constructing big Graphs using the
%>>%-operator, PipeOpCopy can be helpful to specify which PipeOp gets connected to which.
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpCopy$new(outnum, id = "copy", param_vals = list())
-
outnum :: numeric(1)
Number of output channels, and therefore number of copies being made.
-
id :: character(1)
Identifier of resulting object, default "copy".
-
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpCopy has one input channel named "input", taking any input ("*") both during training and prediction.
PipeOpCopy has multiple output channels depending on the outnum construction argument, named "output1", "output2", ...
All output channels produce the object given as input ("*").
State
The $state is left empty (list()).
Parameters
PipeOpCopy has no parameters.
Internals
Note that copies are not clones, but only reference copies. This affects R6-objects: If R6 objects are copied using
PipeOpCopy, they must be cloned beforehand.
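The difference between a reference copy and a clone can be shown with a plain environment, which is the structure R6 objects are built on. This sketch uses base R only; obj, copies and clone are made-up names, and for real R6 objects the $clone() method would take the place of the manual copy.

```r
obj = new.env()
obj$value = 1

# two "outputs" that both point to the same object
copies = list(output1 = obj, output2 = obj)
copies$output1$value = 99
copies$output2$value  # 99: the change is visible through both references

# an independent (shallow) copy is unaffected by later changes
clone = as.environment(as.list(obj))
clone$value = 1
obj$value  # still 99
```

Plain R values such as vectors or data.frames do not show this behaviour, because they are copied on modification.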
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Placeholder Pipeops:
mlr_pipeops_nop
Examples
# The following copies the output of 'scale' automatically to both
# 'pca' and 'nop'
po("scale") %>>%
  gunion(list(
    po("pca"),
    po("nop")
  ))
# The following would not work: the '%>>%'-operator does not know
# which output to connect to which input
# > gunion(list(
# > po("scale"),
# > po("select")
# > )) %>>%
# > gunion(list(
# > po("pca"),
# > po("nop"),
# > po("imputemean")
# > ))
# Instead, the 'copy' operator makes clear which output gets copied.
gunion(list(
  po("scale") %>>% po("copy", outnum = 2),
  po("select")
)) %>>%
  gunion(list(
    po("pca"),
    po("nop"),
    po("imputemean")
  ))
Preprocess Date Features
Description
Based on POSIXct columns of the data, a set of date-related features is computed and added to
the feature set of the output task. If no POSIXct column is found, the original task is
returned unaltered. This functionality is based on the add_datepart() and
add_cyclic_datepart() functions from the fastai library. To operate on only particular
POSIXct columns, use the affect_columns parameter inherited from
PipeOpTaskPreprocSimple.
If cyclic = TRUE, cyclic features are computed for the features "month", "week_of_year",
"day_of_year", "day_of_month", "day_of_week", "hour", "minute" and "second". This
means that for each feature x, two additional features are computed, namely the sine and cosine
transformation of 2 * pi * x / max_x (here max_x is the largest possible value the feature
could take on + 1, assuming the lowest possible value is given by 0, e.g., for hours from 0 to
23, this is 24). This is useful to respect the cyclical nature of features such as seconds, i.e.,
second 21 and second 22 are one second apart, but so are second 59 and second 0 of the next
minute.
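The cyclic transformation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation; `cyclic_encode()` and `enc_dist()` are hypothetical helper names introduced only for this example.

```r
# Hypothetical helper (not part of mlr3pipelines): map a 0-based cyclic
# value x with period max_x onto the unit circle.
cyclic_encode = function(x, max_x) {
  angle = 2 * pi * x / max_x
  cbind(sin = sin(angle), cos = cos(angle))
}

# Hours range from 0 to 23, so max_x is 24.
enc = cyclic_encode(c(0, 1, 22, 23), max_x = 24)
rownames(enc) = c("h0", "h1", "h22", "h23")

# Euclidean distance between two encoded time points.
enc_dist = function(a, b) sqrt(sum((enc[a, ] - enc[b, ])^2))

# Hour 23 and hour 0 are as close as hour 22 and hour 23.
d_wrap = enc_dist("h23", "h0")
d_inner = enc_dist("h22", "h23")
```

In the encoded space the wrap-around pair and an ordinary neighbouring pair end up equally far apart, which a plain integer feature cannot express.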
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpDateFeatures$new(id = "datefeatures", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "datefeatures".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with date-related features computed and added to the
feature set of the output task and the POSIXct columns of the data removed from the
feature set (depending on the value of keep_date_var).
State
The $state is a named list with the $state elements inherited from
PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- keep_date_var :: logical(1)
  Should the POSIXct columns be kept as features? Default FALSE.
- cyclic :: logical(1)
  Should cyclic features be computed? See Internals. Default FALSE.
- year :: logical(1)
  Should the year be extracted as a feature? Default TRUE.
- month :: logical(1)
  Should the month be extracted as a feature? Default TRUE.
- week_of_year :: logical(1)
  Should the week of the year be extracted as a feature? Default TRUE.
- day_of_year :: logical(1)
  Should the day of the year be extracted as a feature? Default TRUE.
- day_of_month :: logical(1)
  Should the day of the month be extracted as a feature? Default TRUE.
- day_of_week :: logical(1)
  Should the day of the week be extracted as a feature? Default TRUE.
- hour :: logical(1)
  Should the hour be extracted as a feature? Default TRUE.
- minute :: logical(1)
  Should the minute be extracted as a feature? Default TRUE.
- second :: logical(1)
  Should the second be extracted as a feature? Default TRUE.
- is_day :: logical(1)
  Should a feature be extracted indicating whether it is day time (06:00am - 08:00pm)? Default TRUE.
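As a rough illustration of the kind of values these switches control, the following base R sketch extracts several date parts from a single POSIXct value. It is only an analogue: the package's exact definitions (for example week numbering or the origin of the day of the week) may differ, and the variable names are chosen for this example.

```r
# Base R analogue of some extracted features; not the package code.
x = as.POSIXct("2020-02-29 13:45:10", tz = "UTC")
lt = as.POSIXlt(x, tz = "UTC")

features = list(
  year = lt$year + 1900L,       # POSIXlt stores years since 1900
  month = lt$mon + 1L,          # POSIXlt months are 0-based
  day_of_year = lt$yday + 1L,   # POSIXlt day of year is 0-based
  day_of_month = lt$mday,
  hour = lt$hour,
  minute = lt$min,
  second = lt$sec
)
str(features)
```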
Internals
The cyclic feature transformation always assumes that values range from 0, so some values (e.g. day of the month) are shifted before sine/cosine transform.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
dat = iris
set.seed(1)
dat$date = sample(seq(as.POSIXct("2020-02-01"), to = as.POSIXct("2020-02-29"), by = "hour"),
  size = 150L)
task = TaskClassif$new("iris_date", backend = dat, target = "Species")
pop = po("datefeatures", param_vals = list(cyclic = FALSE, minute = FALSE, second = FALSE))
pop$train(list(task))
pop$state
Reverse Factor Encoding
Description
Reverses one-hot or treatment encoding of columns. It collapses multiple numeric or integer columns into one factor
column based on a pre-specified grouping pattern of column names.
May be applied to multiple groups of columns, grouped by matching a common naming pattern. The grouping pattern is
extracted to form the name of the newly derived factor column, and levels are constructed from the previous column
names, with parts matching the grouping pattern removed (see examples). The level per row of the new factor column is generally
determined as the name of the column with the maximum value in the group.
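The grouping and level extraction described above can be sketched in base R for a single group of one-hot columns. This is a hypothetical illustration, not the package's code; the data and variable names are made up for the example, and ties are resolved with "first" only to keep the result deterministic.

```r
# Hypothetical sketch of reversing a one-hot encoding (not the package code).
df = data.frame(
  x.a = c(1, 0, 0),
  x.b = c(0, 1, 0),
  x.c = c(0, 0, 1)
)
group_pattern = "^([^.]+)\\."

# The capturing group yields the new column name ...
new_name = unique(sub(paste0(group_pattern, ".*"), "\\1", names(df)))
# ... and removing the match from the column names yields the factor levels.
lvls = sub(group_pattern, "", names(df))

# Per row, pick the level of the column holding the maximum value.
decoded = factor(lvls[max.col(as.matrix(df), ties.method = "first")], levels = lvls)
result = setNames(data.frame(decoded), new_name)
```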
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpDecode$new(id = "decode", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "decode".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with encoding columns collapsed into new decoded columns.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- colmaps :: named list
  Named list of named character vectors. Each element is named according to the new column name extracted by group_pattern. Each vector contains the level names for the new factor column that should be created, named by the corresponding old column name. If treatment_encoding is TRUE, then each vector also contains ref_name as the reference class with an empty string as name.
- treatment_encoding :: logical(1)
  Value of treatment_encoding hyperparameter.
- cutoff :: numeric(1)
  Value of treatment_cutoff hyperparameter, or 0 if that is not given.
- ties_method :: character(1)
  Value of ties_method hyperparameter.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- group_pattern :: character(1)
  A regular expression to be applied to column names. Should contain a capturing group for the new column name, and match everything that should not be interpreted as the new factor levels (which are constructed as the difference between column names and what group_pattern matches). If set to "", all columns matching the group_pattern are collapsed into one factor column called pipeop.decoded. Use PipeOpRenameColumns to rename this column. Initialized to "^([^.]+)\\.", which would extract everything up to the first dot as the new column name and construct new levels as everything after the first dot.
- treatment_encoding :: logical(1)
  If TRUE, treatment encoding is assumed instead of one-hot encoding. Initialized to FALSE.
- treatment_cutoff :: numeric(1)
  If treatment_encoding is TRUE, specifies a cutoff value for identifying the reference level. The reference level is set to ref_name in rows where the value is less than or equal to a specified cutoff value (e.g., 0) in all columns in that group. Default is 0.
- ref_name :: character(1)
  If treatment_encoding is TRUE, specifies the name for reference levels. Default is "ref".
- ties_method :: character(1)
  Method for resolving ties if multiple columns have the same value. Specifies the value from which of the columns with the same value is to be picked. Options are "first", "last", or "random". Initialized to "random".
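How treatment_cutoff, ref_name and ties_method interact can be sketched in base R for one group of treatment-encoded columns. This is a hypothetical illustration, not the package's code. Base R's max.col() happens to offer the same three tie-breaking options; whether the package relies on it is an assumption of this sketch.

```r
# Hypothetical sketch of decoding one treatment-encoded group (not the package code).
m = cbind(fct.b = c(0, 1, 0, 1), fct.c = c(0, 0, 1, 1))
lvls = sub("^([^.]+)\\.", "", colnames(m))
treatment_cutoff = 0
ref_name = "ref"

# Rows in which every column is <= the cutoff belong to the reference level.
is_ref = rowSums(m > treatment_cutoff) == 0

# Otherwise take the column with the maximum value; the fourth row is a tie
# between "b" and "c", resolved here with ties.method = "first".
picked = lvls[max.col(m, ties.method = "first")]
decoded = factor(ifelse(is_ref, ref_name, picked), levels = c(ref_name, lvls))
```

With ties.method = "last" the fourth row would decode to "c" instead, and with "random" the outcome would vary between runs.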
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
# Reverse one-hot encoding
df = data.frame(
  target = runif(4),
  x.1 = rep(c(1, 0), 2),
  x.2 = rep(c(0, 1), 2),
  y.1 = rep(c(1, 0), 2),
  y.2 = rep(c(0, 1), 2),
  a = runif(4)
)
task_one_hot = TaskRegr$new(id = "example", backend = df, target = "target")
pop = po("decode")
train_out = pop$train(list(task_one_hot))[[1]]
# x.1 and x.2 are collapsed into x, same for y; a is ignored.
train_out$data()
# Reverse treatment encoding from PipeOpEncode
df = data.frame(
  target = runif(6),
  fct = factor(rep(c("a", "b", "c"), 2))
)
task = TaskRegr$new(id = "example", backend = df, target = "target")
po_enc = po("encode", method = "treatment")
task_encoded = po_enc$train(list(task))[[1]]
task_encoded$data()
po_dec = po("decode", treatment_encoding = TRUE)
# Decode the encoded task with the treatment-aware decoder
task_decoded = po_dec$train(list(task_encoded))[[1]]
# fct.b and fct.c are collapsed into fct. In rows where all values
# are less than or equal to 0, the level is set to the reference level.
task_decoded$data()
# Different group_pattern
df = data.frame(
  target = runif(4),
  x_1 = rep(c(1, 0), 2),
  x_2 = rep(c(0, 1), 2),
  y_1 = rep(c(2, 0), 2),
  y_2 = rep(c(0, 1), 2)
)
task = TaskRegr$new(id = "example", backend = df, target = "target")
# Grouped by first underscore
pop = po("decode", group_pattern = "^([^_]+)\\_")
train_out = pop$train(list(task))[[1]]
# x_1 and x_2 are collapsed into x, same for y
train_out$data()
# Empty string to collapse all matches into one factor column.
pop$param_set$set_values(group_pattern = "")
train_out = pop$train(list(task))[[1]]
# All columns are combined into a single column.
# The level for each row is determined by the column with the largest value in that row.
# By default, ties are resolved randomly.
train_out$data()
Factor Encoding
Description
Encodes columns of type factor and ordered.
Possible encodings are "one-hot" encoding, as well as encoding according to stats::contr.helmert(), stats::contr.poly(),
stats::contr.sum() and stats::contr.treatment().
Newly created columns are named via pattern [column-name].[x] where x is the respective factor level for "one-hot" and
"treatment" encoding, and an integer sequence otherwise.
Use the PipeOpTaskPreproc $affect_columns functionality to only encode a subset of columns, or only encode columns of a certain type.
character-type features can be encoded by converting them to factor features first, using ppl("convert_types", "character", "factor").
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpEncode$new(id = "encode", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "encode".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected factor and ordered columns encoded according to the method
parameter.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- contrasts :: named list of matrix
  List of contrast matrices, one for each affected discrete feature. The rows of each matrix correspond to (training task) levels, the columns to the new columns that replace the old discrete feature. See stats::contrasts.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- method :: character(1)
  Initialized to "one-hot". One of:
  - "one-hot": create a new column for each factor level.
  - "treatment": create n-1 columns leaving out the first factor level of each factor variable (see stats::contr.treatment()).
  - "helmert": create columns according to Helmert contrasts (see stats::contr.helmert()).
  - "poly": create columns with contrasts based on orthogonal polynomials (see stats::contr.poly()).
  - "sum": create columns with contrasts summing to zero (see stats::contr.sum()).
Internals
Uses the stats::contrasts functions. This is relatively inefficient for features with a large number of levels.
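To make the relation between the method options and contrast matrices concrete, the following base R sketch builds the matrices for a three-level factor and encodes a few observations by row lookup. It only illustrates the idea; the lookup step and all variable names are assumptions of this example, not the package's code.

```r
lvls = c("a", "b", "c")

one_hot = diag(length(lvls))               # one column per level
dimnames(one_hot) = list(lvls, lvls)
treatment = stats::contr.treatment(lvls)   # drops the first level
helmert = stats::contr.helmert(lvls)
sum_to_zero = stats::contr.sum(lvls)

# Encoding a factor amounts to looking up one matrix row per observation.
f = factor(c("b", "c", "a"), levels = lvls)
encoded = treatment[as.character(f), , drop = FALSE]
```

Each matrix has one row per level, so the lookup cost grows with the number of levels, in line with the efficiency remark above.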
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
data = data.table::data.table(x = factor(letters[1:3]), y = factor(letters[1:3]))
task = TaskClassif$new("task", data, "x")
poe = po("encode")
# poe is initialized with encoding: "one-hot"
poe$train(list(task))[[1]]$data()
# other kinds of encoding:
poe$param_set$values$method = "treatment"
poe$train(list(task))[[1]]$data()
poe$param_set$values$method = "helmert"
poe$train(list(task))[[1]]$data()
poe$param_set$values$method = "poly"
poe$train(list(task))[[1]]$data()
poe$param_set$values$method = "sum"
poe$train(list(task))[[1]]$data()
# converting character-columns
data_chr = data.table::data.table(x = factor(letters[1:3]), y = letters[1:3])
task_chr = TaskClassif$new("task_chr", data_chr, "x")
goe = ppl("convert_types", "character", "factor") %>>% po("encode")
goe$train(task_chr)[[1]]$data()
Conditional Target Value Impact Encoding
Description
Encodes columns of type factor, character and ordered.
Impact coding for classification Tasks converts factor levels of each (factorial) column to the difference between each target level's conditional log-likelihood given this level, and the target level's global log-likelihood.
Impact coding for regression Tasks converts factor levels of each (factorial) column to the difference between the target's conditional mean given this level, and the target's global mean.
Treats new levels during prediction like missing values.
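For the regression case, the computation can be sketched in base R as follows. This is a deliberately simplified, unsmoothed version with made-up data; the package additionally applies smoothing (see the smoothing parameter), so its values will differ slightly.

```r
# Hypothetical sketch of unsmoothed impact values for a regression target.
target = c(1, 3, 5, 7, 9)
feature = factor(c("a", "a", "b", "b", "b"))

global_mean = mean(target)
# Conditional mean of the target given each factor level.
conditional_means = vapply(split(target, feature), mean, numeric(1))
# Impact of a level: conditional mean minus global mean.
impact = conditional_means - global_mean

# Replace each factor level by its impact value.
encoded = unname(impact[as.character(feature)])
```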
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpEncodeImpact$new(id = "encodeimpact", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "encodeimpact".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskSupervised is used as input and output during training and prediction.
The output is the input Task with all affected factor, character or
ordered columns encoded.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- impact :: a named list
  A list with an element for each affected feature:
  For regression, each element is a single-column matrix of impact values for each level of that feature.
  For classification, it is a list with an element for each feature level, which is a vector giving the impact of this feature level on each outcome level.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- smoothing :: numeric(1)
  A finite positive value used for smoothing. Mostly relevant for classification Tasks if a factor level does not co-occur with a target level (and would otherwise give an infinite logit value). Initialized to 1e-4.
- impute_zero :: logical(1)
  If TRUE, impute missing values as impact 0; otherwise the respective impact is coded as NA. Default FALSE.
Internals
Uses Laplace smoothing, mostly to avoid infinite values for classification Tasks.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
poe = po("encodeimpact")
task = TaskClassif$new("task",
  data.table::data.table(
    x = factor(c("a", "a", "a", "b", "b")),
    y = factor(c("a", "a", "b", "b", "b"))),
  "x")
poe$train(list(task))[[1]]$data()
poe$state
Impact Encoding with Random Intercept Models
Description
Encodes columns of type factor, character and ordered.
PipeOpEncodeLmer converts factor levels of each factorial column to the
estimated coefficients of a simple random intercept model.
Models are fitted with the glmer function of the lme4 package and are
of the type target ~ 1 + (1 | factor).
If the task is a regression task, the numeric target
variable is used as dependent variable and the factor is used for grouping.
If the task is a classification task, the target variable is used as dependent variable
and the factor is used for grouping.
If the target variable is multiclass, for each level of the multiclass target variable,
binary "one vs. rest" models are fitted.
For training, multiple models can be estimated in a cross-validation scheme to ensure that the same factor level does not always result in identical values in the converted numerical feature. For prediction, a global model (which was fitted on all observations during training) is used for each factor. New factor levels are converted to the value of the intercept coefficient of the global model for prediction. NAs are ignored by the PipeOp.
Use the PipeOpTaskPreproc $affect_columns functionality to only encode a subset of
columns, or only encode columns of a certain type.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpEncodeLmer$new(id = "encodelmer", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "encodelmer".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskSupervised is used as input and output during training and prediction.
The output is the input Task with all affected factor, character or
ordered columns encoded.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- target_levels :: character
  Levels of the target columns.
- control :: a named list
  List of coefficients learned via glmer.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- fast_optim :: logical(1)
  If TRUE, a faster (up to 50 percent) optimizer from the nloptr package is used when fitting the lmer models. This uses additional stopping criteria which can give suboptimal results. Initialized to TRUE.
Internals
Uses lme4::glmer(). This is relatively inefficient for features with a large number of levels.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
poe = po("encodelmer")
task = TaskClassif$new("task",
  data.table::data.table(
    x = factor(c("a", "a", "a", "b", "b")),
    y = factor(c("a", "a", "b", "b", "b"))),
  "x")
poe$train(list(task))[[1]]$data()
poe$state
Piecewise Linear Encoding using Quantiles
Description
Encodes numeric and integer feature columns using piecewise linear encoding. For details, see documentation of
PipeOpEncodePL or Gorishniy et al. (2022).
Bins are constructed by taking the quantiles of the respective feature column as bin boundaries. The first and
last boundaries are set to the minimum and maximum value of the feature, respectively. The number of bins can be
controlled with the numsplits hyperparameter.
Affected feature columns may contain NAs. These are ignored when calculating quantiles.
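The following base R sketch illustrates how quantile-based bins and the piecewise linear encoding from Gorishniy et al. (2022) fit together. It is not the package's implementation: `pl_encode()` is a hypothetical helper, and clamping every component to [0, 1] is a simplifying assumption that only matters for values outside the training range.

```r
# Hypothetical sketch (not the package code) of quantile-based
# piecewise linear encoding.
x = c(0, 1, 2, 3, 4, NA)
numsplits = 2

# numsplits bins need numsplits + 1 boundaries; the outer ones are the
# minimum and maximum. NAs are ignored when calculating quantiles.
bins = unname(stats::quantile(x, probs = seq(0, 1, length.out = numsplits + 1),
  na.rm = TRUE, type = 7))

# Component t is 0 below bin t, 1 above it, and linear inside it.
pl_encode = function(x, bins) {
  vapply(seq_len(length(bins) - 1), function(t) {
    pmin(pmax((x - bins[t]) / (bins[t + 1] - bins[t]), 0), 1)
  }, numeric(length(x)))
}

# One row per observation, one column per bin; NA inputs stay NA.
encoded = pl_encode(x, bins)
```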
Format
R6Class object inheriting from PipeOpEncodePL/PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpEncodePLQuantiles$new(id = "encodeplquantiles", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "encodeplquantiles".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric and integer columns encoded using piecewise
linear encoding with bins being derived from the quantiles of the respective original feature column.
State
The $state is a named list with the $state elements inherited from PipeOpEncodePL/PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- numsplits :: integer(1)
  Number of bins to create. Initialized to 2.
- type :: integer(1)
  Method used to calculate sample quantiles. See help of stats::quantile. Default is 7.
Internals
This overloads the private$.get_bins() method of PipeOpEncodePL and uses the stats::quantile function
to derive the bins used for piecewise linear encoding.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpEncodePL/PipeOpTaskPreproc/PipeOp.
References
Gorishniy Y, Rubachev I, Babenko A (2022). “On Embeddings for Numerical Features in Tabular Deep Learning.” In Advances in Neural Information Processing Systems, volume 35, 24991–25004. https://proceedings.neurips.cc/paper_files/paper/2022/hash/9e9f0ffc3d836836ca96cbf8fe14b105-Abstract-Conference.html.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Piecewise Linear Encoding PipeOps:
PipeOpEncodePL,
mlr_pipeops_encodepltree
Examples
library(mlr3)
task = tsk("iris")$select(c("Petal.Width", "Petal.Length"))
pop = po("encodeplquantiles")
train_out = pop$train(list(task))[[1L]]
# Calculated bin boundaries per feature
pop$state$bins
# Each feature was split into two encoded features using piecewise linear encoding
train_out$head()
# Prediction works the same as training, using the bins learned during training
predict_out = pop$predict(list(task))[[1L]]
predict_out$head()
# Binning into three bins per feature
# Using the nearest even order statistic for calculating quantiles
pop$param_set$set_values(numsplits = 4, type = 3)
train_out = pop$train(list(task))[[1L]]
# Calculated bin boundaries per feature
pop$state$bins
# Each feature was split into three encoded features using
# piecewise linear encoding
train_out$head()
Piecewise Linear Encoding using Decision Trees
Description
Encodes numeric and integer feature columns using piecewise linear encoding. For details, see documentation of
PipeOpEncodePL or Gorishniy et al. (2022).
Bins are constructed by training one decision tree Learner per feature column, taking the target
column into account, and using decision boundaries as bin boundaries.
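The encoding step itself can be illustrated without any package code. The following is a minimal sketch of piecewise linear encoding in the spirit of Gorishniy et al. (2022), given a vector of bin boundaries; `encode_pl` is a hypothetical helper written for this illustration and is not the function the package uses internally.

```r
# Hypothetical helper: piecewise linear encoding of x for given bin boundaries.
# `bins` holds T + 1 increasing boundaries, which define T bins.
encode_pl = function(x, bins) {
  n_bins = length(bins) - 1L
  out = vapply(seq_len(n_bins), function(t) {
    # relative position of x inside bin t
    ratio = (x - bins[t]) / (bins[t + 1L] - bins[t])
    # values below the bin contribute 0 (except in the first bin)
    if (t > 1L) ratio[x < bins[t]] = 0
    # values above the bin contribute 1 (except in the last bin)
    if (t < n_bins) ratio[x >= bins[t + 1L]] = 1
    ratio
  }, numeric(length(x)))
  # one row per observation, one column per bin
  matrix(out, nrow = length(x))
}

# Two bins, [1, 2] and [2, 4], give two encoded columns per value
encode_pl(c(1, 2.5, 4), bins = c(1, 2, 4))
```

Each input value becomes one row with as many columns as there are bins, which is why a feature with three bin boundaries is replaced by two encoded features.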
Format
R6Class object inheriting from PipeOpEncodePL/PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpEncodePLTree$new(task_type, id = "encodepltree", param_vals = list())
- task_type :: character(1)
  The class of Task that should be accepted as input, given as a character(1). This is used to construct the appropriate Learner to be used for obtaining the bins for piecewise linear encoding. Supported options are "TaskClassif" for LearnerClassifRpart or "TaskRegr" for LearnerRegrRpart.
- id :: character(1)
  Identifier of resulting object, default "encodepltree".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif or TaskRegr is used as input and output during training and
prediction, depending on the task_type construction argument.
The output is the input Task with all affected numeric and integer columns encoded using piecewise
linear encoding with bins being derived from a decision tree Learner trained on the respective feature column.
State
The $state is a named list with the $state elements inherited from PipeOpEncodePL/PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as the parameters of
the Learner used for obtaining the bins for piecewise linear encoding.
Internals
This overloads the private$.get_bins() method of PipeOpEncodePL. To derive the bins for each feature, the
Task is split into smaller Tasks with only the target and respective feature as columns.
On these Tasks either a LearnerClassifRpart or
LearnerRegrRpart gets trained and the respective splits extracted as bin boundaries used
for piecewise linear encodings.
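The procedure described above can be approximated by hand. The sketch below reduces the task to one feature at a time, trains an rpart learner on it and reads the split points from the fitted model. That the split points are found in the "index" column of the rpart `splits` matrix, and that the data minimum and maximum are added as outer boundaries, are assumptions of this illustration and are not taken from the package source.

```r
library(mlr3)

task = tsk("iris")$select(c("Petal.Width", "Petal.Length"))
learner = lrn("classif.rpart")

bins = lapply(task$feature_names, function(col) {
  # Task with only the target and the current feature
  single = task$clone(deep = TRUE)$select(col)
  splits = learner$train(single)$model$splits
  # split points of the tree; empty if the tree did not split at all
  boundaries = if (is.null(splits)) numeric(0) else sort(unname(splits[, "index"]))
  values = single$data(cols = col)[[1L]]
  # assumption: outer boundaries are the observed minimum and maximum
  c(min(values), boundaries, max(values))
})
names(bins) = task$feature_names
bins
```

The result can be compared with `pop$state$bins` from the Examples section to see how closely this sketch matches the actual behaviour.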
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpEncodePL/PipeOpTaskPreproc/PipeOp.
References
Gorishniy Y, Rubachev I, Babenko A (2022). “On Embeddings for Numerical Features in Tabular Deep Learning.” In Advances in Neural Information Processing Systems, volume 35, 24991–25004. https://proceedings.neurips.cc/paper_files/paper/2022/hash/9e9f0ffc3d836836ca96cbf8fe14b105-Abstract-Conference.html.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Piecewise Linear Encoding PipeOps:
PipeOpEncodePL,
mlr_pipeops_encodeplquantiles
Examples
library(mlr3)
# For classification task
task = tsk("iris")$select(c("Petal.Width", "Petal.Length"))
pop = po("encodepltree", task_type = "TaskClassif")
train_out = pop$train(list(task))[[1L]]
# Calculated bin boundaries per feature
pop$state$bins
# Each feature was split into three encoded features using piecewise linear encoding
train_out$head()
# Prediction works the same as training, using the bins learned during training
predict_out = pop$predict(list(task))[[1L]]
predict_out$head()
# Controlling behavior of the tree learner, here: setting minimum number of
# observations per node for a split to be attempted
pop$param_set$set_values(minsplit = 5)
train_out = pop$train(list(task))[[1L]]
# with a smaller minsplit the trees can split more often, so features may be
# encoded into more columns than before; compare the new bin boundaries
pop$state$bins
train_out$head()
# For regression task
task = tsk("mtcars")$select(c("cyl", "hp"))
pop = po("encodepltree", task_type = "TaskRegr")
train_out = pop$train(list(task))[[1L]]
# Calculated bin boundaries per feature
pop$state$bins
# First feature was split into three encoded features,
# second into two, using piecewise linear encoding
train_out$head()
Aggregate Features from Multiple Inputs
Description
Aggregates features from all input tasks by cbind()ing them together into a single
Task.
DataBackend primary keys and Task targets have to be equal
across all Tasks. Only the target column(s) of the first Task
are kept.
If assert_targets_equal is TRUE then target column names are compared and an error is thrown
if they differ across inputs.
If input tasks share some feature names but these features are not identical, an error is thrown. This check first compares the feature names and, if duplicate names are found, also compares the values of the possibly duplicated features. Truly duplicated features are added to the output task only once.
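As a small illustration of the handling of duplicated features, the sketch below unites two views of the same task that both contain the feature Petal.Width. The choice of task and columns is only an example; `innum = 2` is used so that the PipeOp has exactly two input channels.

```r
library(mlr3)
library(mlr3pipelines)

# Two tasks with one shared, identical feature ("Petal.Width")
task_a = tsk("iris")$select(c("Petal.Length", "Petal.Width"))
task_b = tsk("iris")$select(c("Petal.Width", "Sepal.Length"))

pop = po("featureunion", innum = 2)
out = pop$train(list(task_a, task_b))[[1L]]

# The shared feature should appear only once in the result
out$feature_names
```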
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpFeatureUnion$new(innum = 0, collect_multiplicity = FALSE, id = "featureunion", param_vals = list(), assert_targets_equal = TRUE)
- innum :: numeric(1) | character
  Determines the number of input channels. If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs. If innum is a character vector, the number of input channels is the length of innum, and the columns of the result are prefixed with the values.
- collect_multiplicity :: logical(1)
  If TRUE, the input is a Multiplicity collecting channel. This means, a Multiplicity input, instead of multiple normal inputs, is accepted and the members are aggregated. This requires innum to be 0. Default is FALSE.
- id :: character(1)
  Identifier of the resulting object, default "featureunion".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
- assert_targets_equal :: logical(1)
  If assert_targets_equal is TRUE (Default), task target column names are checked for agreement. Disagreeing target column names are usually a bug, so this should often be left at the default.
Input and Output Channels
PipeOpFeatureUnion has multiple input channels depending on the innum construction
argument, named "input1", "input2", ... if innum is nonzero; if innum is 0, there is
only one vararg input channel named "...". All input channels take a Task
both during training and prediction.
PipeOpFeatureUnion has one output channel named "output", producing a Task
both during training and prediction.
The output is a Task constructed by cbind()ing all features from all input
Tasks, both during training and prediction.
State
The $state is left empty (list()).
Parameters
PipeOpFeatureUnion has no Parameters.
Internals
PipeOpFeatureUnion uses the Task $cbind() method to bind the input values
beyond the first input to the first Task. This means if the Tasks
are database-backed, all of them except the first will be fetched into R memory for this. This
behaviour may change in the future.
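The Task `$cbind()` method mentioned above can also be called directly. The following sketch adds one column to a task by hand; it only illustrates the mechanism and does not reproduce what PipeOpFeatureUnion does in detail.

```r
library(mlr3)

task = tsk("iris")$select("Petal.Length")
# Data to attach; rows are in the order of the task's row ids
extra = tsk("iris")$data(cols = "Sepal.Width")

# $cbind() modifies the task in place and adds the new column as a feature
task$cbind(extra)
task$feature_names
```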
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Examples
library("mlr3")
task1 = tsk("iris")
gr = gunion(list(
  po("nop"),
  po("pca")
)) %>>% po("featureunion")
gr$train(task1)
task2 = tsk("iris")
task3 = tsk("iris")
po = po("featureunion", innum = c("a", "b"))
po$train(list(task2, task3))
Feature Filtering
Description
Feature filtering using a mlr3filters::Filter object, see the
mlr3filters package.
If a Filter can only operate on a subset of columns based on column type, then only these features are considered and filtered.
nfeat and frac are counted only among the features of a type that the Filter can operate on;
for example, setting nfeat to 0 removes only features of a type that the Filter can work with.
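The effect on mixed-type tasks can be checked with a filter that only supports numeric and integer features. The sketch below uses the variance filter from mlr3filters on the german_credit task; the choice of filter and task is only an example.

```r
library(mlr3)
library(mlr3filters)
library(mlr3pipelines)

task = tsk("german_credit")
# The variance filter operates on integer and numeric features only
pop = po("filter", filter = flt("variance"), filter.nfeat = 1)
out = pop$train(list(task))[[1L]]

# Helper for this example: which features are integer or numeric?
is_num = function(t) t$feature_types$type %in% c("integer", "numeric")

# Number of integer / numeric features before and after filtering
c(before = sum(is_num(task)), after = sum(is_num(out)))
# Number of features of other types before and after filtering
c(before = sum(!is_num(task)), after = sum(!is_num(out)))
```

Only the features the filter can score are reduced to nfeat; features of other types pass through unchanged.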
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpFilter$new(filter, id = filter$id, param_vals = list())
- filter :: Filter
  Filter used for feature filtering. This argument is always cloned; to access the Filter inside PipeOpFilter by-reference, use $filter.
- id :: character(1)
  Identifier of the resulting object, defaulting to the id of the Filter being used.
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with features removed that were filtered out.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- scores :: named numeric
  Scores calculated for all features of the training Task which are being used as cutoff for feature filtering. If frac or nfeat is given, the underlying Filter may choose to not calculate scores for all features that are given. This only includes features on which the Filter can operate; e.g. if the Filter can only operate on numeric features, then scores for factorial features will not be given.
- features :: character
  Names of features that are being kept. Features of types that the Filter can not operate on are always being kept.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as the parameters of the Filter
used by this object. In addition, the following parameters are introduced:
- filter.nfeat :: numeric(1)
  Number of features to select. Mutually exclusive with frac, cutoff, and permuted.
- filter.frac :: numeric(1)
  Fraction of features to keep. Mutually exclusive with nfeat, cutoff, and permuted.
- filter.cutoff :: numeric(1)
  Minimum value of filter heuristic for which to keep features. Mutually exclusive with nfeat, frac, and permuted.
- filter.permuted :: integer(1)
  If this parameter is set, a random permutation of each feature is added to the task before applying the filter. All features selected before the permuted-th permuted feature is selected are kept. This is similar to the approach in Wu (2007) and Thomas (2017). Mutually exclusive with nfeat, frac, and cutoff.
Note that at least one of filter.nfeat, filter.frac, filter.cutoff, and filter.permuted must be given.
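As an illustration of the cutoff criterion, the following sketch keeps all iris features whose variance is at least 0.5. The variance filter and the threshold are chosen for this example only.

```r
library(mlr3)
library(mlr3filters)
library(mlr3pipelines)

task = tsk("iris")
# Keep features whose filter score (here: variance) reaches the cutoff
pop = po("filter", filter = flt("variance"), filter.cutoff = 0.5)
out = pop$train(list(task))[[1L]]

# Scores computed during training and the features that were kept
pop$state$scores
out$feature_names
```

Replacing filter.cutoff by filter.nfeat or filter.frac switches to one of the other selection criteria; only one of them should be set at a time.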
Internals
This does not use the $.select_cols feature of PipeOpTaskPreproc to select only features compatible with the Filter;
instead the whole Task is used by private$.get_state() and subset internally.
Fields
Fields inherited from PipeOp, as well as:
- filter :: Filter
  Filter that is being used for feature filtering. Do not use this slot to get to the feature filtering scores after training; instead, use $state$scores. Read-only.
Methods
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
References
Wu Y, Boos DD, Stefanski LA (2007). “Controlling Variable Selection by the Addition of Pseudovariables.” Journal of the American Statistical Association, 102(477), 235–243. doi:10.1198/016214506000000843.
Thomas J, Hepp T, Mayr A, Bischl B (2017). “Probing for Sparse and Fast Variable Selection with Model-Based Boosting.” Computational and Mathematical Methods in Medicine, 2017, 1–8. doi:10.1155/2017/1421409.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
library("mlr3filters")
# setup PipeOpFilter to keep the 5 most important
# features of the spam task w.r.t. their AUC
task = tsk("spam")
filter = flt("auc")
po = po("filter", filter = filter)
po$param_set
po$param_set$values$filter.nfeat = 5
# filter the task
filtered_task = po$train(list(task))[[1]]
# filtered task + extracted AUC scores
filtered_task$feature_names
head(po$state$scores, 10)
# feature selection embedded in a holdout resampling
# keep 50% of features based on their AUC score
task = tsk("spam")
gr = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
  po("learner", lrn("classif.rpart"))
learner = GraphLearner$new(gr)
rr = resample(task, learner, rsmp("holdout"), store_models = TRUE)
rr$learners[[1]]$model$auc$scores
Fix Factor Levels
Description
Fixes factors of type factor, ordered: Makes sure the factor levels
during prediction are the same as during training; possibly dropping empty
training factor levels before.
Note this may introduce missing values during prediction if unseen factor levels are found.
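A small constructed example may make both effects visible: an empty training level is dropped, and a level that only occurs during prediction is turned into a missing value. The data below are made up for this illustration.

```r
library(mlr3)
library(mlr3pipelines)

# Training data: level "c" is declared but never observed
train_data = data.frame(
  y = c(1, 2, 3, 4),
  x = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
)
# Prediction data: level "d" was not seen during training
predict_data = data.frame(
  y = c(5, 6),
  x = factor(c("a", "d"))
)
task_train = as_task_regr(train_data, target = "y")
task_predict = as_task_regr(predict_data, target = "y")

pop = po("fixfactors")
pop$train(list(task_train))
# Levels remembered from training (with droplevels at its default)
pop$state$levels

# During prediction the unseen level becomes a missing value
out = pop$predict(list(task_predict))[[1L]]
out$data()
```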
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpFixFactors$new(id = "fixfactors", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "fixfactors".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected factor and ordered feature levels fixed.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- levels :: named list of character
  List of factor levels of each affected factor or ordered feature that will be fixed.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- droplevels :: logical(1)
  Whether to drop empty factor levels of the training task. Default TRUE.
Internals
Changes factor levels of columns and attaches them with a new data.table backend and the virtual cbind() backend.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
Split Numeric Features into Equally Spaced Bins
Description
Splits numeric features into equally spaced bins.
See graphics::hist() for details.
Values that fall out of the training data range during prediction are
binned with the lowest / highest bin respectively.
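The handling of out-of-range values can be tried on a small example. The sketch below trains on a single iris feature with explicitly given breakpoints and then predicts on two made-up values far outside the training range; breakpoints and values are chosen for illustration only.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")$select("Petal.Length")
# Explicit breakpoints that span the training data
pop = po("histbin", breaks = c(0, 2, 4, 6, 8))
pop$train(list(task))

# Prediction data with values below and above the training range
newdata = data.frame(
  Species = factor(rep("setosa", 2), levels = c("setosa", "versicolor", "virginica")),
  Petal.Length = c(-5, 100)
)
task_new = as_task_classif(newdata, target = "Species")

out = pop$predict(list(task_new))[[1L]]
# Both values are assigned to a bin instead of becoming missing
out$data()
```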
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpHistBin$new(id = "histbin", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "histbin".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their binned versions.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- breaks :: list
  List of intervals representing the bins for each numeric feature.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- breaks :: character(1) | numeric | function
  Either a character(1) string naming an algorithm to compute the number of cells, a numeric(1) giving the number of breaks for the histogram, a numeric vector giving the breakpoints between the histogram cells, or a function to compute the vector of breakpoints or to compute the number of cells. Default is algorithm "Sturges" (see grDevices::nclass.Sturges()). For details see hist().
Internals
Uses the graphics::hist function.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("histbin")
task$data()
pop$train(list(task))[[1]]$data()
pop$state
Independent Component Analysis
Description
Extracts statistically independent components from data. Only affects numerical features. See fastICA::fastICA for details.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpICA$new(id = "ica", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "ica".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by independent components.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as the elements of the function fastICA::fastICA(),
with the exception of the $X and $S slots. These are in particular:
- K :: matrix
  Matrix that projects data onto the first n.comp principal components. See fastICA().
- W :: matrix
  Estimated un-mixing matrix. See fastICA().
- A :: matrix
  Estimated mixing matrix. See fastICA().
- center :: numeric
  The mean of each numeric feature during training.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as the following parameters
based on fastICA():
- n.comp :: numeric(1)
  Number of components to extract. Default is NULL, which sets it to the number of available numeric columns.
- alg.typ :: character(1)
  Algorithm type. One of "parallel" (default) or "deflation".
- fun :: character(1)
  One of "logcosh" (default) or "exp".
- alpha :: numeric(1)
  In range [1, 2]; used for negentropy calculation when fun is "logcosh". Default is 1.0.
- method :: character(1)
  Internal calculation method. "C" (default) or "R". See fastICA().
- row.norm :: logical(1)
  Logical value indicating whether rows should be standardized beforehand. Default is FALSE.
- maxit :: numeric(1)
  Maximum number of iterations. Default is 200.
- tol :: numeric(1)
  Tolerance for convergence, default is 1e-4.
- verbose :: logical(1)
  Logical value indicating the level of output during the run of the algorithm. Default is FALSE.
- w.init :: matrix
  Initial un-mixing matrix. See fastICA(). Default is NULL.
Internals
Uses the fastICA() function.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("ica")
task$data()
pop$train(list(task))[[1]]$data()
pop$state
Impute Features by a Constant
Description
Impute features by a constant value.
Format
R6Class object inheriting from PipeOpImpute/PipeOp.
Construction
PipeOpImputeConstant$new(id = "imputeconstant", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "imputeconstant".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected features imputed by
the value of the constant parameter.
State
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model contains the value of the constant parameter that is used for imputation.
Parameters
The parameters are the parameters inherited from PipeOpImpute, as well as:
- constant :: atomic(1)
  The constant value that should be used for the imputation, atomic vector of length 1. The atomic mode must match the type of the features that will be selected by the affect_columns parameter and this will be checked during imputation. This is a required hyperparameter and needs to be set by the user.
- check_levels :: logical(1)
  Whether to check that the constant value is a valid level of factorial features (i.e., it already is a level). Raises an error if the check fails. This check is only performed for factorial features (i.e., factor, ordered; skipped for character). Initialized to TRUE.
  Note that empty factor levels can be a problem for many Learners. Thus, PipeOpImputeOOR is the preferred choice for creating new levels, since it is designed to impute out-of-range values and offers a more explicit control for handling potentially problematic behavior.
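The interplay of constant and check_levels for factor features can be tried on a tiny constructed task. The data and the constants "a" and "zzz" below are made up for this illustration; the second call is wrapped in tryCatch() because, with check_levels left at TRUE, it is expected to fail.

```r
library(mlr3)
library(mlr3pipelines)

# Small task with one missing value in a factor feature
df = data.frame(y = c(1, 2, 3), x = factor(c("a", NA, "b")))
task = as_task_regr(df, target = "y")

# "a" is an existing level, so imputation goes through
pop = po("imputeconstant", constant = "a")
out = pop$train(list(task))[[1L]]
out$data()

# "zzz" is not an existing level; with check_levels = TRUE this should error
result = tryCatch({
  po("imputeconstant", constant = "zzz")$train(list(task))
  "no error"
}, error = function(e) "error")
result
```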
Internals
The constructor is called with empty_level_control set to "always", to allow the creation of a new empty level
for factor and ordered (but not character) features during training, if constant is not an already existing
level and check_levels is set to FALSE. This has no impact if check_levels is TRUE, since in that case an
error would be raised before imputation.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpImpute/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
Examples
library("mlr3")
library("mlr3pipelines")
task = tsk("pima")
task$missings()
# impute missing values of the numeric feature "glucose" by the constant value -999
po = po("imputeconstant", param_vals = list(
  constant = -999, affect_columns = selector_name("glucose"))
)
new_task = po$train(list(task = task))[[1]]
new_task$missings()
new_task$data(cols = "glucose")[[1]]
Impute Numerical Features by Histogram
Description
Impute numerical features by histogram.
During training, a histogram is fitted on each column using R's hist() function.
The fitted histogram is then sampled from for imputation. Sampling happens in a two-step process:
First, a bin is sampled from the histogram, then a value is sampled uniformly from the bin.
This is an approximation to sampling from the empirical training data distribution (i.e. sampling
from training data with replacement), but is much more memory efficient for large datasets, since the $state
does not need to save the training data.
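The two-step sampling described above can be sketched in base R as follows. This is an illustration under simple assumptions (a single numeric vector x with made-up data), not the package's internal code; the variable names are chosen for this example only.

```r
set.seed(1)
# Illustrative data: 200 observed values and 20 missing values
x = c(rnorm(200, mean = 50, sd = 10), rep(NA, 20))

# Training: only the bin counts and the breaks need to be stored
h = graphics::hist(x[!is.na(x)], plot = FALSE)
model = list(counts = h$counts, breaks = h$breaks)

# Imputation, step 1: sample one bin per missing value, weighted by bin counts
n_missing = sum(is.na(x))
bins = sample.int(length(model$counts), n_missing, replace = TRUE, prob = model$counts)

# Imputation, step 2: sample uniformly between the breaks of each sampled bin
imputed = stats::runif(n_missing, min = model$breaks[bins], max = model$breaks[bins + 1L])

x_imputed = x
x_imputed[is.na(x)] = imputed
```

The stored model has one count per bin and one more break than bins, so its size depends on the number of bins rather than on the number of training rows.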
Format
R6Class object inheriting from PipeOpImpute/PipeOp.
Construction
PipeOpImputeHist$new(id = "imputehist", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "imputehist".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected numeric features imputed by (column-wise) histogram; see Description for details.
State
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of lists containing elements $counts and $breaks.
Parameters
The parameters are the parameters inherited from PipeOpImpute.
Internals
Uses the graphics::hist() function. Features that are entirely NA are imputed as 0.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpImpute/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
Examples
library("mlr3")
library("mlr3pipelines")
task = tsk("pima")
task$missings()
po = po("imputehist")
new_task = po$train(list(task = task))[[1]]
new_task$missings()
po$state$model
Impute Features by Fitting a Learner
Description
Impute features by fitting a Learner for each feature.
Uses the features indicated by the context_columns parameter as features to train the imputation Learner.
Note this parameter is part of the PipeOpImpute base class and explained there.
Additionally, only features supported by the learner can be imputed; i.e. learners of type
regr can only impute features of type integer and numeric, while classif can impute
features of type factor, ordered and logical.
The Learner used for imputation is trained on all context_columns; if these contain missing values,
the Learner typically either needs to be able to handle missing values itself, or needs to do its
own imputation (see examples).
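The general idea of imputing one feature from others with a fitted model can be sketched in base R. The example below uses stats::lm() in place of an mlr3 Learner and made-up data; the column names a, b and target are assumptions for illustration, with a and b playing the role of the context columns.

```r
set.seed(1)
# Illustrative data: 'target' is the numeric feature to impute
d = data.frame(a = rnorm(50), b = rnorm(50))
d$target = 2 * d$a - d$b + rnorm(50, sd = 0.1)
d$target[c(3, 17, 42)] = NA

# Train on the rows where the feature is observed, using the context columns
observed = !is.na(d$target)
fit = stats::lm(target ~ a + b, data = d[observed, ])

# Predict the feature for the rows where it is missing
d$target[!observed] = stats::predict(fit, newdata = d[!observed, ])
```

The context columns here contain no missing values. If they did, a model such as lm() could not produce predictions for the affected rows, which is why the wrapped Learner typically has to handle missing values itself or be combined with its own imputation step.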
Format
R6Class object inheriting from PipeOpImpute/PipeOp.
Construction
PipeOpImputeLearner$new(learner, id = NULL, param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "impute.", followed by the id of the Learner.
- learner :: Learner | character(1)
  Learner to wrap, or a string identifying a Learner in the mlr3::mlr_learners Dictionary. The Learner usually needs to be able to handle missing values, i.e. have the missings property, unless care is taken that context_columns do not contain missings; see examples.
  This argument is always cloned; to access the Learner inside PipeOpImputeLearner by-reference, use $learner.
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values from all affected features imputed by the trained model.
State
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of models created by the Learner's $.train() function
for each column. If a column consists of missing values only during training, the model is 0 or the levels of the
feature; these are used for sampling during prediction.
This state is given the class "pipeop_impute_learner_state".
Parameters
The parameters are the parameters inherited from PipeOpImpute, in addition to the parameters of the Learner
used for imputation.
Internals
Uses the $train and $predict functions of the provided learner. Features that are entirely NA are imputed as 0
or randomly sampled from available (factor / logical) levels.
The Learner does not necessarily need to handle missing values in cases
where context_columns is chosen well (or there is only one column with missing values present).
Fields
Fields inherited from PipeOpTaskPreproc/PipeOp, as well as:
- learner :: Learner
  Learner that is being wrapped. Read-only.
- learner_models :: list of Learner | NULL
  Learner that is being wrapped. This list is named by features for which a Learner was fitted, and contains the same Learner, but with different respective models for each feature. If this PipeOp is not trained, this is an empty list. For features that were entirely NA during training, the list contains NULL elements.
Methods
Only methods inherited from PipeOpImpute/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
Examples
library("mlr3")
library("mlr3pipelines")
task = tsk("pima")
task$missings()
po = po("imputelearner", lrn("regr.rpart"))
new_task = po$train(list(task = task))[[1]]
new_task$missings()
# '$state' of the "regr.rpart" Learner, trained to predict the 'mass' column:
po$state$model$mass
library("mlr3learners")
# To use the "regr.lm" Learner, prefix it with its own imputation method!
# The "imputehist" PipeOp is used to train "regr.lm"; predictions of this
# trained Learner are then used to impute the missing values in the Task.
po = po("imputelearner",
  po("imputehist") %>>% lrn("regr.lm")
)
new_task = po$train(list(task = task))[[1]]
new_task$missings()
Impute Numerical Features by their Mean
Description
Impute numerical features by their mean.
Format
R6Class object inheriting from PipeOpImpute/PipeOp.
Construction
PipeOpImputeMean$new(id = "imputemean", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "imputemean".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected numeric features imputed by (column-wise) mean.
State
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of numeric(1) indicating the mean of the respective feature.
Parameters
The parameters are the parameters inherited from PipeOpImpute.
Internals
Uses the mean() function. Features that are entirely NA are imputed as 0.
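The following base-R sketch illustrates the behavior described here: the mean is learned from the training data, falls back to 0 when a feature is entirely NA, and is then reused for new data. The helper impute_mean_train() is hypothetical and not part of mlr3pipelines.

```r
# Hypothetical helper (base R only); NOT the package's implementation.
impute_mean_train = function(x) {
  m = mean(x, na.rm = TRUE)
  # mean() of an all-NA vector with na.rm = TRUE is NaN: fall back to 0
  if (is.nan(m)) 0 else m
}

# "Training": learn the mean of the observed values
x_train = c(4, NA, 8, 6)
model = impute_mean_train(x_train)

# "Prediction": reuse the stored mean for missing values in new data
x_new = c(NA, 10, NA)
x_new[is.na(x_new)] = model

# Feature that is entirely NA during training
all_na = impute_mean_train(c(NA_real_, NA_real_))
```

Note that the value imputed into x_new comes from the training data only, not from the observed values of x_new.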
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpImpute/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
Examples
library("mlr3")
library("mlr3pipelines")
task = tsk("pima")
task$missings()
po = po("imputemean")
new_task = po$train(list(task = task))[[1]]
new_task$missings()
po$state$model
Impute Numerical Features by their Median
Description
Impute numerical features by their median.
Format
R6Class object inheriting from PipeOpImpute/PipeOp.
Construction
PipeOpImputeMedian$new(id = "imputemedian", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "imputemedian".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected numeric features imputed by (column-wise) median.
State
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of numeric(1) indicating the median of the respective feature.
Parameters
The parameters are the parameters inherited from PipeOpImpute.
Internals
Uses the stats::median() function. Features that are entirely NA are imputed as 0.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpImpute/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
Examples
library("mlr3")
library("mlr3pipelines")
task = tsk("pima")
task$missings()
po = po("imputemedian")
new_task = po$train(list(task = task))[[1]]
new_task$missings()
po$state$model
Impute Features by their Mode
Description
Impute features by their mode. Supports factors as well as logical and numerical features. If multiple modes are present then imputed values are sampled randomly from them.
Format
R6Class object inheriting from PipeOpImpute/PipeOp.
Construction
PipeOpImputeMode$new(id = "imputemode", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "imputemode".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected features imputed by (column-wise) mode.
State
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of a vector of length one of the type of the feature, indicating the mode of the respective feature.
Parameters
The parameters are the parameters inherited from PipeOpImpute.
Internals
Features that are entirely NA are imputed as
follows: For factor or ordered, random levels are sampled uniformly at random.
For logicals, TRUE or FALSE are sampled uniformly at random.
Numerics and integers are imputed as 0.
Note that every random imputation is drawn independently, so different values may be imputed if multiple values are missing.
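Mode imputation with ties can be sketched in base R as follows. The helper get_modes() is hypothetical and only illustrates the idea of sampling independently from several modes; it does not cover the entirely-NA case described above.

```r
# Hypothetical helper (base R only); NOT the package's implementation.
get_modes = function(x) {
  tab = table(x)                  # table() drops NA by default
  names(tab)[tab == max(tab)]     # all values that reach the maximum count
}

set.seed(1)
x = factor(c("a", "b", "b", "a", "c", NA, NA, NA))
modes = get_modes(x)              # "a" and "b" are tied

# Each missing value gets its own independent draw from the modes
n_missing = sum(is.na(x))
x[is.na(x)] = modes[sample.int(length(modes), n_missing, replace = TRUE)]
```

Because every draw is independent, the three imputed entries need not all receive the same mode.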
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpImpute/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
Examples
library("mlr3")
library("mlr3pipelines")
task = tsk("pima")
task$missings()
po = po("imputemode")
new_task = po$train(list(task = task))[[1]]
new_task$missings()
po$state$model
Out of Range Imputation
Description
Impute factorial features by adding a new level ".MISSING".
Impute numerical features by constant values shifted below the minimum or above the maximum by
using min(x) - offset - multiplier * diff(range(x)) or
max(x) + offset + multiplier * diff(range(x)).
This type of imputation is especially sensible in the context of tree-based methods, see also Ding & Simonoff (2010).
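As a small worked example of the two formulas, assume a numeric feature with observed values 2, 5 and 10 and the documented initial values offset = 1 and multiplier = 1. The data are made up for illustration.

```r
x = c(2, 5, NA, 10)
offset = 1
multiplier = 1

# Spread of the observed values: 10 - 2 = 8
spread = diff(range(x, na.rm = TRUE))

# Shift below the minimum: 2 - 1 - 1 * 8 = -7
below = min(x, na.rm = TRUE) - offset - multiplier * spread

# Shift above the maximum: 10 + 1 + 1 * 8 = 19
above = max(x, na.rm = TRUE) + offset + multiplier * spread
```

Either value lies outside the observed range, so a tree-based method can separate the imputed observations from the rest with a single split.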
Learners expect input Tasks to have the same factor (or ordered) levels during
training as well as prediction. This PipeOp modifies the levels of factor and ordered features,
and since it may occur that a factor or ordered feature contains missing values only during prediction, but not
during training, the output Task could also have different levels during the two stages.
To avoid problems with the Learners' expectation, controlling the PipeOp's handling of this edge case is necessary.
For this, use the create_empty_level hyperparameter inherited from PipeOpImpute.
If create_empty_level is set to TRUE, then an unseen level ".MISSING" is added to the feature during
training and missing values are imputed as ".MISSING" during prediction.
However, empty factor levels during training can be a problem for many Learners.
If create_empty_level is set to FALSE, then no empty level is introduced during training, but columns that
have missing values only during prediction will not be imputed. This is why it may still be necessary to use
po("imputesample", affect_columns = selector_type(types = c("factor", "ordered")))
(or another imputation method) after this imputation method.
Note that setting create_empty_level to FALSE is the same as setting it to TRUE and using PipeOpFixFactors
after this PipeOp.
Format
R6Class object inheriting from PipeOpImpute/PipeOp.
Construction
PipeOpImputeOOR$new(id = "imputeoor", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "imputeoor".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with all affected features having missing values imputed as described above.
State
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model contains either ".MISSING" used for character and factor (also
ordered) features or numeric(1) indicating the constant value used for imputation of
integer and numeric features.
Parameters
The parameters are the parameters inherited from PipeOpImpute, as well as:
- min :: logical(1)
  Should integer and numeric features be shifted below the minimum? Initialized to TRUE. If FALSE they are shifted above the maximum. See also the description above.
- offset :: numeric(1)
  Numerical non-negative offset as used in the description above for integer and numeric features. Initialized to 1.
- multiplier :: numeric(1)
  Numerical non-negative multiplier as used in the description above for integer and numeric features. Initialized to 1.
Internals
Adds an explicit new level() to factor and ordered features, but not to character features.
For integer and numeric features uses the min, max, diff and range functions.
integer and numeric features that are entirely NA are imputed as 0. factor and ordered features that are
entirely NA are imputed as ".MISSING".
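The difference between factor and character features can be illustrated in base R. This is a sketch of the general technique with made-up data, not the package's internal code: a factor needs its set of levels extended before ".MISSING" can be assigned, whereas a character vector has no levels to extend.

```r
# factor: add the new level explicitly, then impute
f = factor(c("x", NA, "y", NA))
levels(f) = c(levels(f), ".MISSING")
f[is.na(f)] = ".MISSING"

# character: the value can be assigned directly
ch = c("x", NA)
ch[is.na(ch)] = ".MISSING"
```

Assigning ".MISSING" to the factor without extending its levels first would produce NA with a warning instead of the new value.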
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpImpute/PipeOp.
References
Ding Y, Simonoff JS (2010). “An Investigation of Missing Data Methods for Classification Trees Applied to Binary Response Data.” Journal of Machine Learning Research, 11(6), 131-170. https://jmlr.org/papers/v11/ding10a.html.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputesample
Examples
library("mlr3")
library("mlr3pipelines")
set.seed(2409)
data = tsk("pima")$data()
data$y = factor(c(NA, sample(letters, size = 766, replace = TRUE), NA))
data$z = ordered(c(NA, sample(1:10, size = 767, replace = TRUE)))
task = TaskClassif$new("task", backend = data, target = "diabetes")
task$missings()
po = po("imputeoor")
new_task = po$train(list(task = task))[[1]]
new_task$missings()
new_task$data()
# recommended use when missing values are expected during prediction on
# factor columns that had no missing values during training
gr = po("imputeoor", create_empty_level = FALSE) %>>%
  po("imputesample", affect_columns = selector_type(types = c("factor", "ordered")))
t1 = as_task_classif(data.frame(l = as.ordered(letters[1:3]), t = letters[1:3]), target = "t")
t2 = as_task_classif(data.frame(l = as.ordered(c("a", NA, NA)), t = letters[1:3]), target = "t")
gr$train(t1)[[1]]$data()
# missing values during prediction are sampled randomly
gr$predict(t2)[[1]]$data()
Impute Features by Sampling
Description
Impute features by sampling from non-missing training data.
Format
R6Class object inheriting from PipeOpImpute/PipeOp.
Construction
PipeOpImputeSample$new(id = "imputesample", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "imputesample".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected features imputed by values sampled (column-wise) from training data.
State
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of training data with missings removed.
Parameters
The parameters are the parameters inherited from PipeOpImpute.
Internals
Uses the sample() function. Features that are entirely NA are imputed as
follows: For factor or ordered, random levels are sampled uniformly at random.
For logicals, TRUE or FALSE are sampled uniformly at random.
Numerics and integers are imputed as 0.
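Sampling from the non-missing training data can be sketched in base R as follows. The data are made up, and the code illustrates the technique rather than the package's implementation; it draws indices with sample.int() instead of calling sample() on the values directly.

```r
set.seed(1)
x_train = c(3, NA, 7, 7, NA, 1)

# "Training": keep the observed values
model = x_train[!is.na(x_train)]

# "Prediction": draw one stored value per missing entry in new data
x_new = c(NA, 5, NA)
n_missing = sum(is.na(x_new))
x_new[is.na(x_new)] = model[sample.int(length(model), n_missing, replace = TRUE)]
```

Drawing indices is a deliberate choice in this sketch: sample(v, ...) treats a single number v >= 1 as the range 1:v, so sampling the values directly would misbehave if only one observed value were stored.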
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpImpute/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor
Examples
library("mlr3")
task = tsk("pima")
task$missings()
po = po("imputesample")
new_task = po$train(list(task = task))[[1]]
new_task$missings()
Kernelized Principal Component Analysis
Description
Extracts kernel principal components from data. Only affects numerical features. See kernlab::kpca for details.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpKernelPCA$new(id = "kernelpca", param_vals = list())
-
id::character(1)
Identifier of resulting object, default "kernelpca".
-
param_vals::named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their principal components.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as the returned S4 object of the function kernlab::kpca().
The @rotated slot of the "kpca" object is overwritten with an empty matrix for memory efficiency.
The slots of the S4 object can be accessed by accessor functions. See kernlab::kpca.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
-
kernel::character(1)
Kernel function to use. See kpca().
-
kpar::list
List of hyper-parameters that are used with the kernel function. See kpca().
-
features::numeric(1)
Number of principal components to return. Default 0 means that all principal components are returned. See kpca().
-
th::numeric(1)
Eigenvalue threshold below which principal components are ignored. Default is 0.0001. See kpca().
-
na.action::function
Function to specify NA action. Default is na.omit. See kpca().
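As a hedged illustration of how these parameters combine, the sketch below selects a Gaussian RBF kernel and passes its width through kpar. The value sigma = 0.2 is an arbitrary choice for demonstration, and the kernlab package is assumed to be installed.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# RBF kernel with an explicitly chosen width; keep two components.
# sigma = 0.2 is an arbitrary value picked for illustration.
pop = po("kernelpca", kernel = "rbfdot", kpar = list(sigma = 0.2), features = 2)

out = pop$train(list(task))[[1]]
out$feature_names
```

The four numeric features of the task are replaced by the two requested components; the number of rows is unchanged.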
Internals
Uses the kpca() function.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("kernelpca", features = 3) # only keep top 3 components
task$data()
pop$train(list(task))[[1]]$data()
Wrap a Learner into a PipeOp
Description
Wraps an mlr3::Learner into a PipeOp.
Inherits the $param_set (and therefore $param_set$values) from the Learner it is constructed from.
Using PipeOpLearner, it is possible to embed mlr3::Learners into Graphs, which themselves can be
turned into Learners using GraphLearner. This way, preprocessing and ensemble methods can be included
in a machine learning pipeline, which can then be handled as a single object for resampling, benchmarking,
and tuning.
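A minimal sketch of this workflow, assuming the "scale" PipeOp and the "classif.rpart" Learner as arbitrary stand-ins for a preprocessing step and a model:

```r
library("mlr3")
library("mlr3pipelines")

# Preprocessing followed by a wrapped Learner, combined into a Graph.
graph = po("scale") %>>% po("learner", lrn("classif.rpart"))

# Once converted, the Graph behaves like any other Learner.
glrn = as_learner(graph)

rr = resample(tsk("iris"), glrn, rsmp("cv", folds = 3))
rr$aggregate(msr("classif.ce"))
```

Because the scaling step is part of the resampled object, it is refitted on each training split rather than on the full data.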
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpLearner$new(learner, id = NULL, param_vals = list())
-
learner::Learner|character(1)
Learner to wrap, or a string identifying a Learner in the mlr3::mlr_learners Dictionary. This argument is always cloned; to access the Learner inside PipeOpLearner by-reference, use $learner.
-
id::character(1)
Identifier of the resulting object, internally defaulting to the id of the Learner being wrapped.
-
param_vals::named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpLearner has one input channel named "input", taking a Task specific to the Learner
type given to learner during construction; both during training and prediction.
PipeOpLearner has one output channel named "output", producing NULL during training and a Prediction subclass
during prediction; this subclass is specific to the Learner type given to learner during construction.
The output during prediction is the Prediction on the prediction input data, produced by the Learner
trained on the training input data.
State
The $state is set to the $state slot of the Learner object. It is a named list with members:
-
model::any
Model created by the Learner's $.train() function.
-
train_log::data.table with columns class (character), msg (character)
Errors logged during training.
-
train_time::numeric(1)
Training time, in seconds.
-
predict_log::NULL|data.table with columns class (character), msg (character)
Errors logged during prediction.
-
predict_time::NULL|numeric(1)
Prediction time, in seconds.
Parameters
The parameters are exactly the parameters of the Learner wrapped by this object.
Internals
The $state is currently not updated by prediction, so the $state$predict_log and $state$predict_time will always be NULL.
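The channel and state behaviour described above can be checked with a short sketch; the choice of "classif.rpart" and the iris task is only for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
lrn_po = po("learner", lrn("classif.rpart"))

train_out = lrn_po$train(list(task))
train_out$output            # NULL: training passes no data downstream

class(lrn_po$state$model)   # the fitted model is kept in $state

pred = lrn_po$predict(list(task))$output
pred
```

The fitted model is also reachable through lrn_po$learner_model, which returns the wrapped Learner including its model.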
Fields
Fields inherited from PipeOp, as well as:
-
learner::Learner
Learner that is being wrapped. Read-only.
-
learner_model::Learner
Learner that is being wrapped. This learner contains the model if the PipeOp is trained. Read-only.
-
validate::"predefined" or NULL
This field can only be set for Learners that have the "validation" property. Setting the field to "predefined" means that the wrapped Learner will use the internal validation task, otherwise it will be ignored. Note that specifying how the validation data is created is possible via the $validate field of the GraphLearner. For each PipeOp it is then only possible to either use it ("predefined") or not use it (NULL). Also see set_validate.GraphLearner for more information.
-
internal_tuned_values::named list() or NULL
The internally tuned values if the wrapped Learner supports internal tuning, NULL otherwise.
-
internal_valid_scores::named list() or NULL
The internal validation scores if the wrapped Learner supports internal validation, NULL otherwise.
Methods
Methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Meta PipeOps:
mlr_pipeops_learner_cv,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles
Examples
library("mlr3")
task = tsk("iris")
learner = lrn("classif.rpart", cp = 0.1)
lrn_po = mlr_pipeops$get("learner", learner)
lrn_po$train(list(task))
lrn_po$predict(list(task))
Wrap a Learner into a PipeOp with Cross-validated Predictions as Features
Description
Wraps an mlr3::Learner into a PipeOp.
Returns cross-validated predictions during training as a Task and stores a model of the
Learner trained on the whole data in $state. This is used to create a similar
Task during prediction.
The Task gets features depending on the wrapped Learner's
$predict_type. If the Learner's $predict_type is "response", a feature <ID>.response is created,
for $predict_type "prob" the <ID>.prob.<CLASS> features are created, and for $predict_type "se" the new columns
are <ID>.response and <ID>.se. <ID> denotes the $id of the PipeOpLearnerCV object.
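The naming scheme can be seen directly on a small task. In this sketch the id defaults to that of the wrapped Learner, "classif.rpart", and the iris task is an arbitrary choice.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# predict_type "response": a single <ID>.response feature.
po_resp = po("learner_cv", lrn("classif.rpart"))
resp_names = po_resp$train(list(task))[[1]]$feature_names
resp_names

# predict_type "prob": one <ID>.prob.<CLASS> feature per class.
po_prob = po("learner_cv", lrn("classif.rpart", predict_type = "prob"))
prob_names = po_prob$train(list(task))[[1]]$feature_names
prob_names
```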
Inherits the $param_set (and therefore $param_set$values) from the Learner it is constructed from.
PipeOpLearnerCV can be used to create "stacking" or "super learning" Graphs that use the output of one Learner
as feature for another Learner. Because the PipeOpLearnerCV erases the original input features, it is often
useful to use PipeOpFeatureUnion to bind the prediction Task to the original input Task.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpLearnerCV$new(learner, id = NULL, param_vals = list())
-
learner::Learner
Learner to use for cross validation / prediction, or a string identifying a Learner in the mlr3::mlr_learners Dictionary. This argument is always cloned; to access the Learner inside PipeOpLearnerCV by-reference, use $learner.
-
id::character(1)
Identifier of the resulting object, internally defaulting to the id of the Learner being wrapped.
-
param_vals::named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpLearnerCV has one input channel named "input", taking a Task specific to the Learner
type given to learner during construction; both during training and prediction.
PipeOpLearnerCV has one output channel named "output", producing a Task specific to the Learner
type given to learner during construction; both during training and prediction.
The output is a task with the same target as the input task, with features replaced by predictions made by the Learner.
During training, this prediction is the out-of-sample prediction made by resample(); during prediction, it is the
ordinary prediction made on the data by a Learner trained on the training phase data.
State
The $state is set to the $state slot of the Learner object, together with the $state elements inherited from the
PipeOpTaskPreproc. It is a named list with the inherited members, as well as:
-
model::any
Model created by the Learner's $.train() function.
-
train_log::data.table with columns class (character), msg (character)
Errors logged during training.
-
train_time::numeric(1)
Training time, in seconds.
-
predict_log::NULL|data.table with columns class (character), msg (character)
Errors logged during prediction.
-
predict_time::NULL|numeric(1)
Prediction time, in seconds.
This state is given the class "pipeop_learner_cv_state".
Parameters
The parameters are the parameters inherited from the PipeOpTaskPreproc, as well as the parameters of the Learner wrapped by this object.
Besides that, parameters introduced are:
-
resampling.method::character(1)
Resampling method to use. Currently only supports "cv" and "insample". "insample" generates predictions with the model trained on all training data.
-
resampling.folds::numeric(1)
Number of cross validation folds. Initialized to 3. Only used for resampling.method = "cv".
-
keep_response::logical(1)
Only effective during "prob" prediction: Whether to keep response values, if available. Initialized to FALSE.
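A short sketch of setting these parameters at construction; the regression learner, the mtcars task, and the fold count of 5 are arbitrary choices for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("mtcars")

# Five-fold cross-validated predictions instead of the initial three folds.
po_cv = po("learner_cv", lrn("regr.rpart"), resampling.folds = 5)

# In-sample predictions: a single model trained on all training data.
po_in = po("learner_cv", lrn("regr.rpart"), resampling.method = "insample")

cv_task = po_cv$train(list(task))[[1]]
in_task = po_in$train(list(task))[[1]]

head(cv_task$data())
head(in_task$data())
```

In-sample predictions are cheaper to compute but tend to be optimistic, which matters when a downstream Learner is trained on them.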
Internals
The $state is currently not updated by prediction, so the $state$predict_log and $state$predict_time will always be NULL.
Fields
Fields inherited from PipeOp, as well as:
-
learner::Learner
Learner that is being wrapped. Read-only.
-
learner_model::Learner
Learner that is being wrapped. This learner contains the model if the PipeOp is trained. Read-only.
Methods
Methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other Meta PipeOps:
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles
Examples
library("mlr3")
task = tsk("iris")
learner = lrn("classif.rpart")
lrncv_po = po("learner_cv", learner)
lrncv_po$learner$predict_type = "response"
nop = mlr_pipeops$get("nop")
graph = gunion(list(
  lrncv_po,
  nop
)) %>>% po("featureunion")
graph$train(task)
graph$pipeops$classif.rpart$learner$predict_type = "prob"
graph$train(task)
Wrap a Learner into a PipeOp with Cross-validation Plus Confidence Intervals as Predictions
Description
Wraps an mlr3::Learner into a PipeOp.
Inherits the $param_set (and therefore $param_set$values) from the Learner it is constructed from.
Using PipeOpLearnerPICVPlus, it is possible to embed a mlr3::Learner into a Graph.
PipeOpLearnerPICVPlus can then be used to perform cross validation plus (or jackknife plus).
During training, PipeOpLearnerPICVPlus performs cross validation on the training data.
During prediction, the models from the training stage are used to construct predictive confidence intervals for the prediction data based on
out-of-fold residuals and out-of-fold predictions.
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpLearnerPICVPlus$new(learner, id = NULL, param_vals = list())
-
learner::LearnerRegr
LearnerRegr to use for the cross validation models in the Cross Validation Plus method. This argument is always cloned; to access the Learner inside PipeOpLearnerPICVPlus by-reference, use $learner.
-
id::character(1)
Identifier of the resulting object, internally defaulting to the id of the Learner being wrapped.
-
param_vals::named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default is list().
Input and Output Channels
PipeOpLearnerPICVPlus has one input channel named "input", taking a Task specific to the Learner
type given to learner during construction; both during training and prediction.
PipeOpLearnerPICVPlus has one output channel named "output", producing NULL during training and a PredictionRegr
during prediction.
The output during prediction is a PredictionRegr with predict_type quantiles on the prediction input data.
The alpha and 1 - alpha quantiles are the quantiles of the prediction interval produced by the cross validation plus method.
The response is the median of the prediction of all cross validation models on the prediction data.
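The idea behind the interval can be sketched in base R. The code below follows the CV+ construction as described by Barber et al. (2021) and is not the package's implementation; the linear model, the helper name cvplus_interval, the use of absolute residuals, and the clamping of the order-statistic indices are assumptions made for this illustration.

```r
set.seed(1)
data = mtcars
n = nrow(data)
n_folds = 3
alpha = 0.05

# Assign every observation to one of the folds.
fold = sample(rep_len(seq_len(n_folds), n))

# Model k is fitted on all observations outside fold k.
models = lapply(seq_len(n_folds), function(k) {
  lm(mpg ~ wt + hp, data = data[fold != k, ])
})

# Out-of-fold absolute residuals: each observation is predicted by the
# model that did not see it during fitting.
oof_pred = vapply(seq_len(n), function(i) {
  unname(predict(models[[fold[i]]], newdata = data[i, ]))
}, numeric(1))
abs_resid = abs(data$mpg - oof_pred)

# Hypothetical helper: CV+ interval and median response for one new row.
cvplus_interval = function(newrow) {
  fold_pred = vapply(models, function(m) unname(predict(m, newdata = newrow)), numeric(1))
  mu = fold_pred[fold]  # prediction of the model that excluded observation i
  # Order-statistic indices, clamped to valid positions for this sketch.
  k_lo = max(floor(alpha * (n + 1)), 1)
  k_hi = min(ceiling((1 - alpha) * (n + 1)), n)
  c(
    lower = sort(mu - abs_resid)[k_lo],
    response = median(fold_pred),
    upper = sort(mu + abs_resid)[k_hi]
  )
}

interval = cvplus_interval(data.frame(wt = 3, hp = 120))
interval
```

Each training observation contributes one candidate bound, built from the prediction of the model that excluded it and its own out-of-fold residual.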
State
The $state is a named list with members:
-
cv_model_states::list
List of the state of each cross validation model created by the Learner's $.train() function during resampling with method "cv".
-
residuals::data.table
data.table with columns fold and residual. Lists the regression residuals for each observation and cross validation fold.
This state is given the class "pipeop_learner_cv_state".
Parameters
The parameters of the Learner wrapped by this object, as well as:
-
folds::numeric(1)
Number of cross validation folds. Initialized to 3.
-
alpha::numeric(1)
Quantile to use for the cross validation plus prediction intervals. Initialized to 0.05.
Internals
The $state is updated during training.
Fields
Fields inherited from PipeOp, as well as:
-
learner::Learner
Learner that is being wrapped. Read-only.
-
learner_model::Learner or list
If the PipeOpLearnerPICVPlus has been trained, this is a list containing the Learners of the cross validation models. Otherwise, this contains the Learner that is being wrapped. Read-only.
-
predict_type
Predict type of the PipeOpLearnerPICVPlus, which is always "response" "quantiles". This can be different to the predict type of the Learner that is being wrapped.
Methods
Methods inherited from PipeOp.
References
Barber RF, Candes EJ, Ramdas A, Tibshirani RJ (2021). “Predictive inference with the jackknife+.” Annals of Statistics, 49, 486–507. doi:10.1214/20-AOS1965.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Meta PipeOps:
mlr_pipeops_learner,
mlr_pipeops_learner_cv,
mlr_pipeops_learner_quantiles
Examples
library("mlr3")
task = tsk("mtcars")
learner = lrn("regr.rpart")
lrncvplus_po = mlr_pipeops$get("learner_pi_cvplus", learner)
lrncvplus_po$train(list(task))
lrncvplus_po$predict(list(task))
Wrap a Learner into a PipeOp to predict multiple Quantiles
Description
Wraps a LearnerRegr into a PipeOp to predict multiple quantiles.
PipeOpLearnerQuantiles only supports LearnerRegrs that have quantiles as a possible predict_type.
It produces quantile-based predictions for multiple quantiles in one PredictionRegr. This is especially helpful if the LearnerRegr can only predict one quantile (for example, LearnerRegrGBM in mlr3extralearners).
Inherits the $param_set (and therefore $param_set$values) from the Learner it is constructed from.
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpLearnerQuantiles$new(learner, id = NULL, param_vals = list())
-
learner::Learner|character(1)
Learner to wrap, or a string identifying a Learner in the mlr3::mlr_learners Dictionary. The Learner has to be a LearnerRegr with predict_type "quantiles". This argument is always cloned; to access the Learner inside PipeOpLearnerQuantiles by-reference, use $learner.
-
id::character(1)
Identifier of the resulting object, internally defaulting to the id of the Learner being wrapped.
-
param_vals::named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpLearnerQuantiles has one input channel named "input", taking a TaskRegr specific to the Learner
type given to learner during construction; both during training and prediction.
PipeOpLearnerQuantiles has one output channel named "output", producing NULL during training and a PredictionRegr object
during prediction.
The output during prediction is a PredictionRegr on the prediction input data that aggregates all results produced by the Learner, trained on the training input data, for each quantile in q_vals.
State
The $state is set during training. It is a named list with the member:
-
model_states::list
List of the states of all models created by the Learner's $.train() function.
Parameters
The parameters are the parameters of the Learner wrapped by this object, as well as:
-
q_vals::numeric
Quantiles to use for training and prediction. Initialized to c(0.05, 0.5, 0.95).
-
q_response::numeric(1)
Which quantile in q_vals to use as a response for the PredictionRegr during prediction. Initialized to 0.5.
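A hedged sketch of inspecting these settings and the resulting state follows. It lists the hyperparameter ids instead of assuming their exact spelling in the combined parameter set; the "regr.debug" Learner and the mtcars task are arbitrary choices.

```r
library("mlr3")
library("mlr3pipelines")

po_q = po("learner_quantiles", lrn("regr.debug"))

# The quantile settings appear next to the parameters of the wrapped Learner.
po_q$param_set$ids()

task = tsk("mtcars")
po_q$train(list(task))
names(po_q$state)

pred = po_q$predict(list(task))[[1]]
pred
```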
Internals
The $state is updated during training.
Fields
Fields inherited from PipeOp, as well as:
-
learner::LearnerRegr
Learner that is being wrapped. Read-only.
-
learner_model::Learner or list
If PipeOpLearnerQuantiles has been trained, this is a list containing the Learners for each quantile. Otherwise, this contains the Learner that is being wrapped. Read-only.
-
predict_type::character(1)
Predict type of the PipeOpLearnerQuantiles, which is always "response" "quantiles".
Methods
Methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Meta PipeOps:
mlr_pipeops_learner,
mlr_pipeops_learner_cv,
mlr_pipeops_learner_pi_cvplus
Examples
library("mlr3")
task = tsk("boston_housing")
learner = lrn("regr.debug")
po = mlr_pipeops$get("learner_quantiles", learner)
po$train(list(task))
po$predict(list(task))
Add Missing Indicator Columns
Description
Add missing indicator columns ("dummy columns") to the Task.
Drops original features; should probably be used in combination with PipeOpFeatureUnion and imputation PipeOps (see examples).
Note that affect_columns is initialized with selector_invert(selector_type(c("factor", "ordered", "character"))), since missing
values in factorial columns are often indicated by out-of-range imputation (PipeOpImputeOOR).
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpMissInd$new(id = "missind", param_vals = list())
-
id::character(1)
Identifier of the resulting object, defaulting to "missind".
-
param_vals::named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
State
$state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
-
indicand_cols::character
Names of columns for which indicator columns are added. If the which parameter is "all", this is just the names of all features, otherwise it is the names of all features that had missing values during training.
Parameters
The parameters are the parameters inherited from the PipeOpTaskPreproc, as well as:
-
which::character(1)
Determines for which features the indicator columns are added. Can either be "missing_train" (default), adding indicator columns for each feature that actually has missing values, or "all", adding indicator columns for all features.
-
type::character(1)
Determines the type of the newly created columns. Can be one of "factor" (default), "integer", "logical", "numeric".
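Both parameters can be set at construction. The sketch below requests logical indicators for every feature of a reduced pima task; the task and the column selection are arbitrary choices for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")$select(c("insulin", "triceps"))

# Indicators for all features, stored as logical columns.
po_all = po("missind", which = "all", type = "logical")
ind_task = po_all$train(list(task))[[1]]

ind_task$feature_types
```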
Internals
This PipeOp should cover most cases where "dummy columns" or "missing indicators" are desired. Some edge cases:
-
If imputation for factorial features is performed and only numeric features should gain missing indicators, the affect_columns parameter can be set to selector_type("numeric").
-
If missing indicators should only be added for features that have more than a fraction of x missing values, the PipeOpRemoveConstants can be used with affect_columns = selector_grep("^missing_") and ratio = x.
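Both edge cases can be sketched as follows. The threshold x = 0.4 and the reduced pima task are arbitrary choices for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")$select(c("insulin", "triceps"))

# Edge case 1: only numeric features receive missing indicators.
po_num = po("missind", affect_columns = selector_type("numeric"))

# Edge case 2: drop indicators of features with too few missing values.
x = 0.4
graph = po("missind") %>>%
  po("removeconstants", affect_columns = selector_grep("^missing_"), ratio = x)

kept = graph$train(task)[[1]]$feature_names
kept
```

In this task, insulin is missing in roughly half of the rows and triceps in under a third, so only the indicator for insulin passes the 0.4 threshold.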
Fields
Fields inherited from PipeOp.
Methods
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("pima")$select(c("insulin", "triceps"))
sum(complete.cases(task$data()))
task$missings()
tail(task$data())
po = po("missind")
new_task = po$train(list(task))[[1]]
tail(new_task$data())
# proper imputation + missing indicators
impgraph = list(
  po("imputesample"),
  po("missind")
) %>>% po("featureunion")
tail(impgraph$train(task)[[1]]$data())
Transform Columns by Constructing a Model Matrix
Description
Transforms columns using a given formula using the stats::model.matrix() function.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpModelMatrix$new(id = "modelmatrix", param_vals = list())
-
id::character(1)
Identifier of resulting object, default "modelmatrix".
-
param_vals::named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with transformed columns according to the used formula.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
-
formula::formula
Formula to use. Higher order interactions can be created using constructs like ~ . ^ 2. By default, an (Intercept) column of all 1s is created, which can be avoided by adding 0 + to the term. See model.matrix().
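The effect of the formula on the resulting columns can be sketched on the iris task, which is an arbitrary choice; with four numeric features, ~ . ^ 2 yields the main effects and all pairwise interactions.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Main effects, pairwise interactions, and an (Intercept) column.
pop = po("modelmatrix", formula = ~ . ^ 2)
with_intercept = pop$train(list(task))[[1]]$feature_names
with_intercept

# Adding 0 + suppresses the (Intercept) column.
pop$param_set$values$formula = ~ 0 + . ^ 2
without_intercept = pop$train(list(task))[[1]]$feature_names
without_intercept
```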
Internals
Uses the model.matrix() function.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("modelmatrix", formula = ~ . ^ 2)
task$data()
pop$train(list(task))[[1]]$data()
pop$param_set$values$formula = ~ 0 + . ^ 2
pop$train(list(task))[[1]]$data()
Explicate a Multiplicity
Description
Explicate a Multiplicity by turning the input Multiplicity into multiple outputs.
This PipeOp has multiple output channels; the members of the input Multiplicity
are forwarded each along a single edge. Therefore, only multiplicities with exactly as many
members as outnum are accepted.
Note that Multiplicity is currently an experimental feature and the implementation or UI
may change.
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpMultiplicityExply$new(outnum, id = "multiplicityexply", param_vals = list())
- outnum :: numeric(1) | character
  Determines the number of output channels.
- id :: character(1)
  Identifier of the resulting object, default "multiplicityexply".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpMultiplicityExply has a single input channel named "input", collecting a
Multiplicity of type any ("[*]") both during training and prediction.
PipeOpMultiplicityExply has multiple output channels depending on the outnum construction
argument, named "output1", "output2", ..., returning the elements of the unclassed input
Multiplicity.
State
The $state is left empty (list()).
Parameters
PipeOpMultiplicityExply has no Parameters.
Internals
outnum should match the number of elements of the unclassed input Multiplicity.
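The relationship between outnum and the members of the input Multiplicity can be sketched with plain numbers instead of Tasks. This is an illustrative example, assuming that the "[*]" input type accepts a Multiplicity of arbitrary objects; the variable names poe and out are made up here.

```r
library("mlr3")
library("mlr3pipelines")

# Two members in the Multiplicity, so outnum must be 2.
poe = po("multiplicityexply", outnum = 2)
out = poe$train(list(Multiplicity(a = 1, b = 2)))

# One list element per output channel, named after the channels.
names(out)
out$output1
out$output2
```

A Multiplicity with a different number of members than outnum is not accepted, as stated in the description.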
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Experimental Features:
Multiplicity(),
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_replicate
Examples
library("mlr3")
task1 = tsk("iris")
task2 = tsk("mtcars")
po = po("multiplicityexply", outnum = 2)
po$train(list(Multiplicity(task1, task2)))
po$predict(list(Multiplicity(task1, task2)))
Implicate a Multiplicity
Description
Implicate a Multiplicity by returning the input(s) converted to a Multiplicity.
This PipeOp has multiple input channels; all inputs are collected into a Multiplicity
and then are forwarded along a single edge, causing the following PipeOps to be called
multiple times, once for each Multiplicity member.
Note that Multiplicity is currently an experimental feature and the implementation or UI
may change.
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpMultiplicityImply$new(innum = 0, id = "multiplicityimply", param_vals = list())
- innum :: numeric(1) | character
  Determines the number of input channels. If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs. If innum is a character vector, the number of input channels is the length of innum.
- id :: character(1)
  Identifier of the resulting object, default "multiplicityimply".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpMultiplicityImply has multiple input channels depending on the innum construction
argument, named "input1", "input2", ... if innum is nonzero; if innum is 0, there is
only one vararg input channel named "...". All input channels take any input ("*") both
during training and prediction.
PipeOpMultiplicityImply has one output channel named "output", emitting a Multiplicity
of type any ("[*]"), i.e., returning the input(s) converted to a Multiplicity both during
training and prediction.
State
The $state is left empty (list()).
Parameters
PipeOpMultiplicityImply has no Parameters.
Internals
If innum is not numeric, e.g., a character vector, the output Multiplicity will be named based
on the input channel names.
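The naming behaviour for a character innum can be sketched as follows. The channel names "a" and "b" and the variable names poi and out are chosen for illustration only, and plain numbers stand in for Tasks.

```r
library("mlr3")
library("mlr3pipelines")

# A character `innum` creates one input channel per name.
poi = po("multiplicityimply", innum = c("a", "b"))
poi$input$name

# The inputs are collected into a single Multiplicity on the "output" channel.
out = poi$train(list(1, 2))$output
class(out)
names(out)
```

Named members can make it easier to tell apart the results of downstream PipeOps that are called once per Multiplicity member.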
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Experimental Features:
Multiplicity(),
mlr_pipeops_multiplicityexply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_replicate
Examples
library("mlr3")
task1 = tsk("iris")
task2 = tsk("mtcars")
po = po("multiplicityimply")
po$train(list(task1, task2))
po$predict(list(task1, task2))
Add Features According to Expressions
Description
Adds features according to expressions given as formulas that may depend on values of other features. This can add new features, or can change existing features.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpMutate$new(id = "mutate", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "mutate".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with added and/or mutated features according to the mutation parameter.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- mutation :: named list of formula
  Expressions for new features to create (or present features to change), in the form of formula. Each element of the list is a formula with the name of the element naming the feature to create or change, and the formula expression determining the result. This expression may reference other features, as well as variables visible at the creation of the formula (see examples). Initialized to list().
- delete_originals :: logical(1)
  Whether to delete original features. Even when this is FALSE, present features may still be overwritten. Initialized to FALSE.
Internals
A formula created using the ~ operator always contains a reference to the environment in which
the formula is created. This makes it possible to use variables in the ~-expressions that
reference either column names or variable names.
Note that the formulas in mutation are evaluated sequentially. This allows for using
variables that were constructed during evaluation of a previous formula. However, if existing
features are changed, precedence is given to the original ones before the newly constructed ones.
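The environment capture described above is a property of base R formulas and can be illustrated without the package. This is a minimal sketch; make_formula, offset and columns are hypothetical names, and the evaluation shown here is not necessarily how PipeOpMutate evaluates its formulas internally.

```r
make_formula = function() {
  offset = 10   # local variable, visible where the formula is created
  ~ x + offset
}
f = make_formula()

# The formula carries a reference to its creation environment.
environment(f)$offset

# Evaluate the right-hand side: `x` is looked up in the data first,
# `offset` is found in the formula's environment.
columns = list(x = c(1, 2, 3))
result = eval(f[[2]], columns, environment(f))
result
```

Because data columns are searched before the enclosing environment, a column and a variable of the same name resolve to the column.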
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
constant = 1
pom = po("mutate")
pom$param_set$values$mutation = list(
  Sepal.Length_plus_constant = ~ Sepal.Length + constant,
  Sepal.Area = ~ Sepal.Width * Sepal.Length,
  Petal.Area = ~ Petal.Width * Petal.Length,
  Sepal.Area_plus_Petal.Area = ~ Sepal.Area + Petal.Area
)
pom$train(list(tsk("iris")))[[1]]$data()
Nearmiss Down-Sampling
Description
Generates a more balanced data set by down-sampling the instances of non-minority classes using the NEARMISS algorithm.
The algorithm down-samples by selecting instances from the non-minority classes that have the smallest mean distance
to their k nearest neighbors of different classes.
For this only numeric and integer features are taken into account. These must have no missing values.
This can only be applied to classification tasks. Multiclass classification is supported.
See themis::nearmiss for details.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpNearmiss$new(id = "nearmiss", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "nearmiss".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with the rows removed from the non-minority classes.
The output during prediction is the unchanged input.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- k :: integer(1)
  Number of nearest neighbors used for calculating the mean distances. Default is 5.
- under_ratio :: numeric(1)
  Ratio of the majority-to-minority frequencies. This specifies the ratio to which the number of instances in the non-minority classes is down-sampled, relative to the number of instances of the minority class. Default is 1. For details, see themis::nearmiss.
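The effect of under_ratio on class sizes can be sketched with simple arithmetic. The class frequencies and the helper target_counts() below are hypothetical, and the exact rounding used by themis::nearmiss is an assumption. Which rows are kept is decided by the NEARMISS distance criterion and is not modelled here.

```r
# Hypothetical class frequencies before down-sampling.
class_counts = c(a = 60, b = 90, c = 30)

target_counts = function(counts, under_ratio = 1) {
  cap = min(counts) * under_ratio   # upper bound relative to the minority class
  pmin(counts, floor(cap))          # classes already below the cap are untouched
}

# under_ratio = 1: every class is reduced to the minority class size.
target_counts(class_counts, under_ratio = 1)

# under_ratio = 2: non-minority classes keep at most twice the minority size.
target_counts(class_counts, under_ratio = 2)
```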
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
References
Zhang, J., Mani, I. (2003). “KNN Approach to Unbalanced Data Distributions: A Case Study Involving Information Extraction.” In Proceedings of Workshop on Learning from Imbalanced Datasets (ICML).
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
# Create example task
task = tsk("wine")
task$head()
table(task$data(cols = "type"))
# Down-sample and balance data
pop = po("nearmiss")
nearmiss_result = pop$train(list(task))[[1]]$data()
nrow(nearmiss_result)
table(nearmiss_result$type)
Non-negative Matrix Factorization
Description
Extracts non-negative components from data by performing non-negative matrix factorization. Only
affects non-negative numerical features. See nmf() for details.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpNMF$new(id = "nmf", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "nmf".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their
non-negative components.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as the elements of the object returned by nmf().
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- rank :: integer(1)
  Factorization rank, i.e., number of components. Initialized to 2. See nmf().
- method :: character(1)
  Specification of the NMF algorithm. Initialized to "brunet". See nmf().
- seed :: character(1) | integer(1) | list() | object of class NMF | function()
  Specification of the starting point. See nmf().
- nrun :: integer(1)
  Number of runs to perform. Default is 1. More than a single run allows for the computation of a consensus matrix, which will also be stored in the $state. See nmf().
- debug :: logical(1)
  Whether to toggle debug mode. Default is FALSE. See nmf().
- keep.all :: logical(1)
  Whether all factorizations are to be saved and returned. Default is FALSE. Only has an effect if nrun > 1. See nmf().
- parallel :: character(1) | integer(1) | logical(1)
  Specification of parallel handling if nrun > 1. Initialized to FALSE, as it is recommended to use mlr3's future-based parallelization. See nmf().
- parallel.required :: character(1) | integer(1) | logical(1)
  Same as parallel, but an error is thrown if the computation cannot be performed in parallel or with the specified number of processors. Initialized to FALSE, as it is recommended to use mlr3's future-based parallelization. See nmf().
- shared.memory :: logical(1)
  Whether shared memory should be enabled. See nmf().
- simplifyCB :: logical(1)
  Whether callback results should be simplified. Default is TRUE. See nmf().
- track :: logical(1)
  Whether error tracking should be enabled. Default is FALSE. See nmf().
- verbose :: integer(1) | logical(1)
  Specification of verbosity. Default is FALSE. See nmf().
- pbackend :: character(1) | integer(1) | NULL
  Specification of the parallel backend. It is recommended to use mlr3's future-based parallelization. See nmf().
- callback :: function()
  Callback function that is called after each run (if nrun > 1). See nmf().
Internals
Uses the nmf() function as well as basis(), coef() and
ginv().
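The text above only states that ginv() is used. One plausible role, sketched here as an assumption rather than a description of the package internals, is mapping data onto a fixed basis via the pseudoinverse. The matrices W, H and X are made up for illustration.

```r
library("MASS")  # provides ginv()

# Basis matrix W: 3 features x 2 components (full column rank).
W = matrix(c(1, 0, 2,
             0, 1, 1), nrow = 3)
# Coefficient matrix H: 2 components x 4 observations.
H = matrix(c(1, 2, 3, 4,
             0.5, 1.5, 2.5, 3.5), nrow = 2, byrow = TRUE)

# Data that is exactly representable in the basis.
X = W %*% H

# With W fixed, the pseudoinverse recovers the coefficients from X.
H_rec = ginv(W) %*% X
all.equal(H_rec, H)
```

For data that is not exactly representable in the basis, the same product gives the least-squares coefficients, which are not guaranteed to be non-negative.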
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("nmf")
task$data()
pop$train(list(task))[[1]]$data()
pop$state
Simply Push Input Forward
Description
Simply pushes the input forward.
Can be useful during Graph construction using the %>>%-operator to specify which PipeOp gets connected to which.
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpNOP$new(id = "nop", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "nop".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpNOP has one input channel named "input", taking any input ("*") both during training and prediction.
PipeOpNOP has one output channel named "output", producing the object given as input ("*") without changes.
State
The $state is left empty (list()).
Parameters
PipeOpNOP has no parameters.
Internals
PipeOpNOP is a useful "default" stand-in for a PipeOp/Graph that does nothing.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Placeholder Pipeops:
mlr_pipeops_copy
Examples
library("mlr3")
nop = po("nop")
nop$train(list(1))
# use `gunion` and `%>>%` to create a "bypass"
# next to "pca"
gr = gunion(list(
  po("pca"),
  nop
)) %>>% po("featureunion")
gr$train(tsk("iris"))[[1]]$data()
Split a Classification Task into Binary Classification Tasks
Description
Splits a classification Task into several binary classification Tasks to perform "One vs. Rest" classification. This works in combination
with PipeOpOVRUnite.
For each target level a new binary classification Task is constructed with
the respective target level being the positive class and all other target levels being the
new negative class "rest".
This PipeOp creates a Multiplicity, which means that subsequent PipeOps are executed
multiple times, once for each created binary Task, until a PipeOpOVRUnite
is reached.
Note that Multiplicity is currently an experimental feature and the implementation or UI
may change.
Format
R6Class inheriting from PipeOp.
Construction
PipeOpOVRSplit$new(id = "ovrsplit", param_vals = list())
- id :: character(1)
  Identifier of the resulting object, default "ovrsplit".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpOVRSplit has one input channel named "input" taking a TaskClassif
both during training and prediction.
PipeOpOVRSplit has one output channel named "output" returning a Multiplicity of
TaskClassifs both during training and prediction, i.e., the newly
constructed binary classification Tasks.
State
The $state contains the original target levels of the TaskClassif supplied
during training.
Parameters
PipeOpOVRSplit has no parameters.
Internals
The original target levels stored in the $state are also used during prediction when creating the new
binary classification Tasks.
The names of the elements of the output Multiplicity are given by the levels of the target.
If a target level "rest" is present in the input TaskClassif, the
negative class will be labeled as "rest." (using as many "." postfixes as needed to yield a
valid label).
Should be used in combination with PipeOpOVRUnite.
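The naming of the output Multiplicity can be inspected directly. The following is a short sketch on the iris task; the variable name binary_tasks is chosen here for illustration.

```r
library("mlr3")
library("mlr3pipelines")

binary_tasks = po("ovrsplit")$train(list(tsk("iris")))$output

# One binary task per original target level.
names(binary_tasks)

# Each binary task contrasts one level with the pooled "rest" class.
binary_tasks$setosa$class_names
```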
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Experimental Features:
Multiplicity(),
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrunite,
mlr_pipeops_replicate
Examples
library(mlr3)
task = tsk("iris")
po = po("ovrsplit")
po$train(list(task))
po$predict(list(task))
Unite Binary Classification Tasks
Description
Perform "One vs. Rest" classification by (weighted) majority vote prediction from classification Predictions. This works in combination with PipeOpOVRSplit.
Weights can be set as a parameter; if none are provided, defaults to equal weights for each prediction.
Always returns a "prob" prediction, regardless of the incoming Learner's
$predict_type. The label of the class with the highest predicted probability is selected as the
"response" prediction.
Missing values during prediction are treated as each class label being equally likely.
This PipeOp uses a Multiplicity input, which is created by PipeOpOVRSplit and causes
PipeOps on the way to this PipeOp to be called once for each individual binary Task.
Note that Multiplicity is currently an experimental feature and the implementation or UI
may change.
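The uniting step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the matrix pos_prob, the helper name unite_ovr, and the normalisation by row sums are hypothetical choices made for this example.

```r
# Hypothetical positive-class probabilities from three binary "one vs. rest"
# models: one column per original class, one row per observation.
pos_prob = matrix(
  c(0.8, 0.1, 0.3,
    0.2, 0.6, 0.4,
    NA,  NA,  NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("setosa", "versicolor", "virginica"))
)

unite_ovr = function(pos_prob, weights = rep(1, ncol(pos_prob))) {
  # Apply one weight per binary model (column), then rescale rows to sum to 1.
  weighted = sweep(pos_prob, 2, weights, `*`)
  prob = weighted / rowSums(weighted)
  # Assumption: rows with missing values fall back to equal class probabilities.
  missing_rows = !stats::complete.cases(prob)
  prob[missing_rows, ] = 1 / ncol(prob)
  # The "response" is the class with the highest united probability.
  response = colnames(prob)[max.col(prob, ties.method = "first")]
  list(prob = prob, response = response)
}

united = unite_ovr(pos_prob)
united$prob
united$response
```

The third row shows the missing-value rule: every class receives the same probability, so the response for that row is decided only by the tie-breaking rule chosen in the sketch.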
Format
R6Class inheriting from PipeOpEnsemble/PipeOp.
Construction
PipeOpOVRUnite$new(id = "ovrunite", param_vals = list())
- id :: character(1)
  Identifier of the resulting object, default "ovrunite".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpEnsemble. Instead of a
Prediction, a PredictionClassif is used as
input and output during prediction and PipeOpEnsemble's collect parameter is initialized
with TRUE to allow for collecting a Multiplicity input.
State
The $state is left empty (list()).
Parameters
The parameters are the parameters inherited from the PipeOpEnsemble.
Internals
Inherits from PipeOpEnsemble by implementing the private$.predict() method.
Should be used in combination with PipeOpOVRSplit.
Fields
Only fields inherited from PipeOpEnsemble/PipeOp.
Methods
Only methods inherited from PipeOpEnsemble/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Ensembles:
PipeOpEnsemble,
mlr_learners_avg,
mlr_pipeops_classifavg,
mlr_pipeops_regravg
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Experimental Features:
Multiplicity(),
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_replicate
Examples
library(mlr3)
task = tsk("iris")
gr = po("ovrsplit") %>>% lrn("classif.rpart") %>>% po("ovrunite")
gr$train(task)
gr$predict(task)
gr$pipeops$classif.rpart$learner$predict_type = "prob"
gr$predict(task)
Principal Component Analysis
Description
Extracts principal components from data. Only affects numerical features.
See stats::prcomp() for details.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpPCA$new(id = "pca", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "pca".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their principal components.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as the elements of the class stats::prcomp,
with the exception of the $x slot. These are in particular:
- sdev :: numeric
  The standard deviations of the principal components.
- rotation :: matrix
  The matrix of variable loadings.
- center :: numeric | logical(1)
  The centering used, or FALSE.
- scale :: numeric | logical(1)
  The scaling used, or FALSE.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- center :: logical(1)
  Indicating whether the features should be centered. Default is TRUE. See prcomp().
- scale. :: logical(1)
  Whether to scale features to unit variance before analysis. Default is FALSE, but scaling is advisable. See prcomp().
- rank. :: integer(1)
  Maximal number of principal components to be used. Default is NULL: use all components. See prcomp().
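Because the PipeOp delegates to prcomp(), the effect of these three parameters and the shape of the state can be previewed with base R alone. The sketch below assumes the iris data and drops the $x slot by hand to mimic what the State section describes; the variable names are illustrative only.

```r
# Numeric columns of iris, analogous to the numeric features of a Task.
features = iris[, 1:4]

# center, scale. and rank. are arguments of stats::prcomp().
fit = stats::prcomp(features, center = TRUE, scale. = TRUE, rank. = 2)

# Mimic the documented state: everything prcomp() returns except $x.
fit_list = unclass(fit)
state_like = fit_list[setdiff(names(fit_list), "x")]
names(state_like)

# The rotation has one row per input feature and one column per component.
dim(state_like$rotation)

# Projecting data applies the stored centering, scaling and rotation.
head(stats::predict(fit, newdata = features))
```

Setting rank. limits the number of columns in the rotation matrix, and therefore the number of principal component features that replace the original numeric features.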
Internals
Uses the prcomp() function.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("pca")
task$data()
pop$train(list(task))[[1]]$data()
pop$state
Wrap another PipeOp or Graph as a Hyperparameter
Description
Wraps another PipeOp or Graph as determined by the content hyperparameter.
Input is routed through the content and the contents' output is returned.
The content hyperparameter can be changed during tuning, which makes PipeOpProxy useful as an alternative to PipeOpBranch.
Format
Abstract R6Class inheriting from PipeOp.
Construction
PipeOpProxy$new(innum = 0, outnum = 1, id = "proxy", param_vals = list())
- innum :: numeric(1)
  Determines the number of input channels. If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs.
- outnum :: numeric(1)
  Determines the number of output channels.
- id :: character(1)
  Identifier of resulting object. See $id slot of PipeOp.
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpProxy has multiple input channels depending on the innum construction argument, named
"input1", "input2", ... if innum is nonzero; if innum is 0, there is only one vararg
input channel named "...".
PipeOpProxy has multiple output channels depending on the outnum construction argument,
named "output1", "output2", ...
The output is determined by the output of the content operation (a PipeOp or Graph).
State
The $state is the trained content PipeOp or Graph.
Parameters
- content :: PipeOp | Graph
  The PipeOp or Graph that is being proxied (or an object that is converted to a Graph by as_graph()). Defaults to an instance of PipeOpFeatureUnion (combines all input if they are Tasks).
Internals
The content will internally be coerced to a graph via
as_graph() prior to train and predict.
The default value for content is PipeOpFeatureUnion.
Fields
Fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
set.seed(1234)
task = tsk("iris")
# use a proxy for preprocessing and a proxy for learning, i.e.,
# no preprocessing and classif.rpart
g = po("proxy", id = "preproc", param_vals = list(content = po("nop"))) %>>%
  po("proxy", id = "learner", param_vals = list(content = lrn("classif.rpart")))
rr_rpart = resample(task, learner = GraphLearner$new(g), resampling = rsmp("cv", folds = 3))
rr_rpart$aggregate(msr("classif.ce"))
# use pca for preprocessing and classif.rpart as the learner
g$param_set$values$preproc.content = po("pca")
g$param_set$values$learner.content = lrn("classif.rpart")
rr_pca_rpart = resample(task, learner = GraphLearner$new(g), resampling = rsmp("cv", folds = 3))
rr_pca_rpart$aggregate(msr("classif.ce"))
Split Numeric Features into Quantile Bins
Description
Splits numeric features into quantile bins.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpQuantileBin$new(id = "quantilebin", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "quantilebin".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their binned versions.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- bins :: list
  List of intervals representing the bins for each numeric feature.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- numsplits :: integer(1)
  Number of bins to create. Default is 2.
Internals
Uses the stats::quantile function.
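The binning idea can be illustrated with stats::quantile() and cut(). This is a rough sketch rather than the PipeOp's actual code: the helper name quantile_bin, the open-ended outer breaks, and the ordered result are assumptions made for the example.

```r
# Hypothetical helper: bin a numeric vector at its empirical quantiles.
quantile_bin = function(x, numsplits = 2) {
  # Interior cut points, e.g. only the median for numsplits = 2.
  probs = seq_len(numsplits - 1) / numsplits
  # Open-ended outer bins so that unseen values still fall into a bin.
  breaks = unique(c(-Inf, stats::quantile(x, probs, na.rm = TRUE), Inf))
  cut(x, breaks = breaks, ordered_result = TRUE)
}

x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
binned = quantile_bin(x, numsplits = 2)
table(binned)
```

With numsplits = 2 the only interior break is the median, so the ten values are split evenly into a lower and an upper bin.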
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("quantilebin")
task$data()
pop$train(list(task))[[1]]$data()
pop$state
Project Numeric Features onto a Randomly Sampled Subspace
Description
Projects numeric features onto a randomly sampled subspace. All numeric features
(or the ones selected by affect_columns) are replaced by numeric features
PR1, PR2, ..., PRn.
Samples with features that contain missing values result in all PR1..PRn being
NA for that sample, so it is advised to do imputation before random projections
if missing values can be expected.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpRandomProjection$new(id = "randomprojection", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "randomprojection".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with affected numeric features
projected onto a random subspace.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as an element $projection, a matrix.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- rank :: integer(1)
  The dimension of the subspace to project onto. Initialized to 1.
Internals
If there are n (affected) numeric features in the input Task,
then $state$projection is an n x rank matrix. The output is calculated as
input %*% state$projection.
The random projection matrix is obtained through Gram-Schmidt orthogonalization from a matrix with values standard normally distributed, which gives a distribution that is rotation invariant, as per Eaton: Multivariate Statistics, A Vector Space Approach, Pg. 234.
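A minimal base R sketch of this construction follows. It uses the QR decomposition as a stand-in for Gram-Schmidt orthogonalization (both produce orthonormal columns), and it assumes a projection matrix with one row per input feature and rank columns, which is what makes input %*% projection conformable. Variable names are illustrative.

```r
set.seed(1)
n_features = 4   # number of (affected) numeric features
rank = 2         # dimension of the target subspace

# Matrix of standard normal draws: one row per feature, one column per output.
gaussian = matrix(stats::rnorm(n_features * rank), nrow = n_features, ncol = rank)

# Orthonormalize the columns; qr.Q() returns the orthonormal factor.
projection = qr.Q(qr(gaussian))
colnames(projection) = paste0("PR", seq_len(rank))

# Columns are orthonormal, so crossprod() is numerically the identity.
round(crossprod(projection), 10)

# Project the numeric iris columns onto the sampled subspace.
input = as.matrix(iris[, 1:4])
projected = input %*% projection
dim(projected)
```

Because matrix multiplication propagates NA, a single missing value in a row of input makes every projected value in that row NA, which is why imputation beforehand is advised.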
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("randomprojection", rank = 2)
task$data()
pop$train(list(task))[[1]]$data()
pop$state
Generate a Randomized Response Prediction
Description
Takes in a Prediction of predict_type "prob"
(for PredictionClassif) or "se"
(for PredictionRegr) and generates a randomized "response"
prediction.
For "prob", the responses are sampled according to
the probabilities of the input PredictionClassif. For "se",
responses are randomly drawn according to the rdistfun parameter (default is rnorm) by using
the original responses of the input PredictionRegr as the mean and the
original standard errors of the input PredictionRegr as the standard
deviation (sampling is done observation-wise).
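The two sampling schemes can be sketched in base R as follows. The probability matrix, responses, and standard errors are made-up values, and the code only illustrates the idea; it is not the PipeOp's internal implementation.

```r
set.seed(42)

# "prob" case: hypothetical class probabilities, one row per observation.
prob = matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.1, 0.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("setosa", "versicolor", "virginica"))
)
# Draw one class label per row, using the row's probabilities as sampling weights.
response_classif = apply(prob, 1, function(p) sample(colnames(prob), size = 1, prob = p))

# "se" case: hypothetical regression responses and standard errors.
response = c(21.0, 18.5, 30.2)
se = c(1.2, 0.8, 2.5)
# rnorm() vectorizes over mean and sd, so each observation gets its own draw.
response_regr = stats::rnorm(length(response), mean = response, sd = se)

response_classif
response_regr
```

Any replacement for rnorm must share this calling convention, that is, accept n, mean, and sd and vectorize over the latter two.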
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpRandomResponse$new(id = "randomresponse", param_vals = list(), packages = character(0))
- id :: character(1)
  Identifier of the resulting object, default "randomresponse".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
- packages :: character
  Set of all required packages for the private$.predict() methods related to the rdistfun parameter. Default is character(0).
Input and Output Channels
PipeOpRandomResponse has one input channel named "input", taking NULL during training and
a Prediction during prediction.
PipeOpRandomResponse has one output channel named "output", producing NULL during
training and a Prediction with random responses during prediction.
State
The $state is left empty (list()).
Parameters
- rdistfun :: function
  A function for generating random responses when the predict type is "se". This function must accept the arguments n (integerish number of responses), mean (numeric for the mean), and sd (numeric for the standard deviation), and must vectorize over mean and sd. Default is rnorm.
Internals
If the predict_type of the input Prediction does not match "prob" or
"se", the input Prediction will be returned unaltered.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library(mlr3)
library(mlr3learners)
task1 = tsk("iris")
g1 = LearnerClassifRpart$new() %>>% PipeOpRandomResponse$new()
g1$train(task1)
g1$pipeops$classif.rpart$learner$predict_type = "prob"
set.seed(2409)
g1$predict(task1)
task2 = tsk("mtcars")
g2 = LearnerRegrLM$new() %>>% PipeOpRandomResponse$new()
g2$train(task2)
g2$pipeops$regr.lm$learner$predict_type = "se"
set.seed(2906)
g2$predict(task2)
Weighted Prediction Averaging
Description
Perform (weighted) prediction averaging from regression Predictions by connecting
PipeOpRegrAvg to multiple PipeOpLearner outputs.
The resulting "response" prediction is a weighted average of the incoming "response" predictions.
"se" prediction is currently not aggregated but discarded if present.
Weights can be set as a parameter; if none are provided, each prediction (that is, each model) receives equal weight.
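The averaging arithmetic can be shown in a few lines of base R. The response values and the helper weighted_response are hypothetical, and the sketch assumes that weights are relative and rescaled to sum to one before averaging.

```r
# Hypothetical "response" predictions of three regression models (columns)
# for four observations (rows).
responses = cbind(
  model_a = c(10, 20, 30, 40),
  model_b = c(12, 18, 33, 41),
  model_c = c(11, 22, 27, 39)
)

weighted_response = function(responses, weights = rep(1, ncol(responses))) {
  # Assumption: weights are relative and rescaled to sum to one.
  weights = weights / sum(weights)
  drop(responses %*% weights)
}

equal_avg = weighted_response(responses)                         # equal weights
double_a = weighted_response(responses, weights = c(2, 1, 1))    # model_a counts double
equal_avg
double_a
```

Only the first observation changes between the two calls here, because the three models happen to average to model_a's own prediction for the remaining rows.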
Format
R6Class inheriting from PipeOpEnsemble/PipeOp.
Construction
PipeOpRegrAvg$new(innum = 0, collect_multiplicity = FALSE, id = "regravg", param_vals = list())
- innum :: numeric(1)
  Determines the number of input channels. If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs.
- collect_multiplicity :: logical(1)
  If TRUE, the input is a Multiplicity collecting channel. This means, a Multiplicity input, instead of multiple normal inputs, is accepted and the members are aggregated. This requires innum to be 0. Default is FALSE.
- id :: character(1)
  Identifier of the resulting object, default "regravg".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpEnsemble. Instead of a Prediction, a PredictionRegr
is used as input and output during prediction.
State
The $state is left empty (list()).
Parameters
The parameters are the parameters inherited from the PipeOpEnsemble.
Internals
Inherits from PipeOpEnsemble by implementing the private$weighted_avg_predictions() method.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpEnsemble/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_replicate
Other Ensembles:
PipeOpEnsemble,
mlr_learners_avg,
mlr_pipeops_classifavg,
mlr_pipeops_ovrunite
Examples
library("mlr3")
# Simple Bagging for regression
gr = ppl("greplicate",
  po("subsample") %>>%
  po("learner", lrn("regr.rpart")),
  n = 5
) %>>%
  po("regravg")
resample(tsk("mtcars"), GraphLearner$new(gr), rsmp("holdout"))
Remove Constant Features
Description
Remove constant features from an mlr3::Task. For each feature, calculates the ratio of values which differ from the feature's mode value. All features with a ratio below a settable threshold are removed from the task. Missing values can be ignored or treated as a regular value distinct from non-missing values.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpRemoveConstants$new(id = "removeconstants", param_vals = list())
- id :: character(1)
  Identifier of the resulting object, defaulting to "removeconstants".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
State
$state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- features :: character()
  Names of features that are being kept. Features of types that the Filter can not operate on are always being kept.
Parameters
The parameters are the parameters inherited from the PipeOpTaskPreproc, as well as:
- ratio :: numeric(1)
  Ratio of values which must be different from the mode value in order to keep a feature in the task. Initialized to 0, which means only constant features with exactly one observed level are removed.
- rel_tol :: numeric(1)
  Relative tolerance within which to consider a numeric feature constant. Set to 0 to disregard relative tolerance. Initialized to 1e-8.
- abs_tol :: numeric(1)
  Absolute tolerance within which to consider a numeric feature constant. Set to 0 to disregard absolute tolerance. Initialized to 1e-8.
- na_ignore :: logical(1)
  If TRUE, the ratio is calculated after removing all missing values first, so a column can be "constant" even if some but not all values are NA. Initialized to TRUE.
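How ratio and na_ignore interact can be sketched in base R. The helper differing_ratio is hypothetical, the numeric tolerances rel_tol and abs_tol are left out for brevity, and the rule that a feature is kept when its ratio is strictly above the threshold is an assumption that matches the description of the default ratio of 0.

```r
# Hypothetical helper: share of values that differ from the most frequent value.
differing_ratio = function(x, na_ignore = TRUE) {
  if (na_ignore) x = x[!is.na(x)]
  if (length(x) == 0) return(0)
  # With na_ignore = FALSE, NA is counted as a value of its own.
  counts = table(x, useNA = "ifany")
  1 - max(counts) / length(x)
}

data = data.frame(
  a = 1:10,                   # all values distinct
  b = rep(1, 10),             # constant
  c = rep(1:2, each = 5),     # two values, evenly split
  d = c(rep(3, 9), NA)        # constant apart from one missing value
)

ratios = vapply(data, differing_ratio, numeric(1))
ratios

# Assumption: a feature is kept when its ratio is strictly above the threshold.
threshold = 0
kept = names(ratios)[ratios > threshold]
kept
```

Column d is treated as constant here because the missing value is dropped first; calling differing_ratio(data$d, na_ignore = FALSE) instead counts NA as a distinct value and yields a nonzero ratio.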
Fields
Fields inherited from PipeOp.
Methods
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
data = data.table::data.table(y = runif(10), a = 1:10, b = rep(1, 10), c = rep(1:2, each = 5))
task = TaskRegr$new("example", data, target = "y")
po = po("removeconstants")
po$train(list(task = task))[[1]]$data()
po$state
Rename Columns
Description
Renames the columns of a Task both during training and prediction.
Uses the $rename() mutator of the Task.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpRenameColumns$new(id = "renamecolumns", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "renamecolumns".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with the old column names changed to the new ones.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- renaming :: named character
  Named character vector. The names of the vector specify the old column names that should be changed to the new column names as given by the elements of the vector. Initialized to the empty character vector.
- ignore_missing :: logical(1)
  Ignore if columns named in renaming are not found in the input Task. If this is FALSE, names in renaming that are not found in the Task cause an error. Initialized to FALSE.
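The semantics of renaming and ignore_missing can be sketched in base R on a plain character vector of column names. The helper rename_cols is hypothetical and only mirrors the documented behaviour; it does not operate on a Task and is not the package's implementation.

```r
# Hypothetical helper: applies a named character vector (old name -> new name)
# to a vector of column names, following the documented parameter semantics.
rename_cols = function(cols, renaming, ignore_missing = FALSE) {
  not_found = setdiff(names(renaming), cols)
  if (length(not_found) > 0 && !ignore_missing) {
    stop("Columns not found: ", paste(not_found, collapse = ", "))
  }
  found = intersect(names(renaming), cols)
  cols[match(found, cols)] = unname(renaming[found])
  cols
}

cols = c("Petal.Length", "Petal.Width", "Sepal.Length")
rename_cols(cols, c(Petal.Length = "PL"))
# An unknown old name is skipped silently only when ignore_missing = TRUE
rename_cols(cols, c(Petal.Length = "PL", nope = "X"), ignore_missing = TRUE)
```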
Internals
Uses the $rename() mutator of the Task to set the new column names.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("renamecolumns", param_vals = list(renaming = c("Petal.Length" = "PL")))
pop$train(list(task))
Replicate the Input as a Multiplicity
Description
Replicate the input as a Multiplicity, causing subsequent PipeOps to be executed reps times.
Note that Multiplicity is currently an experimental feature and the implementation or UI
may change.
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpReplicate$new(id = "replicate", param_vals = list())
- id :: character(1)
  Identifier of the resulting object, default "replicate".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpReplicate has one input channel named "input", taking any input ("*") both during training and prediction.
PipeOpReplicate has one output channel named "output" returning the replicated input as a
Multiplicity of type any ("[*]") both during training and prediction.
State
The $state is left empty (list()).
Parameters
- reps :: numeric(1)
  Integer indicating the number of times the input should be replicated.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg
Other Experimental Features:
Multiplicity(),
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite
Examples
library("mlr3")
task = tsk("iris")
po = po("replicate", param_vals = list(reps = 3))
po$train(list(task))
po$predict(list(task))
Apply a Function to each Row of a Task
Description
Applies a function to each row of a task. Use the affect_columns parameter inherited from
PipeOpTaskPreprocSimple to limit the columns this function should be applied to.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpRowApply$new(id = "rowapply", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "rowapply".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with the original affected columns replaced by the columns created by
applying applicator to each row.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- applicator :: function
  Function to apply to each row in the affected columns of the task. The return value should be a vector of the same length for every input. Initialized as identity().
- col_prefix :: character(1)
  If specified, prefix to be prepended to the column names of affected columns, separated by a dot (.). Initialized as "".
Internals
Calls apply on the data, using the value of applicator as FUN.
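A minimal base R sketch of the row-wise apply step, assuming the affected columns are held in a numeric matrix. The function standardize and the prefix "std" are illustrative choices, and the transposition back to one row per observation is an assumption about how the results are arranged, not a statement about the package's code.

```r
m = as.matrix(iris[1:5, 1:4])

# Applicator: returns a vector of the same length for every row
standardize = function(x) (x - mean(x)) / sd(x)

# apply() over rows returns one column per input row, so transpose back
res = t(apply(m, 1, standardize))

# Emulate col_prefix: prefix and original name are joined by a dot
colnames(res) = paste("std", colnames(res), sep = ".")
res
```

Each row of res then has mean 0 and standard deviation 1 across its four entries.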
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pora = po("rowapply", applicator = scale)
pora$train(list(task))[[1]] # rows are standardized
Center and Scale Numeric Features
Description
Centers all numeric features to mean = 0 (if center parameter is TRUE) and scales them
by dividing them by their root-mean-square (if scale parameter is TRUE).
The root-mean-square here is defined as sqrt(sum(x^2)/(length(x)-1)). If the center parameter
is TRUE, this corresponds to the sd().
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpScale$new(id = "scale", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "scale".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features centered and/or scaled.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- center :: numeric
  The mean / median (depending on robust) of each numeric feature during training, or 0 if center is FALSE. Will be subtracted during the predict phase.
- scale :: numeric
  The value by which features are divided. 1 if scale is FALSE.
  If robust is FALSE, this is the root mean square, defined as sqrt(sum(x^2)/(length(x)-1)), of each feature, possibly after centering. If robust is TRUE, this is the median absolute deviation multiplied by 1.4826 (see stats::mad) of each feature, possibly after centering. This is 1 for features that are constant during training if center is TRUE, to avoid division-by-zero.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- center :: logical(1)
  Whether to center features, i.e. subtract their mean() from them. Default TRUE.
- scale :: logical(1)
  Whether to scale features, i.e. divide them by sqrt(sum(x^2)/(length(x)-1)). Default TRUE.
- robust :: logical(1)
  Whether to use robust scaling; instead of scaling / centering with mean / standard deviation, median and median absolute deviation mad are used. Initialized to FALSE.
Internals
Imitates the scale() function for robust = FALSE and alternatively subtracts the
median and divides by mad for robust = TRUE.
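The two scaling modes can be reproduced in base R for a single feature. This is a sketch of the arithmetic described above, not the package's code; the variable names and the new_x values are illustrative.

```r
x = iris$Sepal.Length
rms = function(v) sqrt(sum(v^2) / (length(v) - 1))

# robust = FALSE: center by the mean, then divide by the root mean square.
# After centering, the root mean square coincides with sd(x).
center = mean(x)
scl = rms(x - center)
scaled = (x - center) / scl

# robust = TRUE: center by the median, divide by the MAD.
# stats::mad() centers at the median and uses constant = 1.4826 by default.
center_r = median(x)
scl_r = mad(x)
scaled_r = (x - center_r) / scl_r

# Prediction reuses the values stored during training on new data
new_x = c(4.8, 6.1)
(new_x - center) / scl
```

For robust = FALSE, the vector scaled agrees with as.numeric(scale(x)).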
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pos = po("scale")
pos$train(list(task))[[1]]$data()
one_line_of_iris = task$filter(13)
one_line_of_iris$data()
pos$predict(list(one_line_of_iris))[[1]]$data()
Scale Numeric Features with Respect to their Maximum Absolute Value
Description
Scales the numeric data columns so their maximum absolute value is maxabs,
if possible. NA and Inf values are ignored, and features that are constant 0
are not scaled.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpScaleMaxAbs$new(id = "scalemaxabs", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "scalemaxabs".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with scaled numeric features.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as the maximum absolute values of each numeric feature.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- maxabs :: numeric(1)
  The maximum absolute value for each column after transformation. Default is 1.
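A base R sketch of the described scaling for a single feature. The helper scale_maxabs is hypothetical; it assumes that "ignored" means non-finite values are left out when the maximum is determined, and that an all-zero feature is returned unchanged.

```r
# Hypothetical helper: rescale x so that its largest finite absolute value
# becomes `maxabs`. Non-finite entries do not influence the scaling factor.
scale_maxabs = function(x, maxabs = 1) {
  finite = x[is.finite(x)]
  if (length(finite) == 0) return(x)
  m = max(abs(finite))
  if (m == 0) return(x)  # constant 0: nothing to scale
  x * maxabs / m
}

scale_maxabs(c(-4, 2, NA, 1))
scale_maxabs(c(0, 0, 0))
scale_maxabs(c(1, 5, Inf), maxabs = 2)
```

In the PipeOp, the per-feature maxima are part of the $state, so prediction data would be divided by the training maxima rather than by its own.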
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("scalemaxabs")
task$data()
pop$train(list(task))[[1]]$data()
pop$state
Linearly Transform Numeric Features to Match Given Boundaries
Description
Linearly transforms numeric data columns so they are between lower
and upper. The formula for this is x' = offset + x * scale,
where scale is (upper - lower) / (max(x) - min(x)) and
offset is -min(x) * scale + lower. The same transformation is applied during training and
prediction.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpScaleRange$new(id = "scalerange", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "scalerange".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with scaled numeric features.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as the two transformation parameters scale and offset for each numeric
feature.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- lower :: numeric(1)
  Target value of smallest item of input data. Initialized to 0.
- upper :: numeric(1)
  Target value of greatest item of input data. Initialized to 1.
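The transformation x' = offset + x * scale can be checked with a short base R sketch. The helper scale_range_fit is hypothetical and does not guard against constant features, where max(x) - min(x) is 0.

```r
# Hypothetical helper: compute the two transformation parameters from
# training data, following the formulas in the Description.
scale_range_fit = function(x, lower = 0, upper = 1) {
  scale = (upper - lower) / (max(x) - min(x))
  offset = -min(x) * scale + lower
  list(scale = scale, offset = offset)
}

x = iris$Petal.Length
fit = scale_range_fit(x, lower = -1, upper = 1)

# Training data ends up exactly within [lower, upper]
train_out = fit$offset + x * fit$scale
range(train_out)

# The same parameters are reused at prediction time, so new values outside
# the training range map outside [lower, upper]
new_x = c(0.5, 7.5)
fit$offset + new_x * fit$scale
```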
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("scalerange", param_vals = list(lower = -1, upper = 1))
task$data()
pop$train(list(task))[[1]]$data()
pop$state
Remove Features Depending on a Selector
Description
Removes features from Task depending on a Selector function:
The selector parameter gives the features to keep.
See Selector for selectors that are provided and how to write custom Selectors.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpSelect$new(id = "select", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "select".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with features removed that were not selected by the Selector/function in selector.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- selection :: character
  A vector of all feature names that are kept (i.e. not dropped) in the Task. Initialized to selector_all().
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- selector :: function | Selector
  Selector function, takes a Task as argument and returns a character of features to keep.
  See Selector for example functions. Defaults to selector_all().
Internals
Uses task$select().
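Since a selector is a function that takes a Task and returns the feature names to keep, a custom selector can be sketched as an ordinary closure. The example below runs in base R against a stand-in list that only mimics the feature_names field of a Task; mock_task, selector_petal and invert are hypothetical names introduced for illustration.

```r
# Stand-in for a Task: only the `feature_names` field is mimicked here
mock_task = list(
  feature_names = c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width")
)

# Custom selector: keep features whose name starts with "Petal"
selector_petal = function(task) {
  grep("^Petal", task$feature_names, value = TRUE)
}

# Hypothetical helper that inverts a selector, similar in spirit to
# selector_invert() used in the Examples
invert = function(selector) {
  function(task) setdiff(task$feature_names, selector(task))
}

selector_petal(mock_task)
invert(selector_petal)(mock_task)
```

Given that the selector parameter accepts a function, a selector written this way could presumably be assigned to it in the same way as the built-in selectors.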
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Selectors:
Selector
Examples
library("mlr3")
task = tsk("boston_housing")
pos = po("select")
pos$param_set$values$selector = selector_all()
pos$train(list(task))[[1]]$feature_names
pos$param_set$values$selector = selector_type("factor")
pos$train(list(task))[[1]]$feature_names
pos$param_set$values$selector = selector_invert(selector_type("factor"))
pos$train(list(task))[[1]]$feature_names
pos$param_set$values$selector = selector_grep("^r")
pos$train(list(task))[[1]]$feature_names
SMOTE Balancing
Description
Generates a more balanced data set by creating synthetic instances of the minority class using the SMOTE algorithm.
The algorithm samples for each minority instance a new data point based on the K nearest neighbors of that data point.
It can only be applied to tasks with purely numeric features. See smotefamily::SMOTE for details.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpSmote$new(id = "smote", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "smote".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with added synthetic rows for the minority class.
The output during prediction is the unchanged input.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- K :: numeric(1)
  The number of nearest neighbors used for sampling new values. See SMOTE().
- dup_size :: numeric
  Desired times of synthetic minority instances over the original number of majority instances. See SMOTE().
Internals
If a target level is unobserved during training, no synthetic data points will be generated for that class. No error is raised; the unobserved class is simply ignored.
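The core SMOTE step, interpolating between a minority instance and one of its K nearest minority neighbors, can be sketched in base R. This is a simplified illustration, not the smotefamily::SMOTE implementation: smote_sketch is a hypothetical helper, it expects only the minority-class rows as input, and it treats dup_size as the number of synthetic points per minority instance.

```r
# Hypothetical helper: X holds the numeric features of the minority class,
# one row per instance (at least two rows are needed).
smote_sketch = function(X, K = 5, dup_size = 1) {
  X = as.matrix(X)
  n = nrow(X)
  D = as.matrix(dist(X))  # pairwise Euclidean distances
  diag(D) = Inf           # an instance is not its own neighbor
  synth = vector("list", n * dup_size)
  idx = 1
  for (i in seq_len(n)) {
    nn = order(D[i, ])[seq_len(min(K, n - 1))]  # indices of nearest neighbors
    for (r in seq_len(dup_size)) {
      j = nn[sample.int(length(nn), 1)]  # pick one neighbor at random
      gap = runif(1)                     # position on the connecting segment
      synth[[idx]] = X[i, ] + gap * (X[j, ] - X[i, ])
      idx = idx + 1
    }
  }
  do.call(rbind, synth)
}

set.seed(1)
X = matrix(runif(20), ncol = 2)
S = smote_sketch(X, K = 3, dup_size = 2)
dim(S)
```

Every synthetic point lies on a line segment between two existing minority instances, which is why the method is restricted to numeric features.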
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
References
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002). “SMOTE: Synthetic Minority Over-sampling Technique.” Journal of Artificial Intelligence Research, 16, 321–357. doi:10.1613/jair.953.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
# Create example task
data = smotefamily::sample_generator(1000, ratio = 0.80)
data$result = factor(data$result)
task = TaskClassif$new(id = "example", backend = data, target = "result")
task$data()
table(task$data()$result)
# Generate synthetic data for minority class
pop = po("smote")
smotedata = pop$train(list(task))[[1]]$data()
table(smotedata$result)
SMOTENC Balancing
Description
Generates a more balanced data set by creating synthetic instances of the minority class for nominal and continuous data using the SMOTENC algorithm.
The algorithm generates for each minority instance a new data point based on the k nearest
neighbors of that data point.
It treats integer features as numeric. To not change feature types, the numeric, synthetic data
generated for these features are rounded back to integer.
Because of this, data generated through usage of this PipeOp is not exactly equal to data generated by
calling themis::smotenc directly on the same data set.
It can only be applied to classification tasks with factor (or ordered) features and at least one numeric (or integer) feature that have no missing values.
See themis::smotenc for details.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpSmoteNC$new(id = "smotenc", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "smotenc".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with added synthetic rows for the minority class.
The output during prediction is the unchanged input.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- k :: integer(1)
  Number of nearest neighbors used for generating new values from the minority class. Default is 5.
- over_ratio :: numeric(1)
  Ratio of the minority to majority class. Default is 1. For details, see themis::smotenc.
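As a minimal sketch of how k and over_ratio might be set: the toy data, the column names x_num and x_fct, and the chosen values are illustrative assumptions, and the themis package must be installed.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)
# Hypothetical toy data: 20 minority rows, 180 majority rows,
# one numeric and one factor feature, since SMOTENC needs both.
data = data.frame(
  target = factor(rep(c("c1", "c2"), times = c(20, 180))),
  x_num = rnorm(200),
  x_fct = factor(sample(c("a", "b"), size = 200, replace = TRUE))
)
task = TaskClassif$new(id = "toy", backend = data, target = "target")
counts_before = table(task$data(cols = "target")$target)

# Use 3 nearest neighbors and oversample the minority class
# up to roughly half the size of the majority class.
pop = po("smotenc", k = 3, over_ratio = 0.5)
result = pop$train(list(task))[[1]]
counts_after = table(result$data(cols = "target")$target)

counts_before
counts_after
```

Only the minority class should grow; the majority class count is expected to stay the same.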
Internals
If a target level is unobserved during training, no synthetic data points will be generated for that class. No error is raised; the unobserved class is simply ignored.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
References
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002). “SMOTE: Synthetic Minority Over-sampling Technique.” Journal of Artificial Intelligence Research, 16, 321–357. doi:10.1613/jair.953.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
# Create example task
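# SMOTENC needs at least one numeric and one factor feature
data = data.frame(
  target = factor(sample(c("c1", "c2"), size = 200, replace = TRUE, prob = c(0.1, 0.9))),
  feature = rnorm(200),
  feature_fct = factor(sample(c("a", "b"), size = 200, replace = TRUE))
)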
task = TaskClassif$new(id = "example", backend = data, target = "target")
task$head()
table(task$data(cols = "target"))
# Generate synthetic data for minority class
pop = po("smotenc")
smotenc_result = pop$train(list(task))[[1]]$data()
nrow(smotenc_result)
table(smotenc_result$target)
Normalize Data Row-wise
Description
Normalizes the data row-wise. This is a natural generalization of the "sign" function to higher dimensions.
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpSpatialSign$new(id = "spatialsign", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "spatialsign".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their normalized versions.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- length :: numeric(1)
  Length to scale rows to. Default is 1.
- norm :: numeric(1)
  Norm to use. Rows are scaled to sum(x^norm)^(1/norm) == length for finite norm, or to max(abs(x)) == length if norm is Inf. Default is 2.
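The effect of length and norm can be checked numerically. The following sketch assumes the iris task and an arbitrary target length of 2; it recomputes the Euclidean norm of every transformed row.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Scale every row of the numeric features to Euclidean length 2.
pop = po("spatialsign", length = 2, norm = 2)
out = pop$train(list(task))[[1]]$data(cols = task$feature_names)

# Recompute the Euclidean norm of each transformed row.
row_norms = sqrt(rowSums(as.matrix(out)^2))
summary(row_norms)
```

All row norms should equal the requested length up to floating-point error.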
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
task$data()
pop = po("spatialsign")
pop$train(list(task))[[1]]$data()
Subsampling
Description
Subsamples a Task to use a fraction of the rows.
Sampling happens only during training phase. Subsampling a Task may be
beneficial for training time at possibly (depending on original Task size)
negligible cost of predictive performance.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpSubsample$new(id = "subsample", param_vals = list())
- id :: character(1)
  Identifier of the resulting object, default "subsample".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output during training is the input Task with added or removed rows according to the sampling.
The output during prediction is the unchanged input.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc; however, the affect_columns parameter is not present. Further parameters are:
- frac :: numeric(1)
  Fraction of rows in the Task to keep. May only be greater than 1 if replace is TRUE. Initialized to (1 - exp(-1)) == 0.6321.
- stratify :: logical(1)
  Should the subsamples be stratified by target? Initialized to FALSE. May only be TRUE for TaskClassif input and if use_groups = FALSE.
- use_groups :: logical(1)
  If TRUE and if the Task has a column with role group, grouped observations are kept together during subsampling. In case of sampling with
- replace :: logical(1)
  Sample with replacement? Initialized to FALSE.
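A short sketch of the interaction between frac and replace, assuming the iris task (150 rows) and an illustrative frac of 1.5: oversampling beyond the original size is only possible with replacement, and prediction leaves the task untouched.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# frac > 1 is only allowed together with replace = TRUE.
pop = po("subsample", frac = 1.5, replace = TRUE)
train_out = pop$train(list(task))[[1]]
train_out$nrow

# During prediction the task is passed through unchanged.
predict_out = pop$predict(list(task))[[1]]
predict_out$nrow
```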
Internals
Uses task$filter() to remove rows. If replace is TRUE and identical rows are added, then the task$row_roles$use can not be used
to duplicate rows because of [inaudible]; instead the task$rbind() function is used, and
a new data.table is attached that contains all rows that are being duplicated exactly as many times as they are being added.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
# Subsample with stratification
pop = po("subsample", frac = 0.7, stratify = TRUE, use_groups = FALSE)
pop$train(list(tsk("iris")))
# Subsample, respecting grouping
df = data.frame(
  target = runif(3000),
  x1 = runif(3000),
  x2 = runif(3000),
  grp = sample(paste0("g", 1:100), 3000, replace = TRUE)
)
task = TaskRegr$new(id = "example", backend = df, target = "target")
task$set_col_roles("grp", "group")
pop = po("subsample", frac = 0.7, use_groups = TRUE)
pop$train(list(task))
Invert Target Transformations
Description
Inverts target-transformations done during training based on a supplied inversion
function. Typically should be used in combination with a subclass of PipeOpTargetTrafo.
During prediction phase the function supplied through "fun" is called with a list containing
the "prediction" as a single element, and should return a list with a single element
(a Prediction) that is returned by PipeOpTargetInvert.
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpTargetInvert$new(id = "targetinvert", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "targetinvert".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
PipeOpTargetInvert has two input channels named "fun" and "prediction". During
training, both take NULL as input. During prediction, "fun" takes a function and
"prediction" takes a Prediction.
PipeOpTargetInvert has one output channel named "output" and returns NULL during
training and a Prediction during prediction.
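The channel layout described above can be inspected directly on a constructed object. This is only a small sketch using the $input and $output fields inherited from PipeOp.

```r
library("mlr3")
library("mlr3pipelines")

pop = po("targetinvert")

# Input channels with the types accepted during training and prediction.
pop$input

# The single output channel.
pop$output
```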
State
The $state is left empty (list()).
Parameters
PipeOpTargetInvert has no parameters.
Internals
Should be used in combination with a subclass of PipeOpTargetTrafo.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Transform a Target by a Function
Description
Changes the target of a Task according to a function given as hyperparameter.
An inverter-function that undoes the transformation during prediction must also be given.
Format
R6Class object inheriting from PipeOpTargetTrafo/PipeOp
Construction
PipeOpTargetMutate$new(id = "targetmutate", param_vals = list(), new_task_type = NULL)
- id :: character(1)
  Identifier of resulting object, default "targetmutate".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
- new_task_type :: character(1) | NULL
  The task type to which the output is converted, must be one of mlr_reflections$task_types$type. Defaults to NULL: no change in task type.
Input and Output Channels
Input and output channels are inherited from PipeOpTargetTrafo.
State
The $state is left empty (list()).
Parameters
The parameters are the parameters inherited from PipeOpTargetTrafo, as well as:
- trafo :: function data.table -> data.frame | data.table | matrix
  Transformation function for the target. Should only be a function of the target, i.e., taking a single data.table argument, typically with one column. The return value is used as the new target of the resulting Task. To change target names, change the column name of the data using e.g. setnames().
  Note that this function also gets called during prediction and should thus gracefully handle NA values.
  Initialized to identity().
- inverter :: function data.table -> data.table | named list
  Inversion of the transformation function for the target. Called on a data.table created from a Prediction using as.data.table(), without the $row_ids and $truth columns, and should return a data.table or named list that contains the new relevant slots of a Prediction subclass (e.g., $response, $prob, $se, ...). Initialized to identity().
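As a sketch of a matching trafo/inverter pair, the following uses a square-root transformation on the mtcars task. The task and the choice of transformation are illustrative assumptions; the target values of the output task are compared against a manual computation.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("mtcars")
original = task$truth()

# trafo receives a one-column data.table holding the target;
# inverter receives the prediction as a data.table and squares it back.
pop = po("targetmutate",
  trafo = function(x) sqrt(x),
  inverter = function(x) list(response = x$response^2)
)
out = pop$train(list(task))

# The "output" channel carries the Task with the transformed target.
transformed = out$output$truth()
all.equal(transformed, sqrt(original))
```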
Internals
Overloads PipeOpTargetTrafo's .transform() and
.invert() functions. Should be used in combination with PipeOpTargetInvert.
Fields
Fields inherited from PipeOp, as well as:
- new_task_type :: character(1)
  new_task_type construction argument. Read-only.
Methods
Only methods inherited from PipeOpTargetTrafo/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library(mlr3)
task = tsk("boston_housing")
po = PipeOpTargetMutate$new("logtrafo", param_vals = list(
  trafo = function(x) log(x, base = 2),
  inverter = function(x) list(response = 2 ^ x$response))
)
# Note that this example is ill-equipped to work with
# `predict_type == "se"` predictions.
po$train(list(task))
po$predict(list(task))
g = Graph$new()
g$add_pipeop(po)
g$add_pipeop(LearnerRegrRpart$new())
g$add_pipeop(PipeOpTargetInvert$new())
g$add_edge(src_id = "logtrafo", dst_id = "targetinvert",
  src_channel = 1, dst_channel = 1)
g$add_edge(src_id = "logtrafo", dst_id = "regr.rpart",
  src_channel = 2, dst_channel = 1)
g$add_edge(src_id = "regr.rpart", dst_id = "targetinvert",
  src_channel = 1, dst_channel = 2)
g$train(task)
g$predict(task)
#syntactic sugar using ppl():
tt = ppl("targettrafo", graph = PipeOpLearner$new(LearnerRegrRpart$new()))
tt$param_set$values$targetmutate.trafo = function(x) log(x, base = 2)
tt$param_set$values$targetmutate.inverter = function(x) list(response = 2 ^ x$response)
Linearly Transform a Numeric Target to Match Given Boundaries
Description
Linearly transforms a numeric target of a TaskRegr so it is between lower
and upper. The formula for this is x' = offset + x * scale,
where scale is (upper - lower) / (max(x) - min(x)) and
offset is -min(x) * scale + lower. The same transformation is applied during training and
prediction.
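The formula can be traced by hand in base R. The values of x below are made up for illustration, and the inversion step is simply the algebraic inverse of the formula, shown here as an assumption about how predictions are mapped back.

```r
# Hypothetical target values
x = c(5, 10, 20, 50)
lower = 0
upper = 1

# Scale and offset as defined in the description
scale = (upper - lower) / (max(x) - min(x))
offset = -min(x) * scale + lower

# Forward transformation: the smallest value maps to lower, the largest to upper
x_scaled = offset + x * scale
range(x_scaled)

# Algebraic inverse of the forward transformation
x_back = (x_scaled - offset) / scale
all.equal(x_back, x)
```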
Format
R6Class object inheriting from PipeOpTargetTrafo/PipeOp
Construction
PipeOpTargetTrafoScaleRange$new(id = "targettrafoscalerange", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "targettrafoscalerange".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTargetTrafo.
State
The $state is a named list containing the slots $offset and $scale.
Parameters
The parameters are the parameters inherited from PipeOpTargetTrafo, as well as:
- lower :: numeric(1)
  Target value of smallest item of input target. Initialized to 0.
- upper :: numeric(1)
  Target value of greatest item of input target. Initialized to 1.
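A brief sketch of non-default bounds, assuming the mtcars task and the illustrative range [-1, 1]: after training, the transformed target should span exactly that range, and the state holds the fitted scale and offset.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("mtcars")

pop = po("targettrafoscalerange", lower = -1, upper = 1)
out = pop$train(list(task))

# Target values of the transformed task
scaled = out$output$truth()
range(scaled)

# Fitted parameters of the linear map
pop$state$scale
pop$state$offset
```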
Internals
Overloads PipeOpTargetTrafo's .get_state(), .transform(), and
.invert(). Should be used in combination with PipeOpTargetInvert.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTargetTrafo/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library(mlr3)
task = tsk("boston_housing")
po = PipeOpTargetTrafoScaleRange$new()
po$train(list(task))
po$predict(list(task))
#syntactic sugar for a graph using ppl():
ttscalerange = ppl("targettrafo", trafo_pipeop = PipeOpTargetTrafoScaleRange$new(),
  graph = PipeOpLearner$new(LearnerRegrRpart$new()))
ttscalerange$train(task)
ttscalerange$predict(task)
ttscalerange$state$regr.rpart
Bag-of-word Representation of Character Features
Description
Computes a bag-of-word representation from a (set of) columns.
Columns of type character are split up into words.
Uses the quanteda::dfm() and quanteda::dfm_trim() functions.
TF-IDF computation works similarly to quanteda::dfm_tfidf()
but has been adjusted for train/test data split using quanteda::docfreq()
and quanteda::dfm_weight().
In short:
- Per default, produces a bag-of-words representation.
- If n is set to values > 1, ngrams are computed.
- If df_trim parameters are set, the bag-of-words is trimmed.
- The scheme_tf parameter controls term-frequency (per-document, i.e. per-row) weighting.
- The scheme_df parameter controls the document-frequency (per token, i.e. per-column) weighting.
Parameters specify arguments to quanteda's dfm, dfm_trim, docfreq and dfm_weight.
Which function a parameter belongs to can be read from the parameter's tags; parameters tagged tokenizer are
arguments passed on to quanteda::dfm().
Defaults to a bag-of-words representation with token counts as matrix entries.
In order to perform the default dfm_tfidf weighting, set the scheme_df parameter to "inverse".
The scheme_df parameter is initialized to "unary", which disables document frequency weighting.
The PipeOp works as follows:
1. Words are tokenized using quanteda::tokens.
2. Ngrams are computed using quanteda::tokens_ngrams.
3. A document-feature matrix is computed using quanteda::dfm.
4. The document-feature matrix is trimmed using quanteda::dfm_trim during train-time.
5. The document-feature matrix is re-weighted (similar to quanteda::dfm_tfidf) if scheme_df is not set to "unary".
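The steps above can be sketched with quanteda directly. This is not the PipeOp's internal code, only an illustration of the same sequence of calls on three made-up documents; the trimming threshold and the use of quanteda::dfm_tfidf for the last step are assumptions chosen for brevity.

```r
library("quanteda")

# Three hypothetical documents
txt = c(d1 = "the cat sat", d2 = "the dog sat", d3 = "a cat ran")

toks = tokens(txt, what = "word")        # 1. tokenize into words
toks = tokens_ngrams(toks, n = 1:2)      # 2. build unigrams and bigrams
m = dfm(toks, tolower = TRUE)            # 3. document-feature matrix
m = dfm_trim(m, min_termfreq = 2)        # 4. drop features seen fewer than 2 times
featnames(m)

# 5. re-weight counts by inverse document frequency
w = dfm_tfidf(m, scheme_tf = "count", scheme_df = "inverse")
w
```

Only features that occur at least twice across all documents survive the trimming step.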
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpTextVectorizer$new(id = "textvectorizer", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "textvectorizer".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected features converted to a bag-of-words
representation.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- colmodels :: named list
  Named list with one entry per extracted column. Each entry has two further elements:
  - tdm: sparse document-feature matrix resulting from quanteda::dfm()
  - docfreq: (weighted) document frequency resulting from quanteda::docfreq()
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- return_type :: character(1)
  Whether to return an integer representation ("integer_sequence") or a bag-of-words ("bow"). If set to "integer_sequence", tokens are replaced by an integer and padded/truncated to sequence_length. If set to "factor_sequence", tokens are replaced by a factor and padded/truncated to sequence_length. If set to "bow", a possibly weighted bag-of-words matrix is returned. Defaults to "bow".
- stopwords_language :: character(1)
  Language to use for stopword filtering. Needs to be either "none", a language identifier listed in stopwords::stopwords_getlanguages("snowball") ("de", "en", ...) or "smart". "none" disables language-specific stopwords. "smart" corresponds to stopwords::stopwords(source = "smart"), which contains English stopwords and also removes one-character strings. Initialized to "smart".
- extra_stopwords :: character
  Extra stopwords to remove. Must be a character vector containing individual tokens to remove. When n is set to values greater than 1, this can also contain stop-ngrams. Initialized to character(0).
- tolower :: logical(1)
  Whether to convert to lower case. See quanteda::dfm. Default is TRUE.
- stem :: logical(1)
  Whether to perform stemming. See quanteda::dfm. Default is FALSE.
- what :: character(1)
  Tokenization splitter. See quanteda::tokens. Default is "word".
- remove_punct :: logical(1)
  See quanteda::tokens. Default is FALSE.
- remove_url :: logical(1)
  See quanteda::tokens. Default is FALSE.
- remove_symbols :: logical(1)
  See quanteda::tokens. Default is FALSE.
- remove_numbers :: logical(1)
  See quanteda::tokens. Default is FALSE.
- remove_separators :: logical(1)
  See quanteda::tokens. Default is TRUE.
- split_hyphens :: logical(1)
  See quanteda::tokens. Default is FALSE.
- n :: integer
  Vector of ngram lengths. See quanteda::tokens_ngrams. Initialized to 1, deviating from the base function's default. Note that this can be a vector of multiple values, to construct ngrams of multiple orders.
- skip :: integer
  Vector of skips. See quanteda::tokens_ngrams. Default is 0. Note that this can be a vector of multiple values.
- sparsity :: numeric(1)
  Desired sparsity of the 'tfm' matrix. See quanteda::dfm_trim. Default is NULL.
- max_termfreq :: numeric(1)
  Maximum term frequency in the 'tfm' matrix. See quanteda::dfm_trim. Default is NULL.
- min_termfreq :: numeric(1)
  Minimum term frequency in the 'tfm' matrix. See quanteda::dfm_trim. Default is NULL.
- termfreq_type :: character(1)
  How to assess term frequency. See quanteda::dfm_trim. Default is "count".
- scheme_df :: character(1)
  Weighting scheme for document frequency: See quanteda::docfreq. Initialized to "unary" (1 for each document, deviating from base function default).
- smoothing_df :: numeric(1)
  See quanteda::docfreq. Default is 0.
- k_df :: numeric(1)
  k parameter given to quanteda::docfreq (see there). Default is 0.
- threshold_df :: numeric(1)
  See quanteda::docfreq. Default is 0. Only considered if scheme_df is set to "count".
- base_df :: numeric(1)
  The base for logarithms in quanteda::docfreq (see there). Default is 10.
- scheme_tf :: character(1)
  Weighting scheme for term frequency: See quanteda::dfm_weight. Default is "count".
- k_tf :: numeric(1)
  k parameter given to quanteda::dfm_weight (see there). Default is 0.5.
- base_tf :: numeric(1)
  The base for logarithms in quanteda::dfm_weight (see there). Default is 10.
- sequence_length :: integer(1)
  The length of the integer sequence. Defaults to Inf, i.e. all texts are padded to the length of the longest text. Only relevant if return_type is set to "integer_sequence".
Internals
See Description. Internally uses the quanteda package. Calls quanteda::tokens, quanteda::tokens_ngrams and quanteda::dfm. During training,
quanteda::dfm_trim is also called. Tokens not seen during training are dropped during prediction.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
library("data.table")
# create some text data
dt = data.table(
  txt = replicate(150, paste0(sample(letters, 3), collapse = " "))
)
task = tsk("iris")$cbind(dt)
pos = po("textvectorizer", param_vals = list(stopwords_language = "en"))
pos$train(list(task))[[1]]$data()
one_line_of_iris = task$filter(13)
one_line_of_iris$data()
pos$predict(list(one_line_of_iris))[[1]]$data()
Change the Threshold of a Classification Prediction
Description
Change the threshold of a Prediction during the predict step.
The incoming Learner's $predict_type needs to be "prob".
Internally calls PredictionClassif$set_threshold.
Format
R6Class inheriting from PipeOp.
Construction
PipeOpThreshold$new(id = "threshold", param_vals = list())
- id :: character(1)
  Identifier of the resulting object, default "threshold".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
During training, the input and output are NULL.
A PredictionClassif is required as input and returned as output during prediction.
State
The $state is left empty (list()).
Parameters
- thresholds :: numeric
  A numeric vector of thresholds for the different class levels. May have length 1 for binary classification predictions, must otherwise have length of the number of target classes; see PredictionClassif's $set_threshold() method. Initialized to 0.5, i.e. thresholding for binary classification at level 0.5.
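As a minimal sketch of how a scalar threshold acts on a binary task: the "sonar" task and the value 0.9 are arbitrary choices for illustration, not part of the original documentation. After thresholding, the positive class should only be predicted for rows whose positive-class probability reaches the threshold.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("sonar")  # binary task shipped with mlr3, chosen only for illustration
gr = po("learner", lrn("classif.rpart", predict_type = "prob")) %>>%
  po("threshold", param_vals = list(thresholds = 0.9))
gr$train(task)
pred = gr$predict(task)[[1]]

# Positive-class probabilities of all rows that were predicted as positive
pos = task$positive
prob_pos = pred$prob[pred$response == pos, pos]

# Every such row should reach the threshold (small tolerance for floating point)
all(prob_pos >= 0.9 - 1e-8)
```

For multiclass predictions, thresholds would instead need one entry per target class, as described above.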
Fields
Fields inherited from PipeOp, as well as:
- predict_type :: character(1)
  Type of prediction to return. Either "prob" (default) or "response". Setting this to "response" is rarely useful; it may save some memory but has no other benefits.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
t = tsk("german_credit")
gr = po(lrn("classif.rpart", predict_type = "prob")) %>>%
  po("threshold", param_vals = list(thresholds = 0.9))
gr$train(t)
gr$predict(t)
Tomek Down-Sampling
Description
Generates a cleaner data set by removing all majority-minority Tomek links.
The algorithm down-samples the data by removing all pairs of observations that form a Tomek link, i.e. a pair of observations that are nearest neighbors and belong to different classes. Only numeric and integer features are taken into account for this, and they must not contain missing values.
This can only be applied to classification tasks. Multiclass classification is supported.
See themis::tomek for details.
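The definition of a Tomek link can be illustrated with a small base R sketch. This is not the implementation used by themis; the toy data and all variable names are made up for illustration, and Euclidean distance is assumed.

```r
# Toy data: two numeric features and a class label per observation.
x = rbind(c(0, 0), c(0.1, 0), c(1, 1), c(1.1, 1), c(5, 5))
y = factor(c("a", "b", "a", "a", "b"))

# Pairwise Euclidean distances; an observation is not its own neighbor.
d = as.matrix(dist(x))
diag(d) = Inf

# Index of the nearest neighbor of every observation.
nn = apply(d, 1, which.min)

# Observation i is part of a Tomek link if i and nn[i] are mutual nearest
# neighbors and belong to different classes.
idx = seq_along(nn)
in_link = nn[nn] == idx & y[nn] != y

# Removing both members of every link yields the down-sampled data.
x_clean = x[!in_link, , drop = FALSE]
y_clean = y[!in_link]
```

Here observations 1 and 2 are mutual nearest neighbors with different labels and are removed; observations 3 and 4 are mutual nearest neighbors of the same class and are kept.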
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpTomek$new(id = "tomek", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "tomek".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with removed rows for pairs of observations that form a Tomek link.
The output during prediction is the unchanged input.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
References
Tomek I (1976). “Two Modifications of CNN.” IEEE Transactions on Systems, Man and Cybernetics, 6(11), 769–772. doi:10.1109/TSMC.1976.4309452.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
# Create example task
task = tsk("iris")
task$head()
table(task$data(cols = "Species"))
# Down-sample data
pop = po("tomek")
tomek_result = pop$train(list(task))[[1]]$data()
nrow(tomek_result)
table(tomek_result$Species)
Tune the Threshold of a Classification Prediction
Description
Tunes optimal probability thresholds over different PredictionClassifs.
The mlr3::Learner must have predict_type "prob".
Thresholds for each learner are optimized using the Optimizer supplied via
the param_set; this defaults to GenSA.
Returns a single PredictionClassif.
This PipeOp should be used in conjunction with PipeOpLearnerCV in order to
optimize thresholds of cross-validated predictions.
In order to optimize thresholds without cross-validation, use PipeOpLearnerCV
in conjunction with ResamplingInsample.
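A minimal sketch of the variant without cross-validation might look as follows. It assumes that the bbotk and GenSA packages are installed and that PipeOpLearnerCV exposes the resampling choice as the hyperparameter resampling.method; the "sonar" task is an arbitrary choice for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("sonar")  # binary task, chosen only for illustration

# "insample" makes learner_cv predict on the training data itself
# (assumed hyperparameter name: resampling.method)
gr = po("learner_cv", lrn("classif.rpart", predict_type = "prob"),
    resampling.method = "insample") %>>%
  po("tunethreshold")
gr$train(task)

# The learned thresholds are kept in the state of the tuning PipeOp
state = gr$pipeops$tunethreshold$state
state

pred = gr$predict(task)[[1]]
```

Thresholds tuned on in-sample predictions tend to be optimistic, which is why the cross-validated setup is the recommended default.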
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpTuneThreshold$new(id = "tunethreshold", param_vals = list())
- id :: character(1)
  Identifier of resulting object. Default: "tunethreshold".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOp.
State
The $state is a named list with elements:
- thresholds :: numeric
  Learned thresholds.
Parameters
The parameters are the parameters inherited from PipeOp, as well as:
- measure :: Measure | character
  Measure to optimize for. Will be converted to a Measure in case it is character. Initialized to "classif.ce", i.e. misclassification error.
- optimizer :: Optimizer | character(1)
  Optimizer used to find optimal thresholds. If character, converts to Optimizer via opt. Initialized to OptimizerGenSA.
- log_level :: character(1) | integer(1)
  Set a temporary log-level for lgr::get_logger("mlr3/bbotk"). Initialized to "warn".
Internals
Uses the optimizer provided as a param_val in order to find an optimal threshold.
See the optimizer parameter for more info.
Fields
Fields inherited from PipeOp, as well as:
- predict_type :: character(1)
  Type of prediction to return. Either "prob" (default) or "response". Setting this to "response" is rarely useful; it may save some memory but has no other benefits.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
task = tsk("iris")
pop = po("learner_cv", lrn("classif.rpart", predict_type = "prob")) %>>%
  po("tunethreshold")
task$data()
pop$train(task)
pop$state
Unbranch Different Paths
Description
Used to bring together different paths created by PipeOpBranch.
Format
R6Class object inheriting from PipeOp.
Construction
PipeOpUnbranch$new(options, id = "unbranch", param_vals = list())
- options :: numeric(1) | character
  If options is 0, a vararg input channel is created that can take any number of inputs. If options is a nonzero integer number, it determines the number of input channels / options that are created, named input1...input<n>. If options is a character, it determines the names of channels directly. The difference between these three is purely cosmetic if the user chooses to produce channel names matching with the corresponding PipeOpBranch. However, it is not necessary to have matching names and the vararg option is always viable.
- id :: character(1)
  Identifier of resulting object, default "unbranch".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output
PipeOpUnbranch has multiple input channels depending on the options construction argument: they are named "input1", "input2", ...
if options is a nonzero integer, and named after each options value if options is a character. If options is 0, there is only one
vararg input channel named "...".
All input channels take any argument ("*") both during training and prediction.
PipeOpUnbranch has one output channel named "output", producing the only non-NO_OP object received as input ("*"),
both during training and prediction.
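The channel naming and the handling of NO_OP inputs can be sketched as follows. The option names "pca" and "nop" and the payload "hello" are arbitrary choices for illustration.

```r
library("mlr3pipelines")

# Character options name the input channels directly.
pou = po("unbranch", options = c("pca", "nop"))
pou$input$name

# Inactive paths deliver NO_OP; the single remaining input is passed on.
out = pou$train(list(NO_OP, "hello"))
out[[1]]
```

In a real Graph the NO_OP values are produced by the matching PipeOpBranch, so they rarely need to be supplied by hand.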
State
The $state is left empty (list()).
Parameters
PipeOpUnbranch has no parameters.
Internals
See PipeOpBranch Internals on how alternative path branching works.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Path Branching:
NO_OP,
filter_noop(),
is_noop(),
mlr_pipeops_branch
Examples
# See PipeOpBranch for a complete branching example
pou = po("unbranch")
pou$train(list(NO_OP, NO_OP, "hello", NO_OP, NO_OP))
Transform a Target without an Explicit Inversion
Description
EXPERIMENTAL, API SUBJECT TO CHANGE
Handles target transformation operations that do not need explicit inversion.
In case the new target is required during prediction, creates a vector of NA.
Works similarly to PipeOpTargetTrafo and PipeOpTargetMutate, but forgoes the
inversion step.
In case the target after the trafo is a factor, its levels are saved to $state.
During prediction, all target values are set to NA before the trafo is called again.
In case the target after the trafo is a factor, the levels saved in the $state are
set during prediction.
As a special case when trafo is identity and new_target_name matches an existing column
name of the data of the input Task, this column is set as the new target. Depending on
drop_original_target the original target is then either dropped or added to the features.
Format
Abstract R6Class inheriting from PipeOp.
Construction
PipeOpUpdateTarget$new(id, param_set = ps(), param_vals = list(), packages = character(0))
- id :: character(1)
  Identifier of resulting object. See $id slot of PipeOp.
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
Parameters
The parameters are the parameters inherited from PipeOpTargetTrafo, as well as:
- trafo :: function
  Transformation function for the target. Should only be a function of the target, i.e., taking a single argument. Default is identity. Note that the data passed on to the trafo is a data.table consisting of all target columns.
- new_target_name :: character(1)
  Optionally give the transformed target a new name. By default the original name is used.
- new_task_type :: character(1)
  Optionally a new task type can be set. Legal types are listed in mlr_reflections$task_types$type.
- drop_original_target :: logical(1)
  Whether to drop the original target column. Default: TRUE.
State
The $state is a list of class levels for each target after trafo.
It is list() if none of the targets have levels.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
## Not run:
# Create a binary class task from iris
library(mlr3)
trafo_fun = function(x) {factor(ifelse(x$Species == "setosa", "setosa", "other"))}
po = PipeOpUpdateTarget$new(param_vals = list(trafo = trafo_fun, new_target_name = "setosa"))
po$train(list(tsk("iris")))
po$predict(list(tsk("iris")))
## End(Not run)
Interface to the vtreat Package
Description
Provides an interface to the vtreat package.
PipeOpVtreat naturally works for classification tasks and regression tasks.
Internally, PipeOpVtreat follows the fit/prepare interface of vtreat, i.e., first creating a data treatment transform object via
vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(), or vtreat::MultinomialOutcomeTreatment(), followed by calling
vtreat::fit_prepare() on the training data and vtreat::prepare() during prediction.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpVtreat$new(id = "vtreat", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "vtreat".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskSupervised is used as input and output during training and prediction.
The output is the input Task with all affected features "prepared" by vtreat.
If vtreat found "no usable vars", the input Task is returned unaltered.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- treatment_plan :: object of class vtreat_pipe_step | NULL
  The treatment plan as constructed by vtreat based on the training data, i.e., an object of class treatment_plan. If vtreat found "no usable vars" and designing the treatment would have failed, this is NULL.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- recommended :: logical(1)
  Whether only the "recommended" prepared features should be returned, i.e., non-constant variables with a significance value smaller than vtreat's threshold. Initialized to TRUE.
- cols_to_copy :: function | Selector
  Selector function, takes a Task as argument and returns a character() of features to copy. See Selector for example functions. Initialized to selector_none().
- minFraction :: numeric(1)
  Minimum frequency a categorical level must have to be converted to an indicator column.
- smFactor :: numeric(1)
  Smoothing factor for impact coding models.
- rareCount :: integer(1)
  Allow levels with this count or below to be pooled into a shared rare-level.
- rareSig :: numeric(1)
  Suppress levels from pooling at this significance value or greater.
- collarProb :: numeric(1)
  What fraction of the data (pseudo-probability) to collar data at if doCollar = TRUE.
- doCollar :: logical(1)
  If TRUE, collar numeric variables by cutting off after a tail-probability specified by collarProb during treatment design.
- codeRestriction :: character()
  What types of variables to produce.
- customCoders :: named list
  Map from code names to custom categorical variable encoding functions.
- splitFunction :: function
  Function taking arguments nSplits, nRows, dframe, and y; returning a user desired split.
- ncross :: integer(1)
  Integer larger than one, number of cross-validation rounds to design.
- forceSplit :: logical(1)
  If TRUE, force cross-validated significance calculations on all variables.
- catScaling :: logical(1)
  If TRUE, use stats::glm() linkspace; if FALSE, use stats::lm() for scaling.
- verbose :: logical(1)
  If TRUE, print progress.
- use_parallel :: logical(1)
  If TRUE, use parallel methods.
- missingness_imputation :: function
  Function of signature f(values: numeric, weights: numeric), simple missing value imputer.
  Typically, an imputation via a PipeOp should be preferred, see PipeOpImpute.
- pruneSig :: numeric(1)
  Suppress variables with significance above this level. Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
- scale :: logical(1)
  If TRUE, replace numeric variables with single variable model regressions ("move to outcome-scale"). These have mean zero and (for variables with significance less than 1) slope 1 when regressed (lm for regression problems/glm for classification problems) against outcome.
- varRestriction :: list()
  List of treated variable names to restrict to. Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
- trackedValues :: named list()
  Named list mapping variables to known values, allows warnings upon novel level appearances (see vtreat::track_values()). Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
- y_dependent_treatments :: character()
  Character vector specifying which treatment types to build per outcome level. Only affects multiclass classification tasks.
- imputation_map :: named list
  List mapping column names to functions of signature f(values: numeric, weights: numeric), simple missing value imputers.
  Typically, an imputation via a PipeOp is to be preferred, see PipeOpImpute.
For more information, see vtreat::regression_parameters(), vtreat::classification_parameters(), or vtreat::multinomial_parameters().
Internals
Follows vtreat's fit/prepare interface. See vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(),
vtreat::MultinomialOutcomeTreatment(), vtreat::fit_prepare() and vtreat::prepare().
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
set.seed(2020)
make_data <- function(nrows) {
  d <- data.frame(x = 5 * rnorm(nrows))
  d["y"] = sin(d[["x"]]) + 0.01 * d[["x"]] + 0.1 * rnorm(nrows)
  d[4:10, "x"] = NA  # introduce NAs
  d["xc"] = paste0("level_", 5 * round(d$y / 5, 1))
  d["x2"] = rnorm(nrows)
  d[d["xc"] == "level_-1", "xc"] = NA  # introduce a NA level
  return(d)
}
task = TaskRegr$new("vtreat_regr", backend = make_data(100), target = "y")
pop = PipeOpVtreat$new()
pop$train(list(task))
Yeo-Johnson Transformation of Numeric Features
Description
Conducts a Yeo-Johnson transformation on numeric features. To do so, it estimates
the optimal value of lambda for the transformation.
See bestNormalize::yeojohnson() for details.
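For orientation, the transformation itself can be sketched in base R for a fixed lambda. This is an illustration of the standard Yeo-Johnson formula, not the code used by bestNormalize; the function name yeo_johnson is hypothetical, the estimation of lambda and the optional standardization are omitted, and the handling of eps is an assumption about how a near-zero lambda is treated.

```r
# Yeo-Johnson transformation for a given lambda (assumes x has no missing values).
# eps is the tolerance within which lambda (or lambda - 2) is treated as zero.
yeo_johnson = function(x, lambda, eps = 0.001) {
  out = numeric(length(x))
  nonneg = x >= 0

  # Non-negative values: Box-Cox-like transformation of x + 1
  if (abs(lambda) < eps) {
    out[nonneg] = log1p(x[nonneg])
  } else {
    out[nonneg] = ((x[nonneg] + 1)^lambda - 1) / lambda
  }

  # Negative values: mirrored transformation with exponent 2 - lambda
  if (abs(lambda - 2) < eps) {
    out[!nonneg] = -log1p(-x[!nonneg])
  } else {
    out[!nonneg] = -((1 - x[!nonneg])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

x = c(-2, -0.5, 0, 1, 3)
yeo_johnson(x, lambda = 1)  # lambda = 1 leaves the values unchanged
yeo_johnson(x, lambda = 0)  # log(x + 1) for the non-negative values
```

Unlike the Box-Cox transformation, this works for zero and negative values, which is why no shifting of the features is needed beforehand.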
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpYeoJohnson$new(id = "yeojohnson", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "yeojohnson".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their transformed versions.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as a list of class yeojohnson for each column that is transformed.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- eps :: numeric(1)
  Tolerance parameter to identify the lambda parameter as zero. For details see yeojohnson().
- standardize :: logical
  Whether to center and scale the transformed values to attempt a standard normal distribution. For details see yeojohnson().
- lower :: numeric(1)
  Lower value for estimation of lambda parameter. For details see yeojohnson().
- upper :: numeric(1)
  Upper value for estimation of lambda parameter. For details see yeojohnson().
Internals
Uses the bestNormalize::yeojohnson function.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat
Examples
library("mlr3")
task = tsk("iris")
pop = po("yeojohnson")
task$data()
pop$train(list(task))[[1]]$data()
pop$state
Housing Data for 506 Census Tracts of Boston
Description
Housing Data for 506 Census Tracts of Boston
Format
R6Class object inheriting from TaskRegr.
The BostonHousing2 dataset
containing the corrected data from Freeman III AM (1979),
“The Hedonic Price Approach to Measuring Demand for Neighborhood Characteristics.”
In The Economics of Neighborhood, 191–217.
Elsevier.
doi:10.1016/B978-0-12-636250-3.50015-5,
as provided by the mlbench package. See the data description there.
Shorthand PipeOp Constructor
Description
Create
- a PipeOp from mlr_pipeops from a given ID,
- a PipeOpLearner from a Learner object,
- a PipeOpFilter from a Filter object,
- a PipeOpSelect from a Selector object, or
- a clone of a PipeOp from a given PipeOp (possibly with changed settings).
The object is initialized with given parameters and param_vals.
po() takes a single obj (PipeOp ID, Learner, ...) and converts
it to a PipeOp. pos() (with plural-s) takes either a character vector or a
list of objects, and creates a list of PipeOps.
Usage
po(.obj, ...)
pos(.objs, ...)
Arguments
.obj |
|
... |
|
.objs |
|
Value
A PipeOp (for po()), or a list of PipeOps (for pos()).
Examples
library("mlr3")
po("learner", lrn("classif.rpart"), cp = 0.3)
po(lrn("classif.rpart"), cp = 0.3)
# is equivalent to:
mlr_pipeops$get("learner", lrn("classif.rpart"),
  param_vals = list(cp = 0.3))
mlr3pipelines::pos(c("pca", original = "nop"))
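The examples above cover construction from an ID and from a Learner. The following is a minimal sketch of two other paths named in the Description: construction from a Selector and cloning an existing PipeOp with changed settings. The variable names are illustrative only, and the sketch assumes that extra arguments to po() are forwarded as hyperparameter values, as in the `cp = 0.3` example above.

```r
library("mlr3")
library("mlr3pipelines")

# From a Selector object: the result should be a PipeOpSelect
po_sel = po(selector_type("numeric"))
class(po_sel)[1]

# From an existing PipeOp: the result is a clone, here with a changed
# `rank.` hyperparameter; the original object is left untouched
po_pca = po("pca")
po_pca_small = po(po_pca, rank. = 2)
po_pca_small$param_set$values$rank.
po_pca$param_set$values$rank.
```

Because po() returns a clone in the second case, the settings of `po_pca` itself are not expected to change.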
Shorthand Graph Constructor
Description
Creates a Graph from mlr_graphs from a given ID.
ppl() takes a character(1) and returns a Graph. ppls() takes a character
vector or a list and returns a list of possibly multiple Graphs.
Usage
ppl(.key, ...)
ppls(.keys, ...)
Arguments
.key |
|
... |
|
.keys |
|
Value
Graph (for ppl()) or list of Graphs (for ppls()).
Examples
library("mlr3")
gr = ppl("bagging", graph = po(lrn("regr.rpart")),
averager = po("regravg", collect_multiplicity = TRUE))
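A Graph created this way can be wrapped as a Learner and used like any other 'mlr3' Learner. The sketch below is illustrative rather than canonical: it assumes the `iterations` argument of the bagging pipeline and the built-in "mtcars" regression task from 'mlr3', and it predicts on the training data only to keep the example short.

```r
library("mlr3")
library("mlr3pipelines")

# Bagging graph with a small number of replicates to keep the example fast
gr = ppl("bagging", graph = po(lrn("regr.rpart")), iterations = 3,
  averager = po("regravg", collect_multiplicity = TRUE))

# Wrap the Graph as a GraphLearner, then train and predict as usual
glrn = as_learner(gr)
task = tsk("mtcars")
glrn$train(task)
pred = glrn$predict(task)
head(pred$response)
```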
Simple Pre-processing
Description
Function that offers a simple and direct way to train or predict PipeOps and Graphs on Tasks,
data.frames or data.tables.
Training happens if predict is set to FALSE and no state is passed to this function.
Prediction happens if predict is set to TRUE and if the passed Graph or PipeOp is either trained or a state
is explicitly passed to this function.
The passed PipeOp or Graph gets modified by-reference.
Usage
preproc(indata, processor, state = NULL, predict = !is.null(state))
Arguments
indata |
( |
processor |
( |
state |
(named |
predict |
( |
Value
any | data.frame | data.table:
If indata is a Task, whatever is returned by the processor's single output channel is returned.
If indata is a data.frame or data.table, an object of the same class is returned, or
if the processor's output channel does not return a Task, an error is thrown.
Internals
If processor is a PipeOp, the S3 method preproc.PipeOp gets called first, converting the PipeOp into a
Graph and wrapping the state appropriately, before calling the S3 method preproc.Graph with the modified objects.
If indata is a data.frame or data.table, a
TaskUnsupervised is constructed internally. This implies that processors which only work on sub-classes
of TaskSupervised will not work with these input types for indata.
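As a hedged illustration of the data.frame path described above, the sketch below passes the base R `iris` data.frame to preproc() with a scaling PipeOp. It assumes that po("scale") accepts the internally constructed unsupervised task; the variable names are illustrative.

```r
library("mlr3")
library("mlr3pipelines")

pop_scale = po("scale")

# Training on a plain data.frame: a data.frame is returned
scaled = preproc(iris, pop_scale)
class(scaled)

# The PipeOp was trained by reference, so it can now be used for prediction
first_rows = preproc(iris[1:5, ], pop_scale, predict = TRUE)
nrow(first_rows)
```

No target is declared on this path, so "Species" is treated as an ordinary feature, and PipeOps that require a supervised task are not expected to work here.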
Examples
library("mlr3")
task = tsk("iris")
pop = po("pca")
# Training
preproc(task, pop)
# Note that the PipeOp gets trained through this
pop$is_trained
# Predicting a trained PipeOp (trained through previous call to preproc)
preproc(task, pop, predict = TRUE)
# Predicting using a given state
# We use the state of the PipeOp from the last example and then reset it
state = pop$state
pop$state = NULL
preproc(task, pop, state)
# Note that the PipeOp's state may get overwritten inadvertently during
# training or if a state is given
pop$state$sdev
preproc(tsk("wine"), pop)
pop$state$sdev
# Piping multiple preproc() calls, using dictionary sugar to set parameters
# tsk("penguins") |>
# preproc(po("imputemode", affect_columns = selector_name("sex"))) |>
# preproc(po("imputemean"))
# Use preproc with a Graph
gr = po("pca", rank. = 4) %>>% po("learner", learner = lrn("classif.rpart"))
preproc(tsk("sonar"), gr) # returns NULL because of the learner
preproc(tsk("sonar"), gr, predict = TRUE)
# Training with a data.table input
# Note that `$data()` drops the information that "Species" is the target.
# It gets handled like an ordinary feature here.
dt = tsk("iris")$data()
preproc(dt, pop)
# Predicting with a data.table input
# (predict = TRUE is needed here; without it, the call trains again)
preproc(dt, pop, predict = TRUE)
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- data.table
Add Autoconvert Function to Conversion Register
Description
Add functions that perform conversion to a desired class.
Whenever a Graph or a PipeOp is called with an object
that does not conform to its declared input type, the "autoconvert
register" is queried for functions that may turn the object into
a desired type.
Conversion functions should try to avoid cloning.
Usage
register_autoconvert_function(cls, fun, packages = character(0))
Arguments
cls |
|
fun |
|
packages |
|
Value
NULL.
See Also
Other class hierarchy operations:
add_class_hierarchy_cache(),
reset_autoconvert_register(),
reset_class_hierarchy_cache()
Examples
# This lets mlr3pipelines automatically try to convert a string into
# a `PipeOp` by querying the `mlr_pipeops` `Dictionary` (see mlr3misc::Dictionary).
# This is an example and not necessary, because mlr3pipelines adds it by default.
register_autoconvert_function("PipeOp", function(x) as_pipeop(x), packages = "mlr3pipelines")
Reset Autoconvert Register
Description
Reset autoconvert register to factory default, thereby undoing
any calls to register_autoconvert_function() by the user.
Usage
reset_autoconvert_register()
Value
NULL
See Also
Other class hierarchy operations:
add_class_hierarchy_cache(),
register_autoconvert_function(),
reset_class_hierarchy_cache()
Reset the Class Hierarchy Cache
Description
Reset the class hierarchy cache to factory default, thereby undoing
any calls to add_class_hierarchy_cache() by the user.
Usage
reset_class_hierarchy_cache()
Value
NULL
See Also
Other class hierarchy operations:
add_class_hierarchy_cache(),
register_autoconvert_function(),
reset_autoconvert_register()
Configure Validation for a GraphLearner
Description
Configure validation for a graph learner.
In a GraphLearner, validation can be configured on two levels:
- On the GraphLearner level, which specifies how the validation set is constructed before entering the graph.
- On the level of the individual PipeOps (such as PipeOpLearner), which specifies which pipeops actually make use of the validation data (set its $validate field to "predefined") or not (set it to NULL). This can be specified via the argument ids.
Usage
## S3 method for class 'GraphLearner'
set_validate(
learner,
validate,
ids = NULL,
args_all = list(),
args = list(),
...
)
Arguments
learner |
( |
validate |
( |
ids |
( |
args_all |
( |
args |
(named |
... |
(any) |
Examples
library(mlr3)
glrn = as_learner(po("pca") %>>% lrn("classif.debug"))
set_validate(glrn, 0.3)
glrn$validate
glrn$graph$pipeops$classif.debug$learner$validate
set_validate(glrn, NULL)
glrn$validate
glrn$graph$pipeops$classif.debug$learner$validate
set_validate(glrn, 0.2, ids = "classif.debug")
glrn$validate
glrn$graph$pipeops$classif.debug$learner$validate