Learners

The following classes provide a unified interface to all popular machine learning methods in R: (cost-sensitive) classification, regression, survival analysis, and clustering. Many are already integrated, and mlr is specifically designed to make extensions simple.

See the integrated learners page to reference already implemented machine learning methods and their properties. If your favorite method is missing, either open an issue or integrate a learning method yourself.

This basic introduction demonstrates how to use already implemented learners.

Constructing a learner

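A learner in mlr is generated by calling makeLearner. In the constructor, you need to specify which learning method you want to use. You may also set hyperparameters, control the output of later predictions (e.g., class labels or probabilities for classification), and set an ID to name the object.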

## Random forest classifier, set it up for predicting probabilities
classif.lrn = makeLearner("classif.randomForest", predict.type = "prob", fix.factors.prediction = TRUE)

## Regression gradient boosting machine, specify hyperparameters via a list
regr.lrn = makeLearner("regr.gbm", par.vals = list(n.trees = 500, interaction.depth = 3))

## Cox proportional hazards model with custom name
surv.lrn = makeLearner("surv.coxph", id = "cph")

## K-means with 5 clusters
cluster.lrn = makeLearner("cluster.kmeans", centers = 5)

## Multilabel Random Ferns classification algorithm
multilabel.lrn = makeLearner("multilabel.rFerns")

The first argument specifies which algorithm to use. The naming convention is classif.<R_method_name> for classification methods, regr.<R_method_name> for regression methods, surv.<R_method_name> for survival analysis, cluster.<R_method_name> for clustering methods, and multilabel.<R_method_name> for multilabel classification.

Hyperparameter values can be specified either via the ... argument or as a list via par.vals.
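As a minimal sketch of the two routes, the snippet below builds the same learner twice, once with hyperparameters passed through ... and once through par.vals. The learner classif.rpart and the values chosen for minsplit and cp are illustrative assumptions, not part of the examples above.

```r
library(mlr)

## Hyperparameters passed directly via the ... argument
lrn.dots = makeLearner("classif.rpart", minsplit = 10, cp = 0.05)

## The same hyperparameters passed as a list via par.vals
lrn.list = makeLearner("classif.rpart", par.vals = list(minsplit = 10, cp = 0.05))

## Compare the two settings that were specified explicitly
same = identical(
  getHyperPars(lrn.dots)[c("minsplit", "cp")],
  getHyperPars(lrn.list)[c("minsplit", "cp")]
)
same
```

Which form to use is mostly a matter of convenience: par.vals is handy when the settings already exist as a list, for instance as the result of an earlier computation.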

Occasionally, factor features cause problems when fewer levels are present in the test data set than in the training data. Setting fix.factors.prediction = TRUE avoids this by adding the factor levels that are missing from the test data set.
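The underlying idea can be sketched in base R. This only illustrates how the levels of a test feature can be aligned with those seen in training; it is not mlr's internal implementation, and the vectors train.x and test.x are made up for the example.

```r
## Factor feature as seen during training: three levels
train.x = factor(c("a", "b", "c", "a"))

## The same feature in the test data: level "c" never occurs
test.x = factor(c("a", "b"))
levels(test.x)

## Re-declare the test feature with the training levels, so both agree
test.fixed = factor(test.x, levels = levels(train.x))
levels(test.fixed)
```

The values of the test feature are unchanged; only its set of declared levels grows to match the training data.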

Let's have a look at two of the learners created above:

classif.lrn
#> Learner classif.randomForest from package randomForest
#> Type: classif
#> Name: Random Forest; Short name: rf
#> Class: classif.randomForest
#> Properties: twoclass,multiclass,numerics,factors,ordered,prob,class.weights,oobpreds,featimp
#> Predict-Type: prob
#> Hyperparameters:

surv.lrn
#> Learner cph from package survival
#> Type: surv
#> Name: Cox Proportional Hazard Model; Short name: coxph
#> Class: surv.coxph
#> Properties: numerics,factors,weights
#> Predict-Type: response
#> Hyperparameters:

All generated learners are objects of class Learner. This class contains the properties of the method, e.g., which types of features it can handle, what kind of output is possible during prediction, and whether multi-class problems, observation weights or missing values are supported.
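These properties can also be queried programmatically. The following sketch uses getLearnerProperties and hasLearnerProperties on classif.rpart, which is chosen here only as an example learner.

```r
library(mlr)

lrn = makeLearner("classif.rpart")

## Every learner inherits from class "Learner"
inherits(lrn, "Learner")

## Character vector of everything the method supports
props = getLearnerProperties(lrn)
props

## Check for specific capabilities, one logical value per property
has = hasLearnerProperties(lrn, c("prob", "missings"))
has
```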

There is currently no special learner class for cost-sensitive classification. For ordinary misclassification costs, you can use standard classification methods. For example-dependent costs, there are several ways to generate cost-sensitive learners from ordinary regression and classification learners. This is explained in greater detail in the cost-sensitive classification tutorial.

Accessing a learner

The Learner object is a list and the following elements contain information regarding the hyperparameters and the type of prediction.

## Get the configured hyperparameter settings that deviate from the defaults
cluster.lrn$par.vals
#> $centers
#> [1] 5

## Get the set of hyperparameters
classif.lrn$par.set
#>                      Type  len   Def   Constr Req Tunable Trafo
#> ntree             integer    -   500 1 to Inf   -    TRUE     -
#> mtry              integer    -     - 1 to Inf   -    TRUE     -
#> replace           logical    -  TRUE        -   -    TRUE     -
#> classwt     numericvector <NA>     - 0 to Inf   -    TRUE     -
#> cutoff      numericvector <NA>     -   0 to 1   -    TRUE     -
#> strata            untyped    -     -        -   -   FALSE     -
#> sampsize    integervector <NA>     - 1 to Inf   -    TRUE     -
#> nodesize          integer    -     1 1 to Inf   -    TRUE     -
#> maxnodes          integer    -     - 1 to Inf   -    TRUE     -
#> importance        logical    - FALSE        -   -    TRUE     -
#> localImp          logical    - FALSE        -   -    TRUE     -
#> proximity         logical    - FALSE        -   -   FALSE     -
#> oob.prox          logical    -     -        -   Y   FALSE     -
#> norm.votes        logical    -  TRUE        -   -   FALSE     -
#> do.trace          logical    - FALSE        -   -   FALSE     -
#> keep.forest       logical    -  TRUE        -   -   FALSE     -
#> keep.inbag        logical    - FALSE        -   -   FALSE     -

## Get the type of prediction
regr.lrn$predict.type
#> [1] "response"

Slot $par.set is an object of class ParamSet containing the type of hyperparameters (e.g., numeric, logical), potential default values and the range of allowed values.

mlr provides the function getHyperPars (or its alternative getLearnerParVals) to access the current hyperparameter settings of a Learner, and getParamSet to get a description of all possible settings.

These are particularly useful with wrapped Learners, such as a learner fused with a feature selection strategy, where both the learner and feature selection strategy have hyperparameters. For details see the wrapped learners tutorial.

## Get current hyperparameter settings
getHyperPars(cluster.lrn)
#> $centers
#> [1] 5

## Get a description of all possible hyperparameter settings
getParamSet(classif.lrn)
#>                      Type  len   Def   Constr Req Tunable Trafo
#> ntree             integer    -   500 1 to Inf   -    TRUE     -
#> mtry              integer    -     - 1 to Inf   -    TRUE     -
#> replace           logical    -  TRUE        -   -    TRUE     -
#> classwt     numericvector <NA>     - 0 to Inf   -    TRUE     -
#> cutoff      numericvector <NA>     -   0 to 1   -    TRUE     -
#> strata            untyped    -     -        -   -   FALSE     -
#> sampsize    integervector <NA>     - 1 to Inf   -    TRUE     -
#> nodesize          integer    -     1 1 to Inf   -    TRUE     -
#> maxnodes          integer    -     - 1 to Inf   -    TRUE     -
#> importance        logical    - FALSE        -   -    TRUE     -
#> localImp          logical    - FALSE        -   -    TRUE     -
#> proximity         logical    - FALSE        -   -   FALSE     -
#> oob.prox          logical    -     -        -   Y   FALSE     -
#> norm.votes        logical    -  TRUE        -   -   FALSE     -
#> do.trace          logical    - FALSE        -   -   FALSE     -
#> keep.forest       logical    -  TRUE        -   -   FALSE     -
#> keep.inbag        logical    - FALSE        -   -   FALSE     -
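For a wrapped learner the same accessors report the settings of both layers. The sketch below uses a bagging wrapper rather than a feature selection wrapper, simply because it needs no extra setup; classif.rpart, minsplit = 10 and bw.iters = 5 are illustrative choices.

```r
library(mlr)
library(ParamHelpers)

## Wrap a classification tree in a bagging wrapper with 5 bagging iterations
base.lrn = makeLearner("classif.rpart", minsplit = 10)
bag.lrn = makeBaggingWrapper(base.lrn, bw.iters = 5)

## Settings of the base learner and of the wrapper are returned together
hp = getHyperPars(bag.lrn)
names(hp)

## The joint parameter set also contains the wrapper's own parameters
ids = getParamIds(getParamSet(bag.lrn))
"bw.iters" %in% ids
```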

We can also use getParamSet or its alias getLearnerParamSet to get a quick overview of the available hyperparameters and defaults of a learning method without explicitly constructing it via makeLearner.

getParamSet("classif.randomForest")
#>                      Type  len   Def   Constr Req Tunable Trafo
#> ntree             integer    -   500 1 to Inf   -    TRUE     -
#> mtry              integer    -     - 1 to Inf   -    TRUE     -
#> replace           logical    -  TRUE        -   -    TRUE     -
#> classwt     numericvector <NA>     - 0 to Inf   -    TRUE     -
#> cutoff      numericvector <NA>     -   0 to 1   -    TRUE     -
#> strata            untyped    -     -        -   -   FALSE     -
#> sampsize    integervector <NA>     - 1 to Inf   -    TRUE     -
#> nodesize          integer    -     1 1 to Inf   -    TRUE     -
#> maxnodes          integer    -     - 1 to Inf   -    TRUE     -
#> importance        logical    - FALSE        -   -    TRUE     -
#> localImp          logical    - FALSE        -   -    TRUE     -
#> proximity         logical    - FALSE        -   -   FALSE     -
#> oob.prox          logical    -     -        -   Y   FALSE     -
#> norm.votes        logical    -  TRUE        -   -   FALSE     -
#> do.trace          logical    - FALSE        -   -   FALSE     -
#> keep.forest       logical    -  TRUE        -   -   FALSE     -
#> keep.inbag        logical    - FALSE        -   -   FALSE     -
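The returned ParamSet can be inspected further with helpers from the ParamHelpers package. The sketch below assumes getParamIds and getDefaults from ParamHelpers and uses classif.rpart purely as an example.

```r
library(mlr)
library(ParamHelpers)

## Look up the parameter set by learner name, without calling makeLearner
ps = getParamSet("classif.rpart")

## Names of all hyperparameters the method exposes
ids = getParamIds(ps)
ids

## Named list of those hyperparameters that have a default value
defs = getDefaults(ps)
names(defs)
```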

Functions for accessing a Learner's meta information are available in mlr. We can use getLearnerId, getLearnerShortName and getLearnerType. To show the required packages for a Learner, use getLearnerPackages.

## Get object's id
getLearnerId(surv.lrn)
#> [1] "cph"

## Get the short name
getLearnerShortName(classif.lrn)
#> [1] "rf"

## Get the type of the learner
getLearnerType(multilabel.lrn)
#> [1] "multilabel"

## Get required packages
getLearnerPackages(cluster.lrn)
#> [1] "stats" "clue"

Modifying a learner

We also provide functions that enable you to change certain aspects of a Learner without needing to create a new Learner from scratch. Here are some examples:

## Change the ID
surv.lrn = setLearnerId(surv.lrn, "CoxModel")
surv.lrn
#> Learner CoxModel from package survival
#> Type: surv
#> Name: Cox Proportional Hazard Model; Short name: coxph
#> Class: surv.coxph
#> Properties: numerics,factors,weights
#> Predict-Type: response
#> Hyperparameters:

## Change the prediction type, predict a factor with class labels instead of probabilities
classif.lrn = setPredictType(classif.lrn, "response")

## Change hyperparameter values
cluster.lrn = setHyperPars(cluster.lrn, centers = 4)

## Go back to default hyperparameter values
regr.lrn = removeHyperPars(regr.lrn, c("n.trees", "interaction.depth"))
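To confirm that such modifications took effect, the learner can be inspected after each step. This is a small sketch on a separate learner; classif.rpart and the minsplit values are assumptions made for the example.

```r
library(mlr)

lrn = makeLearner("classif.rpart", minsplit = 10)

## Overwrite a hyperparameter value and read it back
lrn = setHyperPars(lrn, minsplit = 30)
after.set = getHyperPars(lrn)$minsplit
after.set

## Remove the setting again; it no longer appears among the current values
lrn = removeHyperPars(lrn, "minsplit")
still.set = "minsplit" %in% names(getHyperPars(lrn))
still.set

## Switch the prediction type and check the corresponding list element
lrn = setPredictType(lrn, "prob")
lrn$predict.type
```

Each of these functions returns a modified copy, so the result has to be assigned back to a variable, as in the examples above.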

Listing learners

See the Appendix for a list of all learners integrated in mlr along with their respective properties.

If you would like a list of available learners with certain properties or suitable for a particular learning Task, use the function listLearners.

## List everything in mlr
lrns = listLearners()
head(lrns[c("class", "package")])
#>                 class      package
#> 1         classif.ada    ada,rpart
#> 2  classif.adaboostm1        RWeka
#> 3 classif.bartMachine  bartMachine
#> 4    classif.binomial        stats
#> 5  classif.blackboost mboost,party
#> 6    classif.boosting adabag,rpart

## List classifiers that can output probabilities
lrns = listLearners("classif", properties = "prob")
head(lrns[c("class", "package")])
#>                 class      package
#> 1         classif.ada    ada,rpart
#> 2  classif.adaboostm1        RWeka
#> 3 classif.bartMachine  bartMachine
#> 4    classif.binomial        stats
#> 5  classif.blackboost mboost,party
#> 6    classif.boosting adabag,rpart

## List classifiers that can be applied to iris (i.e., multiclass) and output probabilities
lrns = listLearners(iris.task, properties = "prob")
head(lrns[c("class", "package")])
#>                class      package
#> 1 classif.adaboostm1        RWeka
#> 2   classif.boosting adabag,rpart
#> 3        classif.C50          C50
#> 4    classif.cforest        party
#> 5      classif.ctree        party
#> 6   classif.cvglmnet       glmnet

## The calls above return data frames, but you can also create learner objects
head(listLearners("cluster", create = TRUE), 2)
#> [[1]]
#> Learner cluster.cmeans from package e1071,clue
#> Type: cluster
#> Name: Fuzzy C-Means Clustering; Short name: cmeans
#> Class: cluster.cmeans
#> Properties: numerics,prob
#> Predict-Type: response
#> Hyperparameters: centers=2
#> 
#> 
#> [[2]]
#> Learner cluster.Cobweb from package RWeka
#> Type: cluster
#> Name: Cobweb Clustering Algorithm; Short name: cobweb
#> Class: cluster.Cobweb
#> Properties: numerics
#> Predict-Type: response
#> Hyperparameters:
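Because the default result of listLearners is tabular, it can be filtered like any other data frame. The sketch below lists regression learners that support observation weights and keeps those whose packages are available locally. It assumes the result has an installed column, as in recent mlr versions, and passes warn.missing.packages = FALSE only to keep the output quiet.

```r
library(mlr)

## Regression learners that support observation weights
lrns = listLearners("regr", properties = "weights", warn.missing.packages = FALSE)

## The result is a data frame, so ordinary subsetting applies
is.data.frame(lrns)

## Keep only learners whose required packages are installed locally
## (assumes the "installed" column present in recent mlr versions)
usable = lrns[lrns$installed, c("class", "package")]
head(usable)
```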

Complete code listing

The above code without the output is given below:

## Random forest classifier, set it up for predicting probabilities
classif.lrn = makeLearner("classif.randomForest", predict.type = "prob", fix.factors.prediction = TRUE) 

## Regression gradient boosting machine, specify hyperparameters via a list 
regr.lrn = makeLearner("regr.gbm", par.vals = list(n.trees = 500, interaction.depth = 3)) 

## Cox proportional hazards model with custom name 
surv.lrn = makeLearner("surv.coxph", id = "cph") 

## K-means with 5 clusters 
cluster.lrn = makeLearner("cluster.kmeans", centers = 5) 

## Multilabel Random Ferns classification algorithm 
multilabel.lrn = makeLearner("multilabel.rFerns") 
classif.lrn 

surv.lrn 
## Get the configured hyperparameter settings that deviate from the defaults 
cluster.lrn$par.vals 

## Get the set of hyperparameters 
classif.lrn$par.set 

## Get the type of prediction 
regr.lrn$predict.type 
## Get current hyperparameter settings 
getHyperPars(cluster.lrn) 

## Get a description of all possible hyperparameter settings 
getParamSet(classif.lrn) 
getParamSet("classif.randomForest") 
## Get object's id 
getLearnerId(surv.lrn) 

## Get the short name 
getLearnerShortName(classif.lrn) 

## Get the type of the learner 
getLearnerType(multilabel.lrn) 

## Get required packages 
getLearnerPackages(cluster.lrn) 
## Change the ID 
surv.lrn = setLearnerId(surv.lrn, "CoxModel") 
surv.lrn 

## Change the prediction type, predict a factor with class labels instead of probabilities 
classif.lrn = setPredictType(classif.lrn, "response") 

## Change hyperparameter values 
cluster.lrn = setHyperPars(cluster.lrn, centers = 4) 

## Go back to default hyperparameter values 
regr.lrn = removeHyperPars(regr.lrn, c("n.trees", "interaction.depth")) 
## List everything in mlr 
lrns = listLearners() 
head(lrns[c("class", "package")]) 

## List classifiers that can output probabilities 
lrns = listLearners("classif", properties = "prob") 
head(lrns[c("class", "package")]) 

## List classifiers that can be applied to iris (i.e., multiclass) and output probabilities 
lrns = listLearners(iris.task, properties = "prob") 
head(lrns[c("class", "package")]) 

## The calls above return data frames, but you can also create learner objects
head(listLearners("cluster", create = TRUE), 2)