Some learners, such as random forests, use bagging. Bagging means that the learner is an ensemble of several base learners, each trained on a different random subsample or bootstrap sample of the observations. A prediction for an observation in the original data set that uses only the base learners not trained on that observation is called an out-of-bag (OOB) prediction. These predictions are not distorted by overfitting to the training data, because each one is made only by base learners that did not see the observation during training.
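The idea can be sketched in base R, independently of mlr. The following is a minimal illustration and not how any particular package implements bagging: the base learner (a linear model on the built-in mtcars data), the number of base learners B, and all variable names are assumptions chosen for the example.

```r
set.seed(1)
n = nrow(mtcars)
B = 50  # number of base learners (arbitrary choice for this sketch)

# Running sum of OOB predictions and number of times each row was out-of-bag
oob.sum = numeric(n)
oob.count = integer(n)

for (b in seq_len(B)) {
  in.bag = sample(n, size = n, replace = TRUE)  # bootstrap sample
  out.bag = setdiff(seq_len(n), in.bag)         # rows this base learner never saw
  fit = lm(mpg ~ wt + hp, data = mtcars[in.bag, ])
  # Only the out-of-bag rows receive a prediction from this base learner
  oob.sum[out.bag] = oob.sum[out.bag] + predict(fit, newdata = mtcars[out.bag, ])
  oob.count[out.bag] = oob.count[out.bag] + 1L
}

# OOB prediction per row: average over the base learners that did not train on it
oob.pred = oob.sum / oob.count
oob.rmse = sqrt(mean((mtcars$mpg - oob.pred)^2, na.rm = TRUE))
oob.rmse
```

Each row is left out of a bootstrap sample with probability of roughly 0.37, so with 50 base learners every row is almost certainly out-of-bag several times; `na.rm = TRUE` covers the unlikely case that a row never is.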
To get a list of learners that provide OOB predictions, you can call
listLearners(obj = NA, properties = "oobpreds").
listLearners(obj = NA, properties = "oobpreds")[c("class", "package")]
#> Warning in listLearners.character(obj = NA_character_, properties, quiet, : The following learners could not be constructed, probably because their packages are not installed:
#> classif.hdrda,classif.mxff
#> Check ?learners to see which packages you need or install mlr with all suggestions.
#>                     class         package
#> 1    classif.randomForest    randomForest
#> 2 classif.randomForestSRC randomForestSRC
#> 3          classif.ranger          ranger
#> 4          classif.rFerns          rFerns
#> 5       regr.randomForest    randomForest
#> 6    regr.randomForestSRC randomForestSRC
#> ... (8 rows, 2 cols)
In mlr, the function getOOBPreds extracts these predictions from a trained model. They can be used to evaluate the performance of a learner, as in the following example.
lrn = makeLearner("classif.ranger", predict.type = "prob", predict.threshold = 0.6)
mod = train(lrn, sonar.task)
oob = getOOBPreds(mod, sonar.task)
oob
#> Prediction: 208 observations
#> predict.type: prob
#> threshold: M=0.60,R=0.40
#> time: NA
#>   id truth    prob.M    prob.R response
#> 1  1     R 0.5373385 0.4626615        R
#> 2  2     R 0.5971972 0.4028028        R
#> 3  3     R 0.5626560 0.4373440        R
#> 4  4     R 0.4319901 0.5680099        R
#> 5  5     R 0.5417589 0.4582411        R
#> 6  6     R 0.4005787 0.5994213        R
#> ... (208 rows, 5 cols)

performance(oob, measures = list(auc, mmce))
#>       auc      mmce
#> 0.9308071 0.1778846
Because every prediction is out-of-bag, this evaluation strategy is very similar to common resampling strategies such as 10-fold cross-validation, but much faster, since the model has to be trained only once.
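For comparison, the same learner can be evaluated with 10-fold cross-validation, which trains the model ten times instead of once. This is a sketch that assumes the mlr and ranger packages are installed; the seed is arbitrary, and the resulting values will differ somewhat from the OOB estimates because the data splits and the random forest itself are random.

```r
library(mlr)

set.seed(123)  # arbitrary seed, only for reproducibility of this sketch
lrn = makeLearner("classif.ranger", predict.type = "prob", predict.threshold = 0.6)

# 10-fold cross-validation: one model is fitted per fold
rdesc = makeResampleDesc("CV", iters = 10)
res = resample(lrn, sonar.task, rdesc, measures = list(auc, mmce))

# Aggregated test performance over the ten folds
res$aggr
```

Wrapping the `train` and `resample` calls in `system.time()` is a simple way to see the difference in runtime on your own machine.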