# Multilabel Classification

Multilabel classification is a classification problem in which each observation can be assigned several target labels at once, rather than exactly one as in multiclass classification.

Two different approaches exist for multilabel classification. Problem transformation methods transform the multilabel problem into one or more binary or multiclass classification problems. Algorithm adaptation methods adapt multiclass algorithms so that they can be applied to the multilabel problem directly.

The first step for multilabel classification in mlr is to get your data into the right format. You need a data.frame that consists of the features plus one logical vector per label, indicating whether that label is present in each observation. You can then create a MultilabelTask much like a normal ClassifTask, except that instead of a single target name you specify a vector of targets corresponding to the names of the logical columns in the data.frame. In the following example we extract the yeast data frame from the predefined yeast.task, take the 14 label names and create the task again.

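library(mlr)
yeast = getTaskData(yeast.task)
labels = colnames(yeast)[1:14]
yeast.task = makeMultilabelTask(id = "multi", data = yeast, target = labels)
yeast.task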
#> Type: multilabel
#> Target: label1,label2,label3,label4,label5,label6,label7,label8,label9,label10,label11,label12,label13,label14
#> Observations: 2417
#> Features:
#> numerics  factors  ordered
#>      103        0        0
#> Missings: FALSE
#> Has weights: FALSE
#> Has blocking: FALSE
#> Classes: 14
#>  label1  label2  label3  label4  label5  label6  label7  label8  label9
#>     762    1038     983     862     722     597     428     480     178
#> label10 label11 label12 label13 label14
#>     253     289    1816    1799      34
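To make the required layout concrete, here is a small made-up data.frame in base R. The column names and values are hypothetical; only the structure (numeric features plus one logical column per label) reflects the format described above. The mlr call is shown as a comment so the snippet runs without mlr.

```r
# Hypothetical toy data in the layout used for a MultilabelTask:
# numeric feature columns plus one logical column per label.
toy = data.frame(
  x1 = c(0.2, 1.5, -0.3, 0.8),
  x2 = c(1.1, 0.4, 2.0, -0.5),
  label1 = c(TRUE, FALSE, TRUE, TRUE),
  label2 = c(FALSE, FALSE, TRUE, FALSE),
  label3 = c(TRUE, TRUE, FALSE, FALSE)
)
toy.labels = c("label1", "label2", "label3")

# Every target column must be logical.
stopifnot(all(vapply(toy[toy.labels], is.logical, logical(1))))

# An observation may carry several labels at once.
rowSums(toy[toy.labels])  # labels per observation: 2, 1, 2, 1

# Number of observations carrying each label (cf. the "Classes" counts above).
colSums(toy[toy.labels])  # label1 = 3, label2 = 1, label3 = 2

# With mlr loaded, the task could then be created with:
# toy.task = makeMultilabelTask(id = "toy", data = toy, target = toy.labels)
```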


## Constructing a learner

Multilabel classification in mlr can currently be done in two ways:

• Algorithm adaptation methods: Treat the whole problem with a specific algorithm.

• Problem transformation methods: Transform the problem so that simple binary classification algorithms can be applied.

The algorithm adaptation methods currently available in mlr are the multivariate random forest from the randomForestSRC package and the random ferns multilabel algorithm from the rFerns package. You create learners for these algorithms just as you would for multiclass classification problems.

lrn.rfsrc = makeLearner("multilabel.randomForestSRC")
lrn.rFerns = makeLearner("multilabel.rFerns")
lrn.rFerns
#> Learner multilabel.rFerns from package rFerns
#> Type: multilabel
#> Name: Random ferns; Short name: rFerns
#> Class: multilabel.rFerns
#> Properties: numerics,factors,ordered
#> Predict-Type: response
#> Hyperparameters:


### Problem transformation methods

To generate a wrapped multilabel learner, first create a binary (or multiclass) classification learner with makeLearner. Then apply one of makeMultilabelBinaryRelevanceWrapper, makeMultilabelClassifierChainsWrapper, makeMultilabelNestedStackingWrapper, makeMultilabelDBRWrapper or makeMultilabelStackingWrapper to convert it into a learner that uses the respective problem transformation method.

You can also create a binary relevance learner directly by passing the name of the base learner as a string, as the second example below shows.

lrn.br = makeLearner("classif.rpart", predict.type = "prob")
lrn.br = makeMultilabelBinaryRelevanceWrapper(lrn.br)
lrn.br
#> Learner multilabel.classif.rpart from package rpart
#> Type: multilabel
#> Name: ; Short name:
#> Class: MultilabelBinaryRelevanceWrapper
#> Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass
#> Predict-Type: prob
#> Hyperparameters: xval=0

lrn.br2 = makeMultilabelBinaryRelevanceWrapper("classif.rpart")
lrn.br2
#> Learner multilabel.classif.rpart from package rpart
#> Type: multilabel
#> Name: ; Short name:
#> Class: MultilabelBinaryRelevanceWrapper
#> Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass
#> Predict-Type: response
#> Hyperparameters: xval=0


The individual methods are briefly described below.

#### Binary relevance

This problem transformation method converts the multilabel problem into one binary classification problem per label and applies a simple binary classifier to each of them. In mlr this is done by converting your binary learner into a wrapped binary relevance multilabel learner.
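The idea can be sketched in base R, assuming logistic regression (glm) as the binary base learner and simulated data. The helper names br.fit and br.predict are hypothetical, and this is an illustration of the technique, not mlr's implementation of makeMultilabelBinaryRelevanceWrapper.

```r
# Binary relevance sketch: one independent logistic regression per label.
# glm is a stand-in for an arbitrary binary learner; this is NOT mlr's code.
set.seed(1)
n = 200
X = data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y = data.frame(label1 = X$x1 + rnorm(n, sd = 0.5) > 0,
               label2 = X$x2 + rnorm(n, sd = 0.5) > 0,
               label3 = X$x1 + X$x2 + rnorm(n, sd = 0.5) > 0)

br.fit = function(X, Y) {
  # One model per label; no model ever sees the other labels.
  models = lapply(names(Y), function(lab)
    glm(target ~ ., data = cbind(X, target = as.integer(Y[[lab]])),
        family = binomial))
  names(models) = names(Y)
  models
}

br.predict = function(models, X, threshold = 0.5) {
  # Predict every label separately and threshold the probabilities.
  sapply(models, function(m)
    unname(predict(m, newdata = X, type = "response") > threshold))
}

models = br.fit(X, Y)
br.pred = br.predict(models, X)   # logical matrix: observations x labels
mean(br.pred == as.matrix(Y))     # training-set accuracy over all labels
```

Because each model ignores the other labels, binary relevance cannot exploit dependencies between labels; the remaining methods below differ mainly in how they feed label information back in.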

#### Classifier chains

Trains one binary classifier per label, one label after another. In each step the input data is augmented by the labels already handled (using their real observed values), so an order of the labels has to be specified. At prediction time the labels are predicted in the same order as during training, and the values of the preceding labels required as input are taken from the predictions made in the previous steps.
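A minimal base-R sketch of a classifier chain, again assuming glm as the binary learner and simulated data in which label2 depends on label1. The functions cc.fit and cc.predict and the chain order are hypothetical illustrations, not mlr's makeMultilabelClassifierChainsWrapper.

```r
# Classifier chain sketch: glm stands in for the binary base learner.
set.seed(2)
n = 200
X = data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y = data.frame(label1 = X$x1 + rnorm(n, sd = 0.5) > 0)
Y$label2 = X$x2 + 1.5 * Y$label1 + rnorm(n) > 1   # label2 depends on label1
Y$label3 = X$x1 - X$x2 + rnorm(n, sd = 0.5) > 0

cc.fit = function(X, Y, chain) {
  models = list()
  for (lab in chain) {
    d = cbind(X, target = as.integer(Y[[lab]]))
    models[[lab]] = glm(target ~ ., data = d, family = binomial)
    # Training: augment the input with the REAL observed values of this label.
    X[[lab]] = as.integer(Y[[lab]])
  }
  models
}

cc.predict = function(models, X, chain) {
  out = matrix(FALSE, nrow(X), length(chain), dimnames = list(NULL, chain))
  for (lab in chain) {
    out[, lab] = predict(models[[lab]], newdata = X, type = "response") > 0.5
    # Prediction: later labels see the PREDICTED value of this label.
    X[[lab]] = as.integer(out[, lab])
  }
  out
}

chain = c("label1", "label2", "label3")
models = cc.fit(X, Y, chain)
cc.pred = cc.predict(models, X, chain)
mean(cc.pred == as.matrix(Y[chain]))  # training-set accuracy over all labels
```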

#### Nested stacking

Same as classifier chains, except that the labels added to the input data are not the real observed values but estimates obtained from the already trained learners.
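The only change compared with a classifier chain is the training-time augmentation. In the base-R sketch below (glm as base learner, hypothetical names ns.fit and ns.predict), the estimates fed forward during training are in-sample predictions of the model just fitted; that choice is an assumption of this sketch, and a variant could use cross-validated predictions instead.

```r
# Nested stacking sketch: like a classifier chain, but training uses estimates.
set.seed(2)
n = 200
X = data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y = data.frame(label1 = X$x1 + rnorm(n, sd = 0.5) > 0)
Y$label2 = X$x2 + 1.5 * Y$label1 + rnorm(n) > 1   # label2 depends on label1

ns.fit = function(X, Y, chain) {
  models = list()
  for (lab in chain) {
    d = cbind(X, target = as.integer(Y[[lab]]))
    models[[lab]] = glm(target ~ ., data = d, family = binomial)
    # Augment with the model's in-sample ESTIMATE of the label,
    # not with its real observed values.
    X[[lab]] = as.integer(predict(models[[lab]], newdata = X,
                                  type = "response") > 0.5)
  }
  models
}

# Prediction works exactly as in a classifier chain.
ns.predict = function(models, X, chain) {
  out = matrix(FALSE, nrow(X), length(chain), dimnames = list(NULL, chain))
  for (lab in chain) {
    out[, lab] = predict(models[[lab]], newdata = X, type = "response") > 0.5
    X[[lab]] = as.integer(out[, lab])
  }
  out
}

chain = c("label1", "label2")
models = ns.fit(X, Y, chain)
ns.pred = ns.predict(models, X, chain)
mean(ns.pred == as.matrix(Y[chain]))  # training-set accuracy over all labels
```

Training on estimates means the inputs seen during training resemble the (imperfect) inputs seen at prediction time, which is the motivation usually given for this variant.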

#### Dependent binary relevance

Each label is trained with the real observed values of all other labels as additional inputs. In the prediction phase, the values of the other labels are first obtained in a previous step by a base learner, e.g., with the binary relevance method.
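The two-level structure can be sketched in base R as follows, assuming glm for both levels and simulated data where label3 depends on label1 and label2. The names dbr.fit and dbr.predict are hypothetical; this is not mlr's makeMultilabelDBRWrapper.

```r
# Dependent binary relevance sketch (glm at both levels).
set.seed(4)
n = 200
X = data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y = data.frame(label1 = X$x1 + rnorm(n, sd = 0.5) > 0,
               label2 = X$x2 + rnorm(n, sd = 0.5) > 0)
Y$label3 = Y$label1 + Y$label2 + rnorm(n, sd = 0.5) > 0.5  # depends on labels

dbr.fit = function(X, Y) {
  labs = names(Y)
  Yi = as.data.frame(lapply(Y, as.integer))
  logit = function(d) glm(target ~ ., data = d, family = binomial)
  # Base level: plain binary relevance (only used at prediction time).
  base = lapply(labs, function(l) logit(cbind(X, target = Yi[[l]])))
  # Meta level: features plus the REAL observed values of all other labels.
  meta = lapply(labs, function(l)
    logit(cbind(X, Yi[setdiff(labs, l)], target = Yi[[l]])))
  names(base) = labs
  names(meta) = labs
  list(base = base, meta = meta)
}

dbr.predict = function(model, X) {
  labs = names(model$base)
  # Step 1: estimate every label with binary relevance.
  est = as.data.frame(lapply(model$base, function(m)
    as.integer(predict(m, newdata = X, type = "response") > 0.5)))
  # Step 2: predict each label from the features and the OTHER labels' estimates.
  sapply(labs, function(l) {
    p = predict(model$meta[[l]], newdata = cbind(X, est[setdiff(labs, l)]),
                type = "response")
    unname(p > 0.5)
  })
}

model = dbr.fit(X, Y)
dbr.pred = dbr.predict(model, X)
mean(dbr.pred == as.matrix(Y))  # training-set accuracy over all labels
```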

#### Stacking

Same as the dependent binary relevance method, except that in the training phase the values of the other labels used as inputs are not the real ones but estimates obtained by the binary relevance method.
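A base-R sketch under the same assumptions as for dependent binary relevance (glm at both levels, simulated data, hypothetical names stack.fit and stack.predict). Only the meta-level training inputs change: they are in-sample binary relevance estimates, which is an assumption of this sketch rather than a statement about mlr's makeMultilabelStackingWrapper.

```r
# Stacking sketch: meta models are trained on binary relevance ESTIMATES.
set.seed(4)
n = 200
X = data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y = data.frame(label1 = X$x1 + rnorm(n, sd = 0.5) > 0,
               label2 = X$x2 + rnorm(n, sd = 0.5) > 0)
Y$label3 = Y$label1 + Y$label2 + rnorm(n, sd = 0.5) > 0.5  # depends on labels

br.estimate = function(base, X) {
  # Thresholded binary relevance predictions, one integer column per label.
  as.data.frame(lapply(base, function(m)
    as.integer(predict(m, newdata = X, type = "response") > 0.5)))
}

stack.fit = function(X, Y) {
  labs = names(Y)
  Yi = as.data.frame(lapply(Y, as.integer))
  logit = function(d) glm(target ~ ., data = d, family = binomial)
  base = lapply(labs, function(l) logit(cbind(X, target = Yi[[l]])))
  names(base) = labs
  # Unlike DBR, the other labels enter the meta models as estimates.
  est = br.estimate(base, X)
  meta = lapply(labs, function(l)
    logit(cbind(X, est[setdiff(labs, l)], target = Yi[[l]])))
  names(meta) = labs
  list(base = base, meta = meta)
}

stack.predict = function(model, X) {
  labs = names(model$base)
  est = br.estimate(model$base, X)
  sapply(labs, function(l) {
    p = predict(model$meta[[l]], newdata = cbind(X, est[setdiff(labs, l)]),
                type = "response")
    unname(p > 0.5)
  })
}

model = stack.fit(X, Y)
stack.pred = stack.predict(model, X)
mean(stack.pred == as.matrix(Y))  # training-set accuracy over all labels
```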

## Train

You can train a model as usual with a multilabel learner and a multilabel task as input. You can also pass subset and weights arguments if the learner supports this.

mod = train(lrn.br, yeast.task)
mod = train(lrn.br, yeast.task, subset = 1:1500, weights = rep(1/1500, 1500))
mod
#> Model for learner.id=multilabel.classif.rpart; learner.class=MultilabelBinaryRelevanceWrapper
#> Trained on: task.id = multi; obs = 1500; features = 103
#> Hyperparameters: xval=0

mod2 = train(lrn.rfsrc, yeast.task, subset = 1:100)
mod2
#> Model for learner.id=multilabel.randomForestSRC; learner.class=multilabel.randomForestSRC
#> Trained on: task.id = multi; obs = 100; features = 103
#> Hyperparameters: na.action=na.impute


## Predict

Prediction works as usual in mlr: call predict with a trained model and either the task (via the task argument) or new data (via the newdata argument). As always, you can specify a subset of the data to be predicted.

pred = predict(mod, task = yeast.task, subset = 1:10)
pred = predict(mod, newdata = yeast[1501:1600,])
names(as.data.frame(pred))
#>  [1] "truth.label1"     "truth.label2"     "truth.label3"
#>  [4] "truth.label4"     "truth.label5"     "truth.label6"
#>  [7] "truth.label7"     "truth.label8"     "truth.label9"
#> [10] "truth.label10"    "truth.label11"    "truth.label12"
#> [13] "truth.label13"    "truth.label14"    "prob.label1"
#> [16] "prob.label2"      "prob.label3"      "prob.label4"
#> [19] "prob.label5"      "prob.label6"      "prob.label7"
#> [22] "prob.label8"      "prob.label9"      "prob.label10"
#> [25] "prob.label11"     "prob.label12"     "prob.label13"
#> [28] "prob.label14"     "response.label1"  "response.label2"
#> [31] "response.label3"  "response.label4"  "response.label5"
#> [34] "response.label6"  "response.label7"  "response.label8"
#> [37] "response.label9"  "response.label10" "response.label11"
#> [40] "response.label12" "response.label13" "response.label14"

pred2 = predict(mod2, task = yeast.task)
names(as.data.frame(pred2))
#>  [1] "id"               "truth.label1"     "truth.label2"
#>  [4] "truth.label3"     "truth.label4"     "truth.label5"
#>  [7] "truth.label6"     "truth.label7"     "truth.label8"
#> [10] "truth.label9"     "truth.label10"    "truth.label11"
#> [13] "truth.label12"    "truth.label13"    "truth.label14"
#> [16] "response.label1"  "response.label2"  "response.label3"
#> [19] "response.label4"  "response.label5"  "response.label6"
#> [22] "response.label7"  "response.label8"  "response.label9"
#> [25] "response.label10" "response.label11" "response.label12"
#> [28] "response.label13" "response.label14"


Depending on the learner's predict.type, you get the true and predicted values and possibly the predicted probabilities for each class label. These can be extracted with the usual accessor functions getPredictionTruth, getPredictionResponse and getPredictionProbabilities.

## Performance

The performance of your prediction can be assessed with the function performance. Use the measures argument to specify which measure(s) to calculate. The default measure for multilabel classification is the Hamming loss (multilabel.hamloss). All measures available for multilabel classification can be listed with listMeasures and are also given in the table of performance measures and on the measures documentation page.

performance(pred)
#> multilabel.hamloss
#>          0.2257143

performance(pred2, measures = list(multilabel.subset01, multilabel.hamloss, multilabel.acc,
  multilabel.f1, timepredict))
#> multilabel.subset01  multilabel.hamloss      multilabel.acc
#>           0.8663633           0.2049471           0.4637509
#>       multilabel.f1         timepredict
#>           0.5729926           3.0520000

listMeasures("multilabel")
#>  [1] "multilabel.f1"       "multilabel.subset01" "multilabel.tpr"
#>  [4] "multilabel.ppv"      "multilabel.acc"      "timeboth"
#>  [7] "timepredict"         "multilabel.hamloss"  "featperc"
#> [10] "timetrain"
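mlr computes these measures for you; the base-R sketch below only illustrates the standard definitions behind four of them. The helper names are hypothetical, and treating an observation whose true and predicted label sets are both empty as a perfect match (in ml.acc and ml.f1) is a convention assumed here. Consult the mlr measures documentation for the exact built-in behavior.

```r
# Hypothetical helpers following the usual multilabel measure definitions.
# truth and response: logical matrices, rows = observations, cols = labels.
ham.loss = function(truth, response) mean(truth != response)
subset01 = function(truth, response) mean(rowSums(truth != response) > 0)
ml.acc = function(truth, response) {
  inter = rowSums(truth & response)
  union = rowSums(truth | response)
  mean(ifelse(union == 0, 1, inter / union))  # Jaccard index per observation
}
ml.f1 = function(truth, response) {
  inter = rowSums(truth & response)
  size = rowSums(truth) + rowSums(response)
  mean(ifelse(size == 0, 1, 2 * inter / size))  # F1 per observation
}

truth    = rbind(c(TRUE, FALSE, TRUE), c(FALSE, TRUE, FALSE))
response = rbind(c(TRUE, TRUE,  TRUE), c(FALSE, TRUE, FALSE))
ham.loss(truth, response)  # 1 wrong cell out of 6
#> [1] 0.1666667
subset01(truth, response)  # row 1 is not predicted exactly
#> [1] 0.5
ml.acc(truth, response)    # (2/3 + 1) / 2
#> [1] 0.8333333
ml.f1(truth, response)     # (4/5 + 1) / 2
#> [1] 0.9
```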


## Resampling

To evaluate the overall performance of a learning algorithm you can use resampling. As usual, you first define a resampling strategy, either via makeResampleDesc or makeResampleInstance, and then run the resample function. Below, the default measure, the Hamming loss, is calculated.

rdesc = makeResampleDesc(method = "CV", stratify = FALSE, iters = 3)
r = resample(learner = lrn.br, task = yeast.task, resampling = rdesc, show.info = FALSE)
r
#> Resample Result
#> Learner: multilabel.classif.rpart
#> Aggr perf: multilabel.hamloss.test.mean=0.225
#> Runtime: 11.9598

r = resample(learner = lrn.rFerns, task = yeast.task, resampling = rdesc, show.info = FALSE)
r
#> Resample Result
#> Learner: multilabel.rFerns
#> Aggr perf: multilabel.hamloss.test.mean=0.473
#> Runtime: 0.896346
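To see what an aggregated value such as multilabel.hamloss.test.mean means, the following base-R sketch runs an unstratified 3-fold cross-validation by hand and averages the per-fold Hamming loss. Binary relevance with glm and the simulated data are stand-ins chosen for illustration; resample does this (and more) for any mlr learner.

```r
# Manual 3-fold CV with mean aggregation of the Hamming loss (sketch only).
set.seed(3)
n = 300
X = data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y = cbind(label1 = X$x1 + rnorm(n, sd = 0.5) > 0,
          label2 = X$x2 + rnorm(n, sd = 0.5) > 0)

folds = sample(rep(1:3, length.out = n))  # random, unstratified fold ids
fold.loss = sapply(1:3, function(k) {
  test = folds == k
  # Binary relevance: fit one glm per label on the training folds.
  resp = sapply(colnames(Y), function(lab) {
    d = cbind(X[!test, ], target = as.integer(Y[!test, lab]))
    m = glm(target ~ ., data = d, family = binomial)
    predict(m, newdata = X[test, ], type = "response") > 0.5
  })
  mean(resp != Y[test, ])  # Hamming loss on the held-out fold
})
fold.loss        # one value per fold
mean(fold.loss)  # aggregated like multilabel.hamloss.test.mean
```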


## Binary performance

If you want to calculate a binary performance measure, e.g., the accuracy, the mmce or the auc, for each label, you can use the function getMultilabelBinaryPerformances. It can be applied to any multilabel prediction, including the prediction stored in a resample result. Calculating the auc requires predicted probabilities.

getMultilabelBinaryPerformances(pred, measures = list(acc, mmce, auc))
#>         acc.test.mean mmce.test.mean auc.test.mean
#> label1           0.75           0.25     0.6321925
#> label2           0.64           0.36     0.6547917
#> label3           0.68           0.32     0.7118227
#> label4           0.69           0.31     0.6764835
#> label5           0.73           0.27     0.6676923
#> label6           0.70           0.30     0.6417739
#> label7           0.81           0.19     0.5968750
#> label8           0.73           0.27     0.5164474
#> label9           0.89           0.11     0.4688458
#> label10          0.86           0.14     0.3996463
#> label11          0.85           0.15     0.5000000
#> label12          0.76           0.24     0.5330667
#> label13          0.75           0.25     0.5938610
#> label14          1.00           0.00            NA

getMultilabelBinaryPerformances(r$pred, measures = list(acc, mmce))
#>         acc.test.mean mmce.test.mean
#> label1     0.69383533      0.3061647
#> label2     0.58254034      0.4174597
#> label3     0.70211005      0.2978899
#> label4     0.71369466      0.2863053
#> label5     0.70831609      0.2916839
#> label6     0.60488209      0.3951179
#> label7     0.54447662      0.4555234
#> label8     0.53289201      0.4671080
#> label9     0.30906082      0.6909392
#> label10    0.44683492      0.5531651
#> label11    0.45676458      0.5432354
#> label12    0.52916839      0.4708316
#> label13    0.53702938      0.4629706
#> label14    0.01406703      0.9859330
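Per-label binary measures treat each label column as an ordinary two-class problem. The base-R sketch below (hypothetical helpers, not mlr's getMultilabelBinaryPerformances) computes accuracy, mmce and a rank-based AUC from a truth matrix and a probability matrix. The AUC is undefined when only one class is present among the observations, which is one way an NA such as the one for label14 in the first table can arise; labelB below reproduces that situation.

```r
# Rank-based (Mann-Whitney) AUC; NA if only one class is present.
auc.rank = function(truth, prob) {
  n.pos = sum(truth)
  n.neg = sum(!truth)
  if (n.pos == 0 || n.neg == 0) return(NA_real_)
  r = rank(prob)
  (sum(r[truth]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

# One row per label: accuracy, mmce (= 1 - accuracy) and AUC.
binary.perf = function(truth, prob, threshold = 0.5) {
  t(sapply(colnames(truth), function(lab) {
    acc = mean((prob[, lab] > threshold) == truth[, lab])
    c(acc = acc, mmce = 1 - acc, auc = auc.rank(truth[, lab], prob[, lab]))
  }))
}

truth = cbind(labelA = c(TRUE, FALSE, TRUE, FALSE),
              labelB = c(FALSE, FALSE, FALSE, FALSE))
prob  = cbind(labelA = c(0.9, 0.4, 0.6, 0.7),
              labelB = c(0.1, 0.2, 0.3, 0.1))
res = binary.perf(truth, prob)
res
# labelA: acc 0.75, mmce 0.25, auc 0.75
# labelB: acc 1, mmce 0, auc NA (no positive observations)
```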