Training Efficiency

Question 1: Validation

In this exercise, we will once again train a simple multi-layer perceptron on the Indian Liver Patient Dataset (ILPD). Create a learner that:

  1. Uses 2 hidden layers with 100 neurons each.
  2. Utilizes a batch size of 128.
  3. Trains for 200 epochs.
  4. Employs a validation set comprising 30% of the data.
  5. Tracks the validation log-loss.
  6. Utilizes trace-jitting to speed up the training process.
  7. Employs the history callback to record the training and validation log-loss during training.

Afterward, plot the validation log-loss, which is accessible via learner$model$callbacks$history.

Below, we create the task and remove the gender feature again for simplicity. A sketch of one possible learner configuration follows the task setup.

library(mlr3verse)
library(mlr3torch)
ilpd_num <- tsk("ilpd")
ilpd_num$select(setdiff(ilpd_num$feature_names, "gender"))
ilpd_num
<TaskClassif:ilpd> (583 x 10): Indian Liver Patient Data
* Target: diseased
* Properties: twoclass
* Features (9):
  - dbl (5): albumin, albumin_globulin_ratio, direct_bilirubin, total_bilirubin, total_protein
  - int (4): age, alanine_transaminase, alkaline_phosphatase, aspartate_transaminase
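
Putting the requirements above together, one possible configuration is sketched below. The parameter names (neurons, batch_size, epochs, jit_trace, measures_train, measures_valid, callbacks) and the history column name valid.classif.logloss reflect our reading of the mlr3torch MLP learner and should be checked against the current documentation.

library(ggplot2)

mlp <- lrn("classif.mlp",
  neurons        = c(100, 100),            # 2 hidden layers with 100 neurons each
  batch_size     = 128,
  epochs         = 200,
  jit_trace      = TRUE,                   # trace-jit the network to speed up training
  predict_type   = "prob",                 # required for the log-loss
  measures_train = msr("classif.logloss"), # track the training log-loss
  measures_valid = msr("classif.logloss"), # track the validation log-loss
  callbacks      = t_clbk("history")       # record the scores after each epoch
)
mlp$validate <- 0.3                        # hold out 30% of the data for validation

mlp$train(ilpd_num)

# One row per epoch; the validation column is assumed to be named valid.classif.logloss.
history <- mlp$model$callbacks$history
ggplot(history, aes(x = epoch, y = valid.classif.logloss)) +
  geom_line() +
  labs(x = "Epoch", y = "Validation log-loss")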

Question 2: Early Stopping

Enable early stopping to prevent overfitting and re-train the learner (using a patience of 10). Print the final validation performance of the learner and the early-stopped results. You can consult the documentation of LearnerTorch on how to access these (see the Active Bindings section).

Hint: You can enable early stopping by setting the patience parameter.
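
A minimal sketch of how this could look, assuming the patience parameter and the internal_valid_scores and internal_tuned_values active bindings of LearnerTorch (verify the exact names in the documentation):

mlp$param_set$set_values(patience = 10)  # stop if the validation score does not improve for 10 epochs

mlp$train(ilpd_num)

# Final validation performance (active binding of LearnerTorch)
mlp$internal_valid_scores

# Result of early stopping, i.e. the epoch at which training stopped
mlp$internal_tuned_values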

Question 3: Early Stopping and Dropout Tuning

While early stopping is already useful on its own, mlr3torch also allows you to tune the number of epochs via early stopping while simultaneously tuning other hyperparameters with traditional hyperparameter tuning from mlr3tuning.

One thing we have not mentioned so far is that the MLP learner also uses a dropout layer. The dropout probability can be configured via the p parameter.

Your task is to tune the dropout probability p in the range \([0, 1]\) and the number of epochs via early stopping (with the configuration from the previous exercise), using an upper bound of 100 epochs.

To adapt this to work with early stopping, you need to set the:

  1. epochs to to_tune(upper = <value>, internal = TRUE): This tells the Tuner that the learner will tune the number of epochs itself.
  2. $validate field of the learner to "test" so that the internal validation happens on the same data the tuner uses to evaluate configurations.
  3. Tuning measure to msr("internal_valid_score", minimize = TRUE). We set minimize to TRUE because we have used the log-loss as a validation measure.

Apart from this, the tuning works just like in tutorial 5. Use 3-fold cross-validation and evaluate 10 configurations using random search. Finally, print the optimal configuration.
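
Below is a sketch of how the tuning setup could look. The tune() interface from mlr3tuning and the learner parameters (p, patience, measures_valid) are assumptions to verify against the current documentation.

library(mlr3tuning)

mlp <- lrn("classif.mlp",
  neurons        = c(100, 100),
  batch_size     = 128,
  predict_type   = "prob",
  measures_valid = msr("classif.logloss"),
  patience       = 10,                                      # early stopping as before
  epochs         = to_tune(upper = 100, internal = TRUE),   # internal tuning of the epochs
  p              = to_tune(0, 1)                            # dropout probability
)
mlp$validate <- "test"   # validate on the resampling test sets

instance <- tune(
  tuner      = tnr("random_search"),
  task       = ilpd_num,
  learner    = mlp,
  resampling = rsmp("cv", folds = 3),
  measures   = msr("internal_valid_score", minimize = TRUE),
  term_evals = 10
)

# Optimal configuration found by the tuner
instance$result

# Parameter values of the best configuration, including the internally tuned epochs
instance$result_learner_param_vals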