Your task is to train a simple multi-layer perceptron (lrn("classif.mlp")) with two hidden layers of 100 neurons each. Set the batch size to 32, the learning rate to 0.001, and the number of epochs to 20. Then, resample the learner on the task with a 5-fold cross-validation and evaluate the results using the classification error and the false positive rate (FPR). Is the result good?
Hint
The parameter for the learning rate is opt.lr.
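A minimal sketch of one possible solution, assuming the ilpd_num task was created earlier in the tutorial:

```r
library(mlr3verse)
library(mlr3torch)

learner = lrn("classif.mlp",
  neurons = c(100, 100), # two hidden layers with 100 neurons each
  batch_size = 32,
  opt.lr = 0.001,        # learning rate of the optimizer (see hint)
  epochs = 20
)

# 5-fold cross-validation; assumes `ilpd_num` exists from an earlier step
rr = resample(ilpd_num, learner, rsmp("cv", folds = 5))
rr$aggregate(msrs(c("classif.ce", "classif.fpr")))
```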
While the classification error is low, it is not a reliable measure here due to the imbalanced class distribution. This is confirmed by the FPR, which is relatively high.
Question 2: Preprocessing
In the previous question, we operated on the ilpd_num task, which excludes the categorical gender column because the MLP learner operates on numeric features only. You will now create a more complex GraphLearner that also includes one-hot encoding of the gender column before applying the MLP. Resample this learner on the original ilpd task and evaluate the results using the same measures as before.
Hint
Concatenate po("encode") with a lrn("classif.mlp") using %>>% to create the GraphLearner. For available options on the encoding, see po("encode")$help().
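A sketch of one possible pipeline, assuming the original ilpd task is available via tsk("ilpd"):

```r
# One-hot encode factor columns (here: gender), then feed the result into the MLP
graph = po("encode", method = "one-hot") %>>%
  lrn("classif.mlp",
    neurons = c(100, 100),
    batch_size = 32,
    opt.lr = 0.001,
    epochs = 20
  )
glrn = as_learner(graph)

rr = resample(tsk("ilpd"), glrn, rsmp("cv", folds = 5))
rr$aggregate(msrs(c("classif.ce", "classif.fpr")))
```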
Question 3: Benchmarking
Instead of resampling a single learner, the goal is now to compare the performance of the MLP pipeline with that of a simple classification tree. Create a benchmark design and compare the performance of the two learners.
Hint
Create a classification tree via lrn("classif.rpart"). A benchmark design can be created via benchmark_grid(). To run a benchmark, pass the design to benchmark().
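A sketch of the comparison, reusing the GraphLearner glrn from Question 2:

```r
# Benchmark design: both learners on the same task and resampling
design = benchmark_grid(
  tasks = tsk("ilpd"),
  learners = list(glrn, lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 5)
)
bmr = benchmark(design)
bmr$aggregate(msrs(c("classif.ce", "classif.fpr")))
```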
Question 4: Lazy Tensors
<TaskClassif:dt> (150 x 2)
* Target: Species
* Properties: multiclass
* Features (1):
- lt (1): x
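One way to obtain a task like the one printed above is to collect all numeric features in a single lazy_tensor column. A rough sketch, assuming iris as the underlying data and that as_lazy_tensor() accepts a torch_tensor (check the mlr3torch documentation for the exact constructor):

```r
library(mlr3)
library(mlr3torch)
library(torch)
library(data.table)

# Assumption: iris is the underlying data (150 rows, target Species)
features = torch_tensor(as.matrix(iris[, 1:4]))

# Store all numeric features in a single lazy_tensor column named x
dt = data.table(
  Species = iris$Species,
  x = as_lazy_tensor(features) # assumed torch_tensor method of as_lazy_tensor()
)
task = as_task_classif(dt, target = "Species", id = "dt")
task
```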
Question 5: Custom Architecture
Create a network with one hidden layer of 100 neurons and a sigmoid activation function by assembling PipeOps in a Graph. Convert the Graph to a Learner and train the network for 10 epochs using Adam with a learning rate of 0.001 and a batch size of 32.
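A sketch of one possible Graph, assuming mlr3torch's nn_*/torch_* PipeOps and reusing the numeric ilpd_num task from Question 1:

```r
library(mlr3verse)
library(mlr3torch)

graph = po("torch_ingress_num") %>>%       # entry point for numeric features
  po("nn_linear", out_features = 100) %>>% # one hidden layer with 100 neurons
  po("nn_sigmoid") %>>%                    # sigmoid activation
  po("nn_head") %>>%                       # output layer matching the target
  po("torch_loss", loss = t_loss("cross_entropy")) %>>%
  po("torch_optimizer", optimizer = t_opt("adam", lr = 0.001)) %>>%
  po("torch_model_classif", epochs = 10, batch_size = 32)

# Convert the Graph to a Learner and train the network
glrn = as_learner(graph)
glrn$train(ilpd_num) # assumption: reusing the ilpd_num task from Question 1
```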