Shaped Residual MLP as used in Zimmer et al., Auto-PyTorch Tabular (2020) (https://arxiv.org/abs/2006.13799), and proposed by https://mikkokotila.github.io/slate. Implements 'Search Space 2' from that paper. Some training techniques are currently missing (Shake-Shake, Shake-Drop, Mixup and SVD).

This learner builds and compiles the keras model from the hyperparameters in param_set, and does not require a supplied and compiled model.

Format

R6::R6Class() inheriting from LearnerRegrKeras.

Details

Calls keras::fit() from package keras. Layers are set up as follows:

  • The inputs are connected to a layer_dropout that applies input_dropout. Each layer_dense() is then followed by a layer_activation and, depending on the architecture hyperparameters, by a layer_batch_normalization and/or a layer_dropout. This is repeated once per layer (n_layers times), i.e. one 'dense->activation->batchnorm->dropout' block is appended for each layer. The output layer uses a 'softmax' or 'sigmoid' activation for classification and a 'linear' or 'sigmoid' activation for regression.
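The layer ordering described above can be written down as a small base R sketch. The helper layer_plan() is hypothetical and is not part of the package: it only lists layer names in order and does not build a keras model. The flag name use_batchnorm, and the rule that a dropout layer is added only when dropout > 0, are assumptions made for illustration.

```r
# Hypothetical helper: returns the layer sequence as a character vector.
# Assumptions: `use_batchnorm` is an illustrative flag name, and a dropout
# layer is only appended to a block when `dropout > 0`.
layer_plan = function(n_layers, use_batchnorm = FALSE, dropout = 0,
                      output_activation = "linear") {
  # One block: dense -> activation -> (batchnorm) -> (dropout)
  block = c(
    "dense",
    "activation",
    if (use_batchnorm) "batchnorm",
    if (dropout > 0) "dropout"
  )
  c(
    "input_dropout",                      # always the first layer
    rep(block, times = n_layers),         # one block per layer
    paste0("output:", output_activation)  # final layer
  )
}

plan_small = layer_plan(n_layers = 2)
plan_full = layer_plan(n_layers = 3, use_batchnorm = TRUE, dropout = 0.2)
print(plan_small)
print(plan_full)
```

With both optional layers switched on, every block contributes four entries, so the plan for three layers has 1 + 3 * 4 + 1 = 14 entries.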

Parameters:

Most of the parameters can be obtained from the keras documentation. Some exceptions are documented here.

  • use_embedding: A logical flag indicating whether embeddings should be used. If TRUE, make_embedding is used; if FALSE, model.matrix(~. - 1, data) converts factor, logical and ordered factor features into numeric features.

  • n_layers: An integer defining the number of layers of the shaped MLP.

  • n_max: An integer defining the number of neurons in the first layer. The number of neurons is halved after each layer according to formula (1) in https://arxiv.org/abs/2006.13799.

  • initializer: Weight and bias initializer.

    "glorot_uniform"  : initializer_glorot_uniform(seed)
    "glorot_normal"   : initializer_glorot_normal(seed)
    "he_uniform"      : initializer_he_uniform(seed)
    "..."             : see `??keras::initializer`
    
  • optimizer: Some optimizers and their arguments can be found below.
    Inherits from tensorflow.python.keras.optimizer_v2.

    "sgd"     : optimizer_sgd(lr, momentum, decay = decay),
    "rmsprop" : optimizer_rmsprop(lr, rho, decay = decay),
    "adagrad" : optimizer_adagrad(lr, decay = decay),
    "adam"    : optimizer_adam(lr, beta_1, beta_2, decay = decay),
    "nadam"   : optimizer_nadam(lr, beta_1, beta_2, schedule_decay = decay)
    
  • regularizer: Regularizer for keras layers:

    "l1"      : regularizer_l1(l = 0.01)
    "l2"      : regularizer_l2(l = 0.01)
    "l1_l2"   : regularizer_l1_l2(l1 = 0.01, l2 = 0.01)
    
  • class_weight: A named list of class weights, with the classes numbered from 0 to c-1 (for c classes).

    Example:
    wts = c(0.5, 1)
    setNames(as.list(wts), seq_len(length(wts)) - 1)
    
  • callbacks: A list of keras callbacks. See ?callbacks.
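Two of the parameter descriptions above can be checked in base R without keras. The sketch below first computes layer widths under literal halving, as the n_max entry states; it assumes no rounding or lower bound, which formula (1) in the paper may handle differently. It then shows the encoding that model.matrix(~ . - 1, data) produces when use_embedding is FALSE. The data frame is made up for illustration.

```r
# Layer widths under literal halving (assumption: no rounding, no lower bound).
n_max = 128
n_layers = 3
units = n_max / 2^(seq_len(n_layers) - 1)
print(units)

# Encoding used when use_embedding = FALSE. Because the intercept is dropped
# ("- 1"), the factor f gets one indicator column per level here.
data = data.frame(x = c(1.5, 2.0, 0.5), f = factor(c("a", "b", "a")))
encoded = stats::model.matrix(~ . - 1, data)
print(colnames(encoded))
print(encoded)
```

With the default values n_max = 128 and n_layers = 3 this gives widths of 128, 64 and 32.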

Construction

LearnerRegrShapedMLP2$new()
mlr3::mlr_learners$get("regr.smlp2")
mlr3::lrn("regr.smlp2")

Learner Methods

Keras Learners offer several methods for easy access to the stored models.

  • .$plot()
    Plots the history, i.e. the train-validation loss during training.

  • .$save(file_path)
    Dumps the model to a provided file_path in 'h5' format.

  • .$load_model_from_file(file_path)
    Loads a model saved using .$save() back into the learner. The model needs to be saved separately when the learner is serialized; the learner can then be restored with this function. Currently not implemented for 'TabNet'.

  • .$lr_find(task, epochs, lr_min, lr_max, batch_size)
    Runs an implementation of the learning rate finder popularized by Jeremy Howard in fast.ai (http://course.fast.ai/) for the learner. For more information on the parameters, see find_lr.
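The methods above might be combined as in the following sketch. It assumes that mlr3, mlr3keras and keras are installed with a working TensorFlow backend, and it skips everything otherwise. The task "mtcars" and the reduced epochs value are illustrative choices only. Whether a restored learner can predict without further setup is not covered here.

```r
# Assumption: the full stack (mlr3, mlr3keras, keras + backend) is available.
have_stack = requireNamespace("mlr3", quietly = TRUE) &&
  requireNamespace("mlr3keras", quietly = TRUE) &&
  requireNamespace("keras", quietly = TRUE) &&
  keras::is_keras_available()

if (have_stack) {
  task = mlr3::tsk("mtcars")            # small built-in regression task
  learner = mlr3::lrn("regr.smlp2")
  learner$param_set$values$epochs = 5L  # few epochs, for illustration only

  learner$train(task)
  learner$plot()                        # train-validation loss history

  # Save the keras model separately in 'h5' format ...
  path = tempfile(fileext = ".h5")
  learner$save(path)

  # ... and load it back into a fresh learner.
  restored = mlr3::lrn("regr.smlp2")
  restored$load_model_from_file(path)
}
```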

Examples

learner = mlr3::lrn("regr.smlp2")
print(learner)
#> <LearnerRegrShapedMLP2:regr.keras>
#> * Model: -
#> * Parameters: epochs=100, validation_split=0.3333, batch_size=128,
#>   callbacks=<list>, low_memory=FALSE, verbose=0, use_embedding=TRUE,
#>   embed_dropout=0, embed_size=<NULL>, n_max=128, n_layers=3,
#>   initializer=<keras.initializers.initializers_v2.GlorotUniform>,
#>   regularizer=<keras.regularizers.L1L2>,
#>   optimizer=<keras.optimizer_v2.gradient_descent.SGD>, activation=relu,
#>   dropout=0, input_dropout=0, loss=mean_squared_error,
#>   output_activation=linear, metrics=mean_squared_logarithmic_error
#> * Packages: keras
#> * Predict Type: response
#> * Feature types: integer, numeric, factor, ordered
#> * Properties: -

# available parameters:
learner$param_set$ids()
#>  [1] "epochs"            "model"             "class_weight"
#>  [4] "validation_split"  "batch_size"        "callbacks"
#>  [7] "low_memory"        "verbose"           "use_embedding"
#> [10] "embed_dropout"     "embed_size"        "n_max"
#> [13] "n_layers"          "initializer"       "regularizer"
#> [16] "optimizer"         "activation"        "dropout"
#> [19] "input_dropout"     "loss"              "output_activation"
#> [22] "metrics"