For more information see Loshchilov & Hutter, *SGDR: Stochastic Gradient Descent with Warm Restarts*: https://arxiv.org/abs/1608.03983.
```r
cb_es(monitor = "val_loss", patience = 3L)

cb_lr_scheduler_cosine_anneal(
  eta_max = 0.01,
  T_max = 10,
  T_mult = 2,
  M_mult = 1,
  eta_min = 0
)

cb_lr_scheduler_exponential_decay()

cb_tensorboard()

cb_lr_log()
```
Argument | Description
---|---
monitor | Quantity to monitor for early stopping (e.g. `"val_loss"`).
patience | Number of epochs with no improvement in the monitored quantity after which training is stopped.
eta_max | Maximum (initial) learning rate \(\eta_{max}\).
T_max | Number of epochs in the first annealing cycle, i.e. before the first warm restart.
T_mult | Factor by which the cycle length is multiplied after each restart.
M_mult | Factor by which \(\eta_{max}\) is multiplied after each restart.
eta_min | Minimum learning rate \(\eta_{min}\).
Closed form: \(\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})(1 + \cos(\frac{T_{cur}}{T_{max}}\pi))\)
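The closed form above, extended with warm restarts as in SGDR, can be sketched as follows (a minimal illustration in Python; the function name and the epoch-based cycle bookkeeping are assumptions, not the package's implementation):

```python
import math

def cosine_anneal_lr(epoch, eta_max=0.01, eta_min=0.0,
                     T_max=10, T_mult=2, M_mult=1.0):
    """Learning rate at `epoch` under cosine annealing with warm restarts.

    T_cur counts epochs since the last restart. After each restart the
    cycle length is multiplied by T_mult and the peak rate by M_mult.
    """
    T_cur, T_i, eta_i = epoch, T_max, eta_max
    # Skip over completed cycles to find the position within the current one.
    while T_cur >= T_i:
        T_cur -= T_i
        T_i *= T_mult
        eta_i *= M_mult
    # Closed form: eta_min + 1/2 (eta_i - eta_min) (1 + cos(pi * T_cur / T_i))
    return eta_min + 0.5 * (eta_i - eta_min) * (1 + math.cos(math.pi * T_cur / T_i))
```

With the defaults above, the rate starts at `eta_max`, decays to `eta_min` over the first `T_max = 10` epochs, then restarts at full `eta_max` (since `M_mult = 1`) for a cycle twice as long (`T_mult = 2`).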