This blog post is a recap of the mlr-org workshop 2021 which took place from the 28th of September until the 10th of October.
First of all, we would like to thank all the people and organizations that made this workshop possible:
If you like our work and want to see more workshops like this, during which we are able to devote our full time to mlr3, consider becoming a GitHub sponsor ❤️
For our core package {mlr3}, we mainly focused on maintenance and issue curation. A little new goodie relates to scoring and aggregating: one can now conduct more complex benchmarks using different evaluation metrics.
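For example, aggregating a benchmark result with several measures at once could look roughly like the following sketch (the task, learners and measure keys are arbitrary examples, not part of the workshop changes):

```r
library(mlr3)

# benchmark two learners on one task with 3-fold cross-validation
design = benchmark_grid(
  tasks = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless"), predict_type = "prob"),
  resamplings = rsmp("cv", folds = 3)
)
bmr = benchmark(design)

# aggregate the benchmark result with several evaluation metrics at once
bmr$aggregate(msrs(c("classif.acc", "classif.auc")))
```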
{mlr3pipelines} has also seen a lot of maintenance in the first place. In addition, we did some code profiling and were able to improve performance a bit by reducing the overhead of cloning ParamSets and learners. This comes with the new `%>>!%` operator, which concatenates partial Graphs in place and thereby carries over the memory and speed improvements described above.
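A minimal sketch of how the in-place concatenation might be used (the chosen PipeOps are arbitrary examples):

```r
library(mlr3pipelines)

# build a small preprocessing graph
graph = po("imputemean") %>>% po("scale")

# append another PipeOp in place: `graph` itself is modified, which avoids
# the cloning overhead incurred by the regular `%>>%` operator
graph %>>!% po("pca")

graph$ids()
```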
A new sugar function `pos()` now exists, making it possible to initialize multiple pipe operators more easily.
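As a rough illustration (the PipeOp keys below are arbitrary):

```r
library(mlr3pipelines)

# pos() constructs several PipeOps at once and returns them as a list,
# analogous to how po() constructs a single PipeOp
ops = pos(c("imputemean", "scale", "pca"))

# the resulting PipeOps can then be chained into a Graph
graph = Reduce(`%>>%`, ops)
```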
We also started working on adding more sugar to {paradox} (e.g. trafos), which should make life easier in {mlr3pipelines} in the long run.
Last, we overhauled some error messages internally to make them more descriptive for users.
mlr3tuning / bbotk / mlr3mbo / mlr3hyperband
One focus with respect to tuning was on “hot starting” learners: a learner can save its state at an arbitrary point and be retrained later, continuing from that point. As a result, tuning can be done in decoupled chunks instead of one big task. This is especially helpful for certain tuning methods like “hyperband”, which is why we have listed this feature in this section.
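A minimal sketch of what this can look like, using the debug learner and the `HotstartStack` class in {mlr3} (fidelity values chosen arbitrarily):

```r
library(mlr3)

task = tsk("pima")

# train a learner with a low fidelity (few iterations)
learner_small = lrn("classif.debug", iter = 1)
learner_small$train(task)

# store the fitted learner in a hotstart stack
stack = HotstartStack$new(list(learner_small))

# a new learner with more iterations can continue from the stored state
learner_large = lrn("classif.debug", iter = 2)
learner_large$hotstart_stack = stack
learner_large$train(task)  # resumes from the stored state instead of starting over
```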
Another improvement for {bbotk} and all tuning packages was the support for asynchronous evaluation of hyperparameter configurations. Before, one had to wait until all hyperband configurations were evaluated before a new point could be proposed. Now, new points can be evaluated at any time, making more efficient use of parallelization. This also avoids waiting on a few slow workers stuck with an inefficient configuration that takes a long time to finish.
We made great progress in finalizing {mlr3mbo}. {mlr3mbo} is a flexible Bayesian optimization toolkit and much more modular than its predecessor {mlrMBO}. Most of the functionality {mlrMBO} offers is already integrated, and we are looking forward to a CRAN release in the near future. In the meantime, make sure to check out the package on GitHub.
For our curated set of learners, we updated some ParamSets to the most recent upstream versions and ensured that our custom “Paramtest”, i.e. the heuristic with which we validate the ParamSets of {mlr3learners} against their upstream packages, works smoothly again.
{mlr3} now allows for bias auditing and debiasing of arbitrary learners through {mlr3fairness}. The package contains Fairness Metrics, Debiasing Strategies and Visualizations for arbitrary models trained through {mlr3}. We plan to release it on CRAN soon, but the package is already usable. If you want to learn more, check out the tutorials for the individual components:
We created a new ggplot2 theme `theme_mlr3()`, which is applied by default to all plots created with {mlr3viz}, i.e. all `autoplot()` methods in mlr3. This theme is heavily inspired by `ggpubr::theme_pubr()`. In addition to the theme, all {mlr3viz} plots now use the viridis color palette by default. Last, we have added some information on how users can easily apply theming changes to the plots returned by `autoplot()`.
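For example, restyling a default plot might work along these lines (task and replacement theme chosen arbitrarily):

```r
library(mlr3)
library(mlr3viz)
library(ggplot2)

task = tsk("pima")

# autoplot() returns a regular ggplot object, styled with theme_mlr3()
# and the viridis palette by default
p = autoplot(task)

# restyle it like any other ggplot, e.g. by adding a different theme
p + theme_minimal()
```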
{mlr3spatial} is a new package providing spatial data backends that can handle {sf}, {stars} and {terra} objects. It is capable of performing predictions on spatial raster objects in parallel using the internal mlr3 parallelization heuristic (which uses the {future} framework under the hood). Together with the {mlr3spatiotempcv} package, it extends the support for spatial machine learning in mlr3.
We have created a Roadmap of upcoming packages and planned features across the whole mlr3 ecosystem. The roadmap serves both our internal organization and gives external people an idea of what is coming up. It is quite new and not yet fully operational at the time this blog post is released; however, we encourage you to look at it from time to time, as we try to keep it up to date.
We have updated and reorganized our mlr3 wiki. It now has a better sidebar for easier navigation, and we have updated the content of the individual sections.
For attribution, please cite this work as
Schratz (2021, Oct. 7). mlr-org: mlr Workshop 2021 Recap. Retrieved from https://mlr-org.github.io/mlr-org-website/posts/2021-10-07-mlr-workshop-2021-recap/
BibTeX citation
@misc{schratz2021mlr,
  author = {Schratz, Patrick},
  title = {mlr-org: mlr Workshop 2021 Recap},
  url = {https://mlr-org.github.io/mlr-org-website/posts/2021-10-07-mlr-workshop-2021-recap/},
  year = {2021}
}