Most Popular Learners in mlr

For the development of mlr, as well as for a “machine learning expert”, it can be handy to know which learners are the most popular. The point is not necessarily to see which methods perform best, but to see what is used “out there” in the real world. Thanks to the nice little package cranlogs from metacran, you can at least get a rough estimate, as I will show in the following…

Multilabel Classification with mlr

Multilabel classification has recently attracted growing interest in the research community. We implemented several methods that make use of the standardized mlr framework: every available binary learner can be used as the base learner for multilabel problem transformation methods. So if you’re interested in using several multilabel algorithms and want to know how to use them in the mlr framework, this post is for you!
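To illustrate what a problem transformation method does, here is a minimal Python sketch of binary relevance, one common transformation: the multilabel task is split into one binary task per label, and the same binary learner is trained on each. This is not mlr's implementation; the function names and the toy nearest-centroid learner are hypothetical stand-ins for "any available binary learner".

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a binary learner takes features X and 0/1 targets y
# and returns a predict function for a single observation.
Predictor = Callable[[Sequence[float]], int]
BinaryLearner = Callable[[List[List[float]], List[int]], Predictor]


def nearest_centroid(X: List[List[float]], y: List[int]) -> Predictor:
    """Toy binary learner: predict the class whose feature mean is closer."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    # Fall back to a constant prediction if one class is absent.
    if not pos:
        return lambda x: 0
    if not neg:
        return lambda x: 1
    c_pos, c_neg = mean(pos), mean(neg)

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return lambda x: int(dist2(x, c_pos) < dist2(x, c_neg))


def binary_relevance(learner: BinaryLearner, X, Y):
    """Train one binary model per label column of Y (rows of 0/1 values)."""
    n_labels = len(Y[0])
    models = [learner(X, [row[j] for row in Y]) for j in range(n_labels)]
    # The multilabel prediction is simply the vector of per-label predictions.
    return lambda x: [m(x) for m in models]


X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
Y = [[1, 0], [1, 0], [0, 1], [1, 1]]
predict = binary_relevance(nearest_centroid, X, Y)
print(predict([0.05, 0.1]))  # [1, 0]
print(predict([1.0, 1.0]))   # [0, 1]
```

Because the transformation only calls the base learner through a fit/predict interface, any binary learner can be plugged in, which is the property the mlr wrappers build on. Binary relevance ignores dependencies between labels; other transformation methods try to model them.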

New mlr Logo

We at mlr are currently deciding on a new logo, and in the spirit of open source, we would like to involve the community in the voting process!

You can vote for your favorite logo on GitHub by reacting to the logo with a +1.

Thanks to Hannah Atkin for designing the logos!

Use mlrMBO to optimize via command line

Many people who want to apply Bayesian optimization need to optimize an algorithm that is not implemented in R but runs on the command line as a shell script or an executable.

We recently published mlrMBO on CRAN. As a regular package it normally operates inside R, but in this post I want to demonstrate how mlrMBO can be used to optimize an external application. Along the way I will highlight some issues you are likely to run into.
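The core idea is to wrap the external program in a function that an optimizer can treat as a black box: pass the parameters as command-line arguments and parse the result from the program's output. The following is a minimal Python sketch of that pattern, not mlrMBO's interface. It assumes the program prints its objective value as the last line of stdout; to keep it self-contained, the "external application" is a hypothetical one-liner run through the current Python interpreter.

```python
import subprocess
import sys

# Hypothetical "external application": reads x and y from argv and prints
# one objective value. In practice this would be your own script or binary.
FAKE_PROGRAM = (
    "import sys; x, y = map(float, sys.argv[1:3]); "
    "print((x - 1) ** 2 + (y + 2) ** 2)"
)


def command_line_objective(x: float, y: float, timeout: float = 60.0) -> float:
    """Run the external program with the given parameters and parse its output."""
    cmd = [sys.executable, "-c", FAKE_PROGRAM, str(x), str(y)]
    # The timeout guards against runs that hang forever.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # Surface crashes explicitly instead of returning a silent garbage value.
        raise RuntimeError(f"external call failed: {result.stderr.strip()}")
    # Assumes the last line of stdout holds the objective value.
    return float(result.stdout.strip().splitlines()[-1])


print(command_line_objective(1.0, -2.0))  # 0.0
print(command_line_objective(0.0, 0.0))   # 5.0
```

The optimizer only ever sees the function's inputs and its returned number; the timeout and the return-code check address two problems that commonly arise with external programs (hanging and crashing runs).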

Parallel benchmarking with OpenML and mlr

In this post I want to show you how to benchmark several learners (or learners with different parameter settings) on several data sets in a structured and parallelized fashion. For this we will use batchtools.

The data that we will use here is stored on the open machine learning platform openml.org, and we can download it, together with information on what to do with it, in the form of a task.
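Structurally, such a benchmark is a grid of (learner, data set) experiments that are independent of each other and can therefore run in parallel. Here is a minimal Python sketch of that structure, with `concurrent.futures` standing in for batchtools; the toy data sets, the constant-prediction "learners" and the holdout MAE are hypothetical placeholders, not OpenML tasks.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from statistics import mean

# Hypothetical toy data sets: (training targets, test targets).
DATASETS = {
    "toy_a": ([1.0, 2.0, 3.0], [2.0, 2.5]),
    "toy_b": ([10.0, 10.0, 40.0], [10.0, 12.0]),
}


def predict_mean(train):
    return mean(train)


def predict_median(train):
    s = sorted(train)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2


# Hypothetical "learners": each maps training targets to a constant prediction.
LEARNERS = {"mean": predict_mean, "median": predict_median}


def run_experiment(job):
    """One independent experiment: fit a learner on one data set, return test MAE."""
    learner_name, data_name = job
    train, test = DATASETS[data_name]
    pred = LEARNERS[learner_name](train)
    return learner_name, data_name, mean(abs(t - pred) for t in test)


jobs = list(product(LEARNERS, DATASETS))  # the full learner x data set grid
results = {}
# Threads keep this sketch portable; a process pool or a cluster (as with
# batchtools) is the usual choice for CPU-heavy model fits.
with ThreadPoolExecutor(max_workers=4) as pool:
    for learner, data, mae in pool.map(run_experiment, jobs):
        results[(learner, data)] = mae
        print(f"{learner:>6} on {data}: MAE = {mae:.2f}")
```

Defining each experiment as a pure function of its (learner, data set) pair is what makes the grid easy to distribute and to rerun selectively.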

First release of mlrMBO - the toolbox for (Bayesian) Black-Box Optimization

We are happy to finally announce the first release of mlrMBO on CRAN after quite a long development time. For the theoretical background and a nearly complete overview of mlrMBO's capabilities, you can check our paper on mlrMBO, which we submitted as a preprint to arXiv.

The key features of mlrMBO are:

  • Global optimization of expensive Black-Box functions.
  • Multi-Criteria Optimization.
  • Parallelization through multi-point proposals.
  • Support for optimization over categorical variables using random forests as a surrogate.

For examples covering different scenarios, we provide vignettes, which are also available as online documentation. For mlr users, mlrMBO is especially interesting for hyperparameter optimization.

Being successful on Kaggle using `mlr`

Achieving a good score in a Kaggle competition is typically quite difficult. This blog post outlines 7 tips for beginners to improve their ranking on the Kaggle leaderboards. For this purpose, I also created a Kernel for the Kaggle bike sharing competition that shows how the R package mlr can be used to tune an xgboost model with random search in parallel (using 16 cores). The R script ranks 90th (of 3251) on the Kaggle leaderboard.

7 Rules

  1. Use good software
  2. Understand the objective
  3. Create and select features
  4. Tune your model
  5. Validate your model
  6. Ensemble different models
  7. Track your progress
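
For rule 4, the Kernel tunes xgboost with random search in parallel. The idea of random search is sketched below in Python rather than R: configurations are sampled independently from the search space, evaluated concurrently, and the best one is kept. The objective is a hypothetical stand-in for a cross-validated error, and while `eta` and `max_depth` are real xgboost hyperparameter names, the ranges used here are assumptions.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration: eta on a log scale, max_depth as an integer."""
    return {
        "eta": 10 ** rng.uniform(-3, 0),
        "max_depth": rng.randint(1, 10),
    }


def toy_validation_error(config: dict) -> float:
    """Hypothetical stand-in for CV error; smallest near eta = 0.1, max_depth = 6."""
    return (math.log10(config["eta"]) + 1) ** 2 + 0.05 * (config["max_depth"] - 6) ** 2


def random_search(n_iter: int, seed: int = 1, workers: int = 4):
    rng = random.Random(seed)
    # All configurations are sampled up front, so each can be evaluated independently.
    configs = [sample_config(rng) for _ in range(n_iter)]
    # Threads keep the sketch portable; real model fits would use processes or cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(toy_validation_error, configs))
    best = min(range(n_iter), key=errors.__getitem__)
    return configs[best], errors[best]


best_config, best_error = random_search(n_iter=50)
print(best_config, round(best_error, 4))
```

Because the evaluations do not depend on each other, random search parallelizes trivially, which is why it pairs well with many cores; fixing the seed makes the search reproducible.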
OpenML tutorial at useR!2017 Brussels

What is OpenML?

Conducting research openly and reproducibly is becoming the gold standard in academic research. Practicing open and reproducible research, however, is hard. OpenML.org (Open Machine Learning) is an online platform that aims to make the parts of research involving data and analyses easier. It automatically connects data sets, research tasks, algorithms, analyses and results, and allows users to access all components, including meta information, through a REST API in a machine-readable and standardized format. Everyone can see, work with and expand other people’s work in a fully reproducible way.

The useR Tutorial

At useR!2017, we will present an R package that interfaces with the OpenML platform and illustrate its usage both as a stand-alone package and in combination with the mlr machine learning package. Furthermore, we will show how the OpenML package allows R users to easily search, download and upload machine learning datasets.

mlr Workshop 2017

When and Where?

In 2017, we are hosting the workshop at LMU Munich (Ludwig-Maximilians-University Munich). The workshop will run from 6 March to 10 March 2017 (potentially including the Sunday before and the Saturday at the end).

Important Dates:

It is also possible to arrive on the Saturday or Sunday before, as we already have the rooms and can work there. This is entirely optional, however; the official workshop starts on Monday. The same applies to the Saturday after the workshop.

mlr 2.10

mlr 2.10 is now on CRAN. Please update your package if you haven’t done so in a while.

Here is an overview of the changes:

mlr loves OpenML

OpenML stands for Open Machine Learning and is an online platform that aims to support collaborative machine learning. It is an Open Science project that allows its users to share data, code and machine learning experiments.

At the time of writing this blog post, I am in Eindhoven at an OpenML workshop, where developers and scientists meet to work on improving the project. Some of these people are R users, and they (we) are developing an R package that communicates with the OpenML platform.

Exploring and Understanding Hyperparameter Tuning

Learners use hyperparameters to achieve better performance on particular datasets. When we use a machine learning package to choose the best hyperparameters, the relationship between changing a hyperparameter and the resulting performance might not be obvious. mlr provides several new implementations to better understand what happens when we tune hyperparameters and to help us optimize our choice of hyperparameters.
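One simple way to make that relationship visible is to keep the full tuning trace, i.e. every evaluated hyperparameter value together with its performance, instead of only the winner, and then look at the resulting curve. The following is a minimal Python sketch of the idea, not mlr's API; the parameter name `cost` and the toy performance curve are hypothetical.

```python
import math


def toy_accuracy(cost: float) -> float:
    """Hypothetical performance curve that peaks at cost = 1 (log10 cost = 0)."""
    return 0.9 - 0.05 * math.log10(cost) ** 2


# Evaluate a log-spaced grid and keep the whole trace, not just the best value.
grid = [10 ** k for k in range(-3, 4)]
trace = [(cost, toy_accuracy(cost)) for cost in grid]

for cost, acc in trace:
    # A crude text "plot" of performance against the hyperparameter value.
    bar = "#" * round(acc * 40)
    print(f"cost={cost:<8g} acc={acc:.3f} {bar}")

best_cost, best_acc = max(trace, key=lambda pair: pair[1])
print("best:", best_cost, round(best_acc, 3))
```

Looking at the whole trace shows how sharply performance drops away from the optimum, which a single "best value" hides; a log-spaced grid is used because parameters like this typically act on a multiplicative scale.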

Result of the mlr summer workshop in Palermo

The mlr developer team is quite international, with members in Germany, the USA and Canada. The time differences between these countries sometimes make it hard to communicate and develop new features.

The idea behind this workshop, or sprint, was to have the opportunity to talk about the project's status, future and structure, to exterminate imperishable bugs and to start developing some fancy features.

Benchmarking mlr (default) learners on OpenML

There are already some benchmarking studies of different classification algorithms out there. Probably the best-known and most extensive one is the paper Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? Its authors use different software and different tuning processes to compare 179 learners on 121 datasets, mainly from the UCI repository. They exclude some datasets because their dimensions (number of observations or number of features) are too high, because they are not in a proper format, or for other reasons. The paper also summarizes some criticism regarding the representativeness of the datasets and the generalizability of benchmarking results. It remains somewhat unclear whether their tuning is also performed on the test data or only on the training data (page 3154). They report random forests to be the best algorithms (in general) for multiclass classification datasets, with the support vector machine (SVM) in second place. On binary classification tasks, neural networks also perform competitively. They recommend the R package caret for choosing a classifier.
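Statements like "best" and "second best" across many datasets usually rest on an aggregation step. A common one is to rank the learners on each dataset and average those ranks. The following is a minimal Python sketch of that aggregation; the learner names and accuracy values are made-up placeholders, not results from the paper or from our benchmark, and ties receive the average of the ranks they span.

```python
from statistics import mean

# Hypothetical accuracies, scores[dataset][learner]: placeholder numbers only.
scores = {
    "data1": {"rf": 0.91, "svm": 0.89, "nnet": 0.85},
    "data2": {"rf": 0.78, "svm": 0.80, "nnet": 0.80},
    "data3": {"rf": 0.66, "svm": 0.61, "nnet": 0.64},
}


def ranks(row: dict) -> dict:
    """Rank learners on one dataset (1 = best accuracy); ties share the mean rank."""
    ordered = sorted(row, key=row.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a block of learners with identical scores.
        while j + 1 < len(ordered) and row[ordered[j + 1]] == row[ordered[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[ordered[k]] = shared
        i = j + 1
    return result


per_dataset = [ranks(row) for row in scores.values()]
learners = scores["data1"].keys()
avg_rank = {name: mean(r[name] for r in per_dataset) for name in learners}
for learner, r in sorted(avg_rank.items(), key=lambda kv: kv[1]):
    print(f"{learner}: mean rank {r:.2f}")
```

Ranks make datasets of very different difficulty comparable, at the cost of discarding how large the performance differences are; missing results (e.g. a learner that failed on a dataset) need an explicit rule before ranking.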

Visualization of predictions

In this post I want to briefly introduce you to the great visualization possibilities of mlr. Over the last few months, a lot of work has been put into this area. This post is not a tutorial but rather a demonstration of how little code you have to write with mlr to get some nice plots showing the prediction behavior of different learners.
