Multilabel Classification with mlr

Multilabel classification has recently gained growing interest in the research community. We implemented several methods that make use of the standardized mlr framework. Every available binary learner can be used in the multilabel problem-transformation methods. So if you're interested in applying several multilabel algorithms and want to know how to use them within the mlr framework, then this post is for you!
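As a minimal sketch of one such problem-transformation method, binary relevance, here is what this looks like in mlr (the rpart base learner and the `yeast.task` example task shipped with mlr are my choices for illustration):

```r
library(mlr)

# Binary relevance: fit one binary classifier per label.
# Any binary learner can serve as the base learner; rpart is just an example.
lrn = makeLearner("classif.rpart", predict.type = "prob")
multilabel.lrn = makeMultilabelBinaryRelevanceWrapper(lrn)

# yeast.task is a multilabel example task shipped with mlr.
mod = train(multilabel.lrn, yeast.task)
pred = predict(mod, task = yeast.task)

# Evaluate with a multilabel measure, e.g. Hamming loss.
performance(pred, measures = list(multilabel.hamloss))
```

Other wrappers such as classifier chains work the same way; only the wrapper function changes.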

New mlr Logo

We at mlr are currently deciding on a new logo, and in the spirit of open-source, we would like to involve the community in the voting process!

You can vote for your favorite logo on GitHub by reacting to the logo with a +1.

Thanks to Hannah Atkin for designing the logos!

Use mlrMBO to optimize via command line

Many people who want to apply Bayesian optimization want to use it to optimize an algorithm that is not implemented in R but runs on the command line as a shell script or an executable.

We recently published mlrMBO on CRAN. Like most packages, it normally operates inside R, but in this post I want to demonstrate how mlrMBO can be used to optimize an external application. Along the way I will highlight some issues you are likely to run into.
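The basic pattern is to wrap the external call in an objective function. A rough sketch, assuming a hypothetical executable `./blackbox` that takes one numeric argument and prints its objective value to stdout:

```r
library(mlrMBO)
library(smoof)

# Wrap a hypothetical command-line program "./blackbox" that takes one
# numeric argument and prints the objective value to stdout.
obj.fun = makeSingleObjectiveFunction(
  name = "cli-blackbox",
  fn = function(x) {
    out = system2("./blackbox", args = as.character(x), stdout = TRUE)
    as.numeric(out)  # parse the printed objective value
  },
  par.set = makeParamSet(makeNumericParam("x", lower = 0, upper = 10))
)

ctrl = makeMBOControl()
ctrl = setMBOControlTermination(ctrl, iters = 10L)
res = mbo(obj.fun, control = ctrl)
res$x  # best parameter setting found
```

In practice the `fn` body is where the mentioned issues show up: parsing the program's output robustly and handling crashed runs.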

Parallel benchmarking with OpenML and mlr

With this post I want to show you how to benchmark several learners (or learners with different parameter settings) on several data sets in a structured and parallelized fashion. For this we will use batchtools.

The data that we will use here is stored on the open machine learning platform openml.org, and we can download it, together with information on what to do with it, in the form of a task.
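Downloading such a task and benchmarking learners on it can be sketched as follows (task ID 3 and the two learners are my choices for illustration):

```r
library(OpenML)
library(mlr)

# Download an OpenML task: the data set plus a resampling specification.
# Task ID 3 is just an example; browse available tasks on openml.org.
oml.task = getOMLTask(task.id = 3L)

# Convert it to mlr objects and benchmark two learners on it.
mlr.obj = convertOMLTaskToMlr(oml.task)
lrns = list(makeLearner("classif.rpart"), makeLearner("classif.ranger"))
bmr = benchmark(lrns, mlr.obj$mlr.task, mlr.obj$mlr.rin)
```

batchtools then comes in to spread many such (learner, task) combinations over a cluster or local cores.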

First release of mlrMBO - the toolbox for (Bayesian) Black-Box Optimization

We are happy to finally announce the first release of mlrMBO on CRAN after a quite long development time. For the theoretical background and a nearly complete overview of mlrMBO's capabilities, you can check our paper on mlrMBO, which we have submitted to arXiv as a preprint.

The key features of mlrMBO are:

• Global optimization of expensive Black-Box functions.
• Multi-Criteria Optimization.
• Parallelization through multi-point proposals.
• Support for optimization over categorical variables using random forests as a surrogate.

For examples covering different scenarios we have vignettes, which are also available as online documentation. For mlr users, mlrMBO is especially interesting for hyperparameter optimization.
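For the hyperparameter-optimization use case, mlr's tuning interface can drive mlrMBO directly. A minimal sketch (the SVM learner, parameter ranges and budget are my choices for illustration):

```r
library(mlr)
library(mlrMBO)

# Tune an SVM's C and sigma with model-based optimization via mlr.
ps = makeParamSet(
  makeNumericParam("C", lower = -5, upper = 5, trafo = function(x) 2^x),
  makeNumericParam("sigma", lower = -5, upper = 5, trafo = function(x) 2^x)
)
ctrl = makeTuneControlMBO(budget = 20L)
res = tuneParams("classif.ksvm", iris.task, cv3, par.set = ps, control = ctrl)
res$x  # best hyperparameters found
```

Compared to grid or random search, the surrogate model concentrates the 20 evaluations around promising regions of the parameter space.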

Being successful on Kaggle using mlr

Achieving a good score in a Kaggle competition is typically quite difficult. This blog post outlines seven tips for beginners to improve their ranking on the Kaggle leaderboards. For this purpose, I also created a kernel for the Kaggle bike sharing competition that shows how the R package mlr can be used to tune an xgboost model with random search in parallel (using 16 cores). The R script ranks 90th (of 3251) on the Kaggle leaderboard.
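The tuning setup described above can be sketched roughly as follows (the parameter ranges, iteration count and mlr's built-in Boston housing task stand in for the competition specifics):

```r
library(mlr)
library(parallelMap)

# Random search over a few xgboost hyperparameters, evaluated in parallel.
lrn = makeLearner("regr.xgboost", nrounds = 100L)
ps = makeParamSet(
  makeNumericParam("eta", lower = 0.01, upper = 0.3),
  makeIntegerParam("max_depth", lower = 2L, upper = 10L),
  makeNumericParam("subsample", lower = 0.5, upper = 1)
)
ctrl = makeTuneControlRandom(maxit = 100L)

parallelStartSocket(16)  # spread the evaluations over 16 cores
res = tuneParams(lrn, bh.task, cv5, par.set = ps, control = ctrl)
parallelStop()
```

The kernel linked in the post applies the same pattern to the bike sharing data.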

7 Rules

1. Use good software
2. Understand the objective
3. Create and select features
6. Ensemble different models

OpenML tutorial at useR!2017 Brussels

What is OpenML?

Conducting research openly and reproducibly is becoming the gold standard in academic research. Practicing open and reproducible research, however, is hard. OpenML.org (Open Machine Learning) is an online platform that aims to make the data- and analysis-related parts of research easier. It automatically connects data sets, research tasks, algorithms, analyses and results, and allows users to access all components, including meta-information, through a REST API in a machine-readable and standardized format. Everyone can see, work with and expand other people's work in a fully reproducible way.

The useR Tutorial

At useR!2017, we will present an R package that interfaces the OpenML platform and illustrate its usage both as a stand-alone package and in combination with the mlr machine learning package. Furthermore, we show how the OpenML package allows R users to easily search, download and upload machine learning data sets.
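Searching and downloading with the OpenML package can be sketched in a few lines (data set ID 61 is the classic iris data, used here as an example):

```r
library(OpenML)

# Search: list available data sets with their meta-information.
datasets = listOMLDataSets()
head(datasets$name)

# Download: fetch a single data set by its ID (61 is the iris data).
iris.oml = getOMLDataSet(data.id = 61L)
str(iris.oml$data)
```

Uploading runs and results works analogously through the package's upload functions.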

mlr Workshop 2017

When and Where?

In 2017, the workshop will be hosted by the Ludwig-Maximilians-University (LMU) Munich and will run from 6 March to 10 March 2017 (potentially including the Sunday before and the Saturday after).

Important Dates:

It is also possible to arrive on the Saturday or Sunday before, as we already have the rooms and can work there. This is entirely optional, though; the official workshop starts on Monday. The same applies to the Saturday after the workshop.

mlr 2.10

mlr 2.10 is now on CRAN. Please update your package if you haven’t done so in a while.

Here is an overview of the changes:

Paper published: mlr - Machine Learning in R

We are happy to announce that we can finally answer the question of how to cite mlr properly in publications.

Our paper on mlr has been published in the open-access Journal of Machine Learning Research (JMLR) and can be downloaded on the journal home page.
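Once the package is installed, R can print the recommended citation, including a ready-to-use BibTeX entry, directly:

```r
# Print the citation information for the mlr package,
# including a BibTeX entry for use in publications.
citation("mlr")
```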

mlr loves OpenML

OpenML stands for Open Machine Learning and is an online platform that aims to support collaborative machine learning online. It is an Open Science project that allows its users to share data, code and machine learning experiments.

At the time of writing this blog post I am in Eindhoven at an OpenML workshop, where developers and scientists meet to work on improving the project. Some of these people are R users, and they (we) are developing an R package that communicates with the OpenML platform.

How to win a drone in 20 lines of R code

Or a less clickbaity title: Model based optimization of machine learning models with mlr and mlrMBO.

I recently participated in the #TEFDataChallenge, a datathon organized by Wayra. The first prize was a drone for every team member, which is a pretty awesome prize.

So what exactly is a datathon?

Exploring and Understanding Hyperparameter Tuning

Learners use hyperparameters to achieve better performance on particular data sets. When we use a machine learning package to choose the best hyperparameters, the relationship between changing a hyperparameter and the resulting performance might not be obvious. mlr provides several new implementations to better understand what happens when we tune hyperparameters and to help us optimize our choice of hyperparameters.
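One of these tools records every evaluated setting during tuning so the relationship can be plotted. A minimal sketch (the SVM learner, parameter range and built-in `pid.task` are my choices for illustration):

```r
library(mlr)

# Tune a single hyperparameter and keep the full trace of evaluations.
ps = makeParamSet(makeNumericParam("C", lower = -5, upper = 5,
                                   trafo = function(x) 2^x))
ctrl = makeTuneControlRandom(maxit = 25L)
res = tuneParams("classif.ksvm", pid.task, cv3, par.set = ps, control = ctrl)

# Extract the tuning trace and plot performance against the hyperparameter.
data = generateHyperParsEffectData(res)
plotHyperParsEffect(data, x = "C", y = "mmce.test.mean")
```

The resulting plot makes it easy to see whether performance is flat, noisy, or clearly peaked over the tuned range.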

Result of the mlr summer workshop in Palermo

The mlr developer team is quite international: Germany, USA, Canada. The time difference between these countries sometimes makes it hard to communicate and develop new features.

The idea for this workshop, or sprint, was to have the chance to talk about the project's status, future and structure, to exterminate stubborn bugs, and to start developing some fancy new features.

Exploring Learner Predictions with Partial Dependence and Functional ANOVA

Learners use features to make predictions, but how those features are used is often not apparent. mlr can estimate the dependence of a learned function on a subset of the feature space using generatePartialDependenceData.
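A minimal sketch of this (a random forest on mlr's built-in Boston housing task, my choices for illustration):

```r
library(mlr)

# Fit a learner, then estimate how its predictions depend on one feature.
fit = train(makeLearner("regr.randomForest"), bh.task)

# Partial dependence of the prediction on "lstat"
# (% lower status of the population).
pd = generatePartialDependenceData(fit, bh.task, features = "lstat")
plotPartialDependence(pd)
```

Passing several features at once produces one panel per feature, which makes it easy to compare their marginal effects.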

In this post I want to briefly introduce you to the great visualization possibilities of mlr. A lot of work has been put into this area over the last months. This post is not a tutorial but rather a demonstration of how little code you have to write with mlr to get some nice plots showing the prediction behavior of different learners.