With this post I want to show you how to benchmark several learners (or learners with different parameter settings) using several data sets in a structured and parallelized fashion.
For this we want to use batchtools.
The data that we will use here is stored on the open machine learning platform openml.org, and we can download it in the form of a task, which bundles the data with information on what to do with it.
If you have a small project and don’t need to parallelize, you might want to just look at the previous blog post, "mlr loves OpenML".
The following packages are needed for this:
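A minimal sketch of the setup, assuming all packages are installed from CRAN. ggplot2 is my assumption for the plot at the end of the post, and party (which provides ctree) is loaded on the workers later rather than here.

```r
# Core packages for this post
library("OpenML")      # download tasks, upload runs
library("mlr")         # learners
library("batchtools")  # registry and parallel job handling
library("ggplot2")     # assumption: used for the final plot
```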
Now we download five tasks from OpenML:
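Here is a sketch of the download step. The only task ID named in this post is 9976 (madelon), so that is the only one I fill in; the other four IDs are left for you to add. The object names `task_ids` and `tasks` are my own choice.

```r
library("OpenML")

# 9976 is the madelon task mentioned in this post.
# Append the IDs of the other four tasks you want to benchmark on.
task_ids = c(9976)

# Download each task (data plus the description of what to do with it)
tasks = lapply(task_ids, getOMLTask)
names(tasks) = paste("omltask", task_ids, sep = "_")
```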
In the next step we need to create the so-called registry.
This basically creates a folder with a certain subfolder structure.
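A sketch of how the registry could be created with batchtools. The seed value and the choice of multicore cluster functions are assumptions on my side; multicore does not work on Windows, so see `?makeClusterFunctions` for alternatives.

```r
library("batchtools")

# Creates the folder "parallel_benchmarking_blogpost" in the working directory.
# This fails if the folder already exists.
reg = makeExperimentRegistry(
  file.dir = "parallel_benchmarking_blogpost",
  packages = c("mlr", "OpenML", "party"),  # loaded on each worker
  seed = 123                               # assumption: any fixed seed will do
)

# Assumption: run jobs in parallel on the local machine (not on Windows)
reg$cluster.functions = makeClusterFunctionsMulticore()
```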
Now you should have a new folder in your working directory with the name parallel_benchmarking_blogpost and the following subfolders / files:
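One way to look at them yourself is to list the folder from R. The exact entries may depend on your batchtools version, so treat this as a quick check rather than a fixed listing.

```r
# List the top-level subfolders and files of the registry folder
files = list.files("parallel_benchmarking_blogpost")
print(files)
```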
In the next step we get to the interesting point.
We need to define:

- the problems, which in our case are simply the OpenML tasks we downloaded.
- the algorithm, which with mlr and OpenML is quite simply achieved using makeLearner and runTaskMlr. We do not have to save the run results (the results of applying the learner to the task); we can directly upload them to OpenML, where they are evaluated automatically.
- the machine learning experiment, i.e. in our case which parameters we want to set for which learner. As an example, we will look at the ctree algorithm from the party package and see whether Bonferroni correction (a correction for multiple testing) helps to get better predictions. We also want to check whether we need a tree that has more than two leaf nodes (stump = FALSE) or whether a small tree is enough (stump = TRUE).
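The three pieces could be sketched as follows. I assume that `tasks` is the list of downloaded OpenML tasks, that the registry from above is the active one in your session, and that your OpenML API key is already configured (uploading needs one). The algorithm name "mlr" and the use of testtype = "Univariate" as the "no correction" setting are my assumptions.

```r
library("batchtools")

# Problems: one per OpenML task, with the task itself as the static data
for (task in tasks) {
  addProblem(name = paste("omltask", task$task.id, sep = "_"), data = task)
}

# Algorithm: build the learner, run it on the task, upload the run.
# The function returns the run ID that OpenML assigns to the upload.
addAlgorithm(name = "mlr", fun = function(job, data, instance, learner, ...) {
  lrn = makeLearner(learner, par.vals = list(...))
  run = runTaskMlr(data, lrn)
  uploadOMLRun(run, confirm.upload = FALSE)
})

# Experiment: ctree with and without Bonferroni correction, stump or full tree
algo.design = list(mlr = expand.grid(
  learner = "classif.ctree",
  testtype = c("Bonferroni", "Univariate"),  # assumption: Univariate = no correction
  stump = c(FALSE, TRUE),
  stringsAsFactors = FALSE
))

# Every parameter setting is combined with every problem
addExperiments(algo.designs = algo.design, repls = 1)

# Overview of what will be submitted
summarizeExperiments()
```

With two values each for testtype and stump, this gives four settings per task.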
Now we can simply run our experiment:
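A minimal sketch, assuming the registry created earlier is still the active one in your session:

```r
library("batchtools")

# Without arguments, this submits every job that has not been submitted yet
submitJobs()
```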
While your job is running, you can check the progress using getStatus().
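If you would rather block the R session until everything has finished, batchtools also offers waitForJobs(). This is an optional extra on my side; the name `done` is only for illustration.

```r
library("batchtools")

# Snapshot of submitted / running / done / errored jobs
getStatus()

# Blocks until all jobs have terminated; TRUE if all finished successfully
done = waitForJobs()
```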
As soon as getStatus() tells us that all our runs are done, we can collect the results of our experiment from OpenML.
To be able to do this, we need to collect the run IDs of the runs we uploaded during the experiment.
Also we want to add the info of the parameters used (getJobPars()).
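A sketch of this step, assuming each job returned the run ID of its upload as its result. The names `results0`, `pars` and `results` are my own, and unwrap() needs a reasonably recent batchtools version.

```r
library("batchtools")

# Only look at jobs that finished successfully
ids = findDone()

# One run ID per finished job, in job order
results0 = reduceResultsList(ids)

# Parameters of the same jobs; unwrap() turns the list columns into plain columns
pars = unwrap(getJobPars(ids))

# Combine parameters and run IDs in one table
results = as.data.frame(pars)
results$run.id = unlist(results0)
```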
With the run ID information we can now grab the evaluations from OpenML and plot, for example, the parameter settings against the predictive accuracy.
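A possible way to do this, assuming `results` is the table of run IDs and parameters from the previous step. The column names `data.name` and `predictive.accuracy` are what I expect listOMLRunEvaluations() to return; check `names(run.evals)` if they differ in your version.

```r
library("OpenML")
library("ggplot2")

# Fetch the server-side evaluations for our runs
run.evals = listOMLRunEvaluations(run.id = results$run.id)

# Attach the parameter settings to the evaluations
evals = merge(results, run.evals, by = "run.id")

# One panel per data set, one point per parameter setting
p = ggplot(evals, aes(x = interaction(testtype, stump), y = predictive.accuracy)) +
  geom_point() +
  facet_wrap(~ data.name) +
  labs(x = "testtype.stump", y = "predictive accuracy")
print(p)
```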
We see that the only data set where a stump is good enough is the pc1 data set.
For the madelon data set Bonferroni correction helps.
For the others it does not seem to matter.
You can check out the results online by going to the task websites (e.g. for task 9976 for the madelon data set go to openml.org/t/9976) or the run websites (e.g. openml.org/r/1852889).