Many people who want to apply Bayesian optimization need to optimize an algorithm that is not implemented in R but runs on the command line as a shell script or an executable.
We recently published mlrMBO on CRAN.
Like any package it normally operates inside R, but in this post I want to demonstrate how mlrMBO can be used to optimize an external application.
Along the way I will highlight some issues you are likely to run into.
First of all we need a bash script that we want to optimize.
This tutorial will only run on Unix systems (Linux, OS X etc.) but should also be informative for Windows users.
The following code writes a tiny bash script that uses bc to calculate $\sin(x_1-1) + (x_1^2 + x_2^2)$ and writes the result “hidden” in a sentence (The result is 12.34!) to the text file result.txt.
The bash script
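A minimal sketch of such a script, written from within R, could look as follows. The file names fun.sh and result.txt and the exact layout of the script are assumptions made for illustration. Calling bc with -l loads its math library, in which s() is the sine function.

```r
# Contents of the shell script. $1 and $2 are the two inputs x1 and x2.
lines = '#!/bin/bash
x1=$1
x2=$2
echo "Start calculation."
result=$(echo "s(($x1) - 1) + (($x1)^2 + ($x2)^2)" | bc -l)
echo "The result is $result!" > result.txt
echo "Finish calculation."
'
writeLines(lines, "fun.sh")

# Make the script executable.
Sys.chmod("fun.sh", mode = "0755")
```

The two echo calls around the calculation only produce console output, which becomes relevant once the script is called many times.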
Running the script from R
Now we need an R function that starts the script, reads the result from the text file and returns it.
This function uses stringi and regular expressions to match the result within the sentence.
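One way to write this function is sketched below. It assumes the script is called fun.sh, lies in the working directory and writes result.txt, and that x is a named list with the entries x1 and x2. Note that bc prints numbers between -1 and 1 without a leading zero (for example -.84), so the pattern allows for an optional sign and missing integer digits.

```r
library(stringi)

runScript = function(x) {
  # Build the call, e.g. "./fun.sh 1.100000 2.400000".
  command = sprintf("./fun.sh %f %f", x[["x1"]], x[["x2"]])
  error.code = system(command)
  if (error.code != 0) {
    stop("fun.sh exited with a non-zero status.")
  }
  result = readLines("result.txt")
  # Match an optionally signed number such as 12, 12.34, .34 or -.34
  # that is directly followed by the exclamation mark.
  result = stri_match_first_regex(result, "-?\\d*\\.?\\d+(?=!)")
  as.numeric(result[1, 1])
}
```

Stopping on a non-zero exit status means a failed run of the script is reported as an error instead of silently returning the result of a previous run.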
Depending on the format of the output, different strategies for reading the result make sense.
XML files can usually be accessed with XML::xmlParse, XML::getNodeSet, XML::xmlAttrs etc. using XPath queries.
Sometimes the good old read.table() is also sufficient.
If, for example, the output file consists of plain name = value assignments, one per line, you can simply read it with source() and collect the assigned variables, which gives a list with the entries $value1 and $value2.
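A minimal sketch of this source() approach is shown below. The file name values.txt and the two numbers are made up for illustration. Because source() executes whatever the file contains, this should only be used on output you trust.

```r
# Hypothetical output file with made-up values, written here only for illustration.
writeLines(c("value1 = 23.45", "value2 = 13.82"), "values.txt")

# Evaluate the assignments in a fresh environment so that they do not
# overwrite anything in the global workspace.
env = new.env()
source("values.txt", local = env)
res = as.list(env)

res$value1
res$value2
```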
Define bounds, wrap function.
To evaluate the function from within mlrMBO it has to be wrapped in a smoof function.
The smoof function also contains information about the bounds and scales of the domain of the objective function, defined in a ParamSet.
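The wrapping could be sketched like this. The concrete bounds are assumptions chosen for illustration, and runScript is expected to be the function defined above that takes a named list with the entries x1 and x2.

```r
library(ParamHelpers)
library(smoof)

# Box constraints for the two inputs (illustrative values).
par.set = makeParamSet(
  makeNumericParam("x1", lower = -3, upper = 3),
  makeNumericParam("x2", lower = -2.5, upper = 2.5)
)

# has.simple.signature = FALSE: the objective receives a named list
# (x$x1, x$x2) instead of a plain numeric vector.
fn = makeSingleObjectiveFunction(
  name = "fun.sh",
  fn = function(x) runScript(x),
  par.set = par.set,
  has.simple.signature = FALSE
)
```

A single evaluation such as fn(list(x1 = 1, x2 = 2)) is a quick check that the script is started and the result is read back correctly.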
If you run this locally, you will see that the console output generated by our shell script appears directly in the R console.
This can be helpful but also annoying.
If a lot of output is generated during a single call of system() it might even crash R.
To avoid that, I suggest redirecting the output into a file.
This way no output is lost and the R console does not get flooded.
We can simply achieve that by replacing the command in the function runScript from above with the following code:
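A sketch of the adapted function is given below, shown in full so that it can be used on its own. The naming scheme of the log files is an assumption. The rest of the function is the same as in a version without redirection.

```r
library(stringi)

runScript = function(x) {
  # One log file per evaluation, e.g. output_1490030005_1.1_2.4.txt
  output.file = sprintf("output_%i_%.1f_%.1f.txt",
    as.integer(Sys.time()), x[["x1"]], x[["x2"]])
  # Redirect the console output of the script into the log file.
  # Use "> /dev/null" instead if the output should simply be dropped.
  command = sprintf("./fun.sh %f %f > %s", x[["x1"]], x[["x2"]], output.file)
  error.code = system(command)
  if (error.code != 0) {
    stop("fun.sh exited with a non-zero status.")
  }
  result = readLines("result.txt")
  # Same pattern as before: optionally signed number followed by "!".
  result = stri_match_first_regex(result, "-?\\d*\\.?\\d+(?=!)")
  as.numeric(result[1, 1])
}
```

Only standard output is redirected here. Error messages that the script writes to standard error would still show up in the R console.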
Start the Optimization
Now everything is set so we can proceed with the usual MBO setup:
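A typical setup could look like the following sketch. The helper name optimizeScript, the choice of expected improvement as infill criterion and the default of 10 iterations are assumptions for illustration. With a purely numeric search space mlrMBO should fall back to a kriging surrogate by default, which requires the DiceKriging package to be installed.

```r
library(mlrMBO)

# Hypothetical helper: runs MBO on a smoof function for a given
# number of iterations and returns the mbo() result object.
optimizeScript = function(fn, iters = 10L) {
  ctrl = makeMBOControl()
  # Expected improvement as infill criterion.
  ctrl = setMBOControlInfill(ctrl, crit = crit.ei)
  ctrl = setMBOControlTermination(ctrl, iters = iters)
  # Keep mlr from printing learner output during the optimization.
  configureMlr(show.info = FALSE, show.learner.output = FALSE,
    on.learner.warning = "quiet")
  mbo(fun = fn, control = ctrl)
}

# Usage with the wrapped shell script:
# run = optimizeScript(fn)
# run$x  # best input found
# run$y  # best objective value found
```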
Execute the R script from a shell
You might also not want to be bothered with starting R and running this script manually, so I recommend saving all of the above as an R script, plus a few lines that write the result to a JSON file like this:
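Writing the result could be sketched as follows. The helper name saveResult and the file name mbo_res.json are assumptions, and run stands for the object returned by mbo().

```r
library(jsonlite)

# Hypothetical helper: keep only the best input (x) and its objective value (y).
saveResult = function(run, file = "mbo_res.json") {
  # auto_unbox writes scalars as plain numbers instead of length-one arrays,
  # digits = NA keeps full numeric precision.
  write_json(run[c("x", "y")], file, auto_unbox = TRUE, digits = NA)
}

# Usage: saveResult(run)
```

JSON keeps the result easy to consume from whatever tool started the R script.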
As an extra the script in the gist also contains a simple handler for command line arguments.
In this case you can define the number of optimization iterations and the maximal allowed time in seconds for the optimization.
You can also define the seed to make runs reproducible:
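The following is only a sketch of how such a handler could look. The key=value syntax, the argument names iters, time and seed, the defaults and the script name runMBO.R are all assumptions. A call might then be Rscript runMBO.R iters=20 time=10 seed=3.

```r
# Hypothetical parser for arguments of the form key=value.
parseArgs = function(args, defaults = list(iters = 10, time = Inf, seed = 1)) {
  for (arg in args) {
    kv = strsplit(arg, "=", fixed = TRUE)[[1]]
    if (length(kv) != 2 || !(kv[1] %in% names(defaults))) {
      stop(sprintf("Unknown or malformed argument: %s", arg))
    }
    # Overwrite the default with the value given on the command line.
    defaults[[kv[1]]] = as.numeric(kv[2])
  }
  defaults
}

opts = parseArgs(commandArgs(trailingOnly = TRUE))
# A fixed seed makes the initial design and the optimization reproducible.
set.seed(opts$seed)
```

The parsed values can then be passed on, for example as the iters and time.budget arguments of setMBOControlTermination().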
If you want to build a more advanced command line interface you might want to have a look at docopt.
To clean up all the files generated by this script you can run:
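A sketch of such a clean-up is given below. The file names are the ones assumed throughout this post (fun.sh, result.txt, the output_*.txt logs) plus the hypothetical JSON file mbo_res.json, so adjust the list to the files you actually created.

```r
# Collect the generated files; Sys.glob() expands the pattern for the log files.
files = c("fun.sh", "result.txt", "mbo_res.json", Sys.glob("output_*.txt"))

# unlink() silently skips files that do not exist.
unlink(files)
```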