By default, R does not make use of parallelization. With the integration of parallelMap into mlr, it becomes easy to activate the parallel computing capabilities already supported by mlr. parallelMap works with all major parallelization backends: local multicore execution using parallel, socket and MPI clusters using snow, makeshift SSH clusters using BatchJobs, and high performance computing clusters (managed by a scheduler like SLURM, Torque/PBS, SGE or LSF), also using BatchJobs.
All you have to do is select a backend by calling one of the parallelStart* functions. The first loop mlr encounters that is marked as parallel-executable will be automatically parallelized. It is good practice to call parallelStop at the end of your script.
```r
library("parallelMap")
parallelStartSocket(2)
#> Starting parallelization in mode=socket with cpus=2.
rdesc = makeResampleDesc("CV", iters = 3)
r = resample("classif.lda", iris.task, rdesc)
#> Exporting objects to slaves for mode socket: .mlr.slave.options
#> Mapping in parallel: mode = socket; cpus = 2; elements = 3.
#> [Resample] Aggr. Result: mmce.test.mean=0.02
parallelStop()
#> Stopped parallelization. All cleaned up.
```
On Linux or Mac OS X, you may want to use parallelStartMulticore instead.
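A minimal sketch of the multicore variant, which forks worker processes instead of opening sockets (forking is not available on Windows); the learner and resampling setup mirror the socket example above:

```r
library("parallelMap")
# Fork-based local parallelization with 2 worker processes.
# Unlike the socket backend, no data needs to be exported to slaves.
parallelStartMulticore(2)
rdesc = makeResampleDesc("CV", iters = 3)
r = resample("classif.lda", iris.task, rdesc)
parallelStop()
```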
We offer different parallelization levels for fine-grained control over what gets parallelized. E.g., if you do not want to parallelize the benchmark function because it has only very few iterations, but want to parallelize the resampling of each learner instead, you can pass the level "mlr.resample" to the parallelStart* function via its level argument.
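As a sketch of this, assuming the level argument is passed through parallelStartSocket to parallelStart:

```r
library("parallelMap")
# Parallelize only the resampling loop: an enclosing benchmark()
# runs sequentially, while the resampling iterations of each
# learner are distributed over the 2 workers.
parallelStartSocket(2, level = "mlr.resample")
rdesc = makeResampleDesc("CV", iters = 5)
r = resample("classif.rpart", iris.task, rdesc)
parallelStop()
```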
Currently the following levels are supported:
```r
parallelGetRegisteredLevels()
#> mlr: mlr.benchmark, mlr.resample, mlr.selectFeatures, mlr.tuneParams, mlr.ensemble
```
For further details please see the parallelization documentation page.
Custom learners and parallelization
If you have implemented a custom learner locally, its methods are not automatically known on the slaves. So if a parallelized call such as resample fails with an error like

no applicable method for 'trainLearner' applied to an object of class <my_new_learner>

simply export the learner's methods with parallelExport somewhere after calling parallelStart.
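A minimal sketch of such an export, using the placeholder class <my_new_learner> from the error message above; substitute your learner's actual class name:

```r
library("parallelMap")
parallelStartSocket(2)
# Make the custom learner's S3 methods available on the slaves.
# "my_new_learner" is a placeholder for your learner's class name.
parallelExport("trainLearner.my_new_learner", "predictLearner.my_new_learner")
# ... parallelized resample() / benchmark() calls go here ...
parallelStop()
```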