tuneRanger: RF Hyperparameter Optimization
- tuneRanger is an R package that automates the tuning of Random Forest hyperparameters using sequential model-based optimization with efficient OOB error evaluation.
- It focuses on the key parameters mtry, sample.fraction, and min.node.size, integrating seamlessly with the ranger backend and mlr framework.
- Benchmark studies show that tuneRanger balances accuracy and computational cost effectively, outperforming caret and tuneRF in accuracy and matching mlrHyperopt's accuracy at a fraction of its runtime.
tuneRanger is an R package providing automated hyperparameter optimization for the Random Forest algorithm via sequential model-based optimization (SMBO), specifically using the fast C++/R implementation "ranger" as backend. It is designed to tune the most influential hyperparameters of Random Forests for both classification and regression problems, leveraging efficient out-of-bag (OOB) error estimation and offering extensive integration with the "mlr" R machine-learning framework. The methodology, default settings, and benchmarking of tuneRanger were introduced and analyzed by Probst, Wright, and Boulesteix (2018).
1. Core Functionality and Design
tuneRanger serves as a wrapper around the "ranger" implementation, automating the process of hyperparameter selection using SMBO, also known as Bayesian optimization. By default, tuneRanger evaluates candidate hyperparameter settings using OOB predictions rather than cross-validation, greatly accelerating the tuning process. The package supports an array of approximately 50 performance measures from "mlr"—including error rate, AUC, Brier score, log-loss, and MSE—and returns a re-trained "ranger" model using the recommended hyperparameters.
The function estimateTimeTuneRanger() allows users to predict total expected tuning time via a preliminary single-forest run.
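As an illustration (a hedged sketch; the regression data set, target name, and thread count are placeholder assumptions), both the applicable mlr measures and the expected tuning time can be inspected before launching a full run:

```r
library(mlr)
library(tuneRanger)

# Build an mlr regression task (data set and target name are placeholders)
task = makeRegrTask(data = your.data, target = "y")

# mlr measures applicable to this task; any of them can serve as the tuning criterion
listMeasures(task)

# A single preliminary forest fit is used to extrapolate the expected tuning time
estimateTimeTuneRanger(task, num.threads = 2)
```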
2. Default Hyperparameters and Search Spaces
By default, tuneRanger targets the following three hyperparameters, which literature and empirical results indicate as most influential on Random Forest performance:
| Hyperparameter | Ranger Default | tuneRanger Search Range | Sampling Method |
|---|---|---|---|
| mtry | ⌊√p⌋ (classification)<br>p/3 (regression) | Integers in [1, p] | Uniform in [0,p], rounded |
| sample.fraction | 1.0 (draw n obs. w/ replacement) | [0.2, 0.9] fractions of n | Uniform in [0.2, 0.9] |
| min.node.size | 1 (classification)<br>5 (regression) | {1, …, ⌈0.2n⌉} | Non-uniform, skewed toward small values |
Other hyperparameters (replace, splitrule, num.random.splits, max.depth) may be included for tuning via the tune.parameters argument, but are excluded by default.
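As a hedged sketch of this option (the task object and the choice of added hyperparameter are illustrative assumptions), an additional hyperparameter can be included in the search space via tune.parameters:

```r
# Sketch: extend the default search space with the "replace" hyperparameter
# (task is an mlr task as in the usage example below; settings are illustrative)
res = tuneRanger(task,
                 num.trees = 1000,
                 tune.parameters = c("mtry", "min.node.size",
                                     "sample.fraction", "replace"))
```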
3. Sequential Model-Based Optimization Workflow
tuneRanger employs the SMBO framework provided by "mlrMBO," with the following components:
- Surrogate Model: Kriging (Gaussian process regression).
- Acquisition Function: Expected Improvement (EI).
- Initial Design: 30 random draws (Latin Hypercube Sampling or uniform sampling), adjustable via iters.warmup.
- Optimization Iterations: 70 sequential SMBO steps, adjustable via iters.
- Termination: Fixed evaluation budget (number of initial + sequential steps).
The SMBO procedure operates as follows:
- Define the performance measure (e.g., the OOB Brier score) as a function of the hyperparameters over the search space described above.
- Evaluate an initial design of 30 randomly drawn hyperparameter settings θ using OOB predictions.
- For each of the 70 sequential iterations:
  - Fit a Kriging model to all configurations evaluated so far.
  - Propose the next setting θ by maximizing the expected improvement EI(θ), which trades off the surrogate's predicted performance against its predictive standard deviation to balance exploitation and exploration.
  - Evaluate θ via the OOB error and append the result to the evaluation history.
- Take the best 5% of all evaluated hyperparameter settings (by performance), average their values (rounding mtry and min.node.size to integers), and adopt this averaged setting as the final recommendation.
- Retrain "ranger" on the full dataset with the recommended hyperparameters, returning the fitted model.
This process is fully parallelizable for large datasets and exploits the efficiency of OOB estimation.
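The following self-contained sketch mirrors this loop directly with mlrMBO and ranger rather than calling tuneRanger itself; the iris data, parameter bounds, and the use of ranger's OOB prediction.error as the objective are illustrative assumptions, not the package's internal code, and mlrMBO picks its own default surrogate here, which may differ from the Kriging model described above:

```r
library(mlrMBO)    # SMBO framework (loads smoof and ParamHelpers)
library(ranger)

# Objective: OOB error of a probability forest as a function of three hyperparameters
obj = makeSingleObjectiveFunction(
  name = "rf.oob.error",
  fn = function(x) {
    rf = ranger(Species ~ ., data = iris, num.trees = 500,
                mtry = x$mtry,
                sample.fraction = x$sample.fraction,
                min.node.size = x$min.node.size,
                probability = TRUE)
    rf$prediction.error            # OOB estimate, no cross-validation needed
  },
  par.set = makeParamSet(
    makeIntegerParam("mtry", lower = 1, upper = 4),
    makeNumericParam("sample.fraction", lower = 0.2, upper = 0.9),
    makeIntegerParam("min.node.size", lower = 1, upper = 30)
  ),
  has.simple.signature = FALSE,
  minimize = TRUE
)

# 30 initial evaluations, then 70 sequential steps guided by expected improvement
design = generateDesign(n = 30, par.set = getParamSet(obj), fun = lhs::maximinLHS)
ctrl = makeMBOControl()
ctrl = setMBOControlTermination(ctrl, iters = 70)
ctrl = setMBOControlInfill(ctrl, crit = makeMBOInfillCritEI())

run = mbo(obj, design = design, control = ctrl)
run$x   # best hyperparameter setting found
run$y   # corresponding OOB error
```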
4. Implementation and API Usage
Installation is performed via:
```r
install.packages("tuneRanger")
```
A typical workflow for classification tasks is:
```r
library(mlr)
library(tuneRanger)

# Create an mlr classification task
task = makeClassifTask(data = your.data, target = "y")

# Rough estimate of the total tuning time
estimateTimeTuneRanger(task, num.threads = 4)

set.seed(123)
res = tuneRanger(task,
                 measure = list(multiclass.brier),
                 num.trees = 500,
                 num.threads = 4,
                 iters.warmup = 30,
                 iters = 70)

res$recommended.pars   # Recommended mtry, sample.fraction, min.node.size
res$y                  # Best OOB performance
res$exec.time          # Runtime
ranger.model = res$model
predict(ranger.model, newdata = …)
```
Integration as an "mlr" learner allows direct use within "mlr" resampling and benchmarking:
```r
lrn = makeLearner("classif.tuneRanger", num.trees = 1000, predict.type = "prob")
```
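For instance (a hedged sketch; the resampling setup is an illustrative assumption, and task is the classification task from the workflow above), the wrapped learner can be evaluated with mlr's resampling machinery, which re-runs the tuning inside each training fold:

```r
# Sketch: nested evaluation of the tuning learner via 5-fold cross-validation
rdesc = makeResampleDesc("CV", iters = 5)
resample(lrn, task, rdesc, measures = list(multiclass.brier))
```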
5. Benchmark Comparison and Empirical Results
A comprehensive benchmark evaluated tuneRanger and four alternative tuners on 39 binary classification datasets from the OpenML100 suite (datasets without missing values). All tests used 2,000 trees per forest, parallelized across 10 CPU cores. The main comparators included:
- tuneRanger (multiple measures: MMCE, AUC, Brier, log-loss)
- mlrHyperopt (SMBO via mlrMBO, tuning mtry & node size, 25 iters, 10-fold CV)
- caret (grid on mtry, 3 points, 25 bootstraps)
- tuneRF (greedy OOB, mtry only)
- ranger default (no tuning)
The main findings were:
| Method | MMCE (avg) | AUC (avg) | Mean Train Time (s) |
|---|---|---|---|
| ranger (default) | ~0.105 | ~0.913 | ~4 |
| tuneRanger (MMCE/AUC) | ~0.092 | ~0.920 | ~823–967 |
| mlrHyperopt | ~0.093 | ~0.920 | ~2,713 |
| caret | ~0.097 | ~0.919 | ~1,216 |
| tuneRF | ~0.094 | ~0.917 | ~863 |
tuneRanger ranked 1st or 2nd in MMCE and AUC, outperforming caret and tuneRF both in accuracy and consistency. mlrHyperopt produced similar accuracy but was substantially slower, attributable to using 10-fold cross-validation rather than OOB estimation. Brier score and log-loss trends mirrored MMCE and AUC.
A plausible implication is that OOB-based SMBO, as implemented in tuneRanger, offers an efficient trade-off between accuracy and computational cost.
6. Authors’ Recommendations and Best Practices
Key recommendations from Probst, Wright, and Boulesteix include:
- Use ≥1,000 (preferably ≥2,000) trees to stabilize predictions and variable-importance metrics.
- Focus tuning on mtry, sample.fraction, and min.node.size; other hyperparameters have limited impact.
- Prefer OOB evaluation for tuning speed; use cross-validation only if OOB bias is suspected (e.g., very small n, very high p, or balanced classes).
- Retain default tuning parameters (30 initial + 70 SMBO steps; Kriging + EI) for robust results, except under critical time constraints.
- For the fastest, mtry-only tuning, tuneRF (greedy OOB) is ~1.5× faster, but cannot tune the additional parameters.
- Visualize OOB prediction curves to ensure convergence and inspect SMBO logs to monitor search behavior.
- When handling imbalanced data, use AUC, the Brier score, or log-loss as the tuning criterion instead of the error rate (see the sketch after this list).
- For variable-importance analyses, employ larger forests (≥5,000 trees) to achieve stable measures; adjust num.trees in tuneRanger accordingly.
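The measure-related advice above can be combined in a short, hedged sketch (the data set, target name, and thread count are placeholder assumptions):

```r
# Sketch: tune an imbalanced classification task on AUC with a large forest
library(mlr)
library(tuneRanger)

task = makeClassifTask(data = your.imbalanced.data, target = "y")
res = tuneRanger(task,
                 measure = list(auc),   # AUC instead of the error rate
                 num.trees = 2000,      # at least 2,000 trees, as recommended
                 num.threads = 4)
res$model
```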
7. Context, Limitations, and Extensions
tuneRanger operationalizes SMBO as a standardized, OOB-based hyperparameter optimization procedure for Random Forests in R, offering a practical balance of accuracy and speed. The major limitation is its reliance on the OOB error, which may be biased in specific settings (very small sample size, extremely high p, or balanced classes). The package can be extended to tune additional hyperparameters via the tune.parameters argument if application context warrants.
The significance of tuneRanger lies in its empirical demonstration of OOB-based SMBO's effectiveness and efficiency, its tight integration with "mlr"/"ranger," and its adoption of reproducible, best-practice protocols for hyperparameter optimization in Random Forests (Probst et al., 2018).