
tuneRanger: RF Hyperparameter Optimization

Updated 24 November 2025
  • tuneRanger is an R package that automates the tuning of Random Forest hyperparameters using sequential model-based optimization with efficient OOB error evaluation.
  • It focuses on the key parameters mtry, sample.fraction, and min.node.size, integrating seamlessly with the ranger backend and mlr framework.
  • Benchmark studies show that tuneRanger balances accuracy and computational cost effectively, outperforming several alternatives in both speed and consistency.

tuneRanger is an R package providing automated hyperparameter optimization for the Random Forest algorithm via sequential model-based optimization (SMBO), specifically using the fast C++/R implementation "ranger" as backend. It is designed to tune the most influential hyperparameters of Random Forests for both classification and regression problems, leveraging efficient out-of-bag (OOB) error estimation and offering extensive integration with the "mlr" R machine-learning framework. The methodology, default settings, and benchmarking of tuneRanger were introduced and analyzed by Probst, Wright, and Boulesteix (2018) (Probst et al., 2018).

1. Core Functionality and Design

tuneRanger serves as a wrapper around the "ranger" implementation, automating the process of hyperparameter selection using SMBO, also known as Bayesian optimization. By default, tuneRanger evaluates candidate hyperparameter settings using OOB predictions rather than cross-validation, greatly accelerating the tuning process. The package supports an array of approximately 50 performance measures from "mlr"—including error rate, AUC, Brier score, log-loss, and MSE—and returns a re-trained "ranger" model using the recommended hyperparameters.
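
The measures applicable to a given task can be listed directly in mlr; a minimal illustration (assuming task is a classification task object as constructed in Section 4 below):

library(mlr)
listMeasures(task)   # applicable measures, e.g. mmce, auc, brier, logloss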

The function estimateTimeTuneRanger() allows users to predict total expected tuning time via a preliminary single-forest run.

2. Default Hyperparameters and Search Spaces

By default, tuneRanger targets the following three hyperparameters, which the literature and empirical results identify as the most influential for Random Forest performance:

| Hyperparameter | ranger Default | tuneRanger Search Range | Sampling Method |
|---|---|---|---|
| mtry | ⌊√p⌋ (classification), p/3 (regression) | Integers in [1, p] | Uniform in [0, p], rounded |
| sample.fraction | 1.0 (draw n obs. with replacement) | Fractions of n in [0.2, 0.9] | Uniform in [0.2, 0.9] |
| min.node.size | 1 (classification), 5 (regression) | {1, …, ⌈0.2n⌉} | $u \sim \mathrm{Uniform}(0,1)$, candidate $= \mathrm{round}((0.2n)^u)$ |
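
The min.node.size transformation samples on a log-like scale, concentrating candidates at small node sizes. A short illustration of that sampling rule (the dataset size n = 1000 is an assumption for the example, not a package default):

set.seed(1)
n = 1000               # assumed number of observations
u = runif(5)           # u ~ Uniform(0, 1)
round((0.2 * n)^u)     # candidate node sizes span 1 .. 0.2*n, skewed toward small values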

Other hyperparameters (replace, splitrule, num.random.splits, max.depth) may be included for tuning via the tune.parameters argument, but are excluded by default.
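
Extending the default search space is a one-argument change; an illustrative call (reusing the task object from Section 4, with otherwise arbitrary settings):

res = tuneRanger(task,
                 tune.parameters = c("mtry", "min.node.size",
                                     "sample.fraction", "replace"),
                 num.trees = 500, num.threads = 4)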

3. Sequential Model-Based Optimization Workflow

tuneRanger employs the SMBO framework provided by "mlrMBO," with the following components:

  • Surrogate Model: Kriging (Gaussian process regression).
  • Acquisition Function: Expected Improvement (EI).
  • Initial Design: 30 random draws (Latin Hypercube Sampling or uniform sampling).
  • Optimization Iterations: 70 sequential SMBO steps (the warm-up size and number of steps are adjustable via the iters.warmup and iters arguments).
  • Termination: Fixed evaluation budget (number of initial + sequential steps).

The SMBO procedure operates as follows:

  1. Define the performance measure $\mu(\theta)$ (e.g., OOB Brier score) over the hyperparameter box $\Theta$.
  2. Conduct $N_0 = 30$ initial evaluations of random hyperparameter settings $\theta$ using OOB predictions.
  3. For $t = 1, \ldots, 70$ iterations:
    • Fit a Kriging model $\hat{y}(\theta) \approx \mu(\theta)$ to all evaluated configurations.
    • Select $\theta^* = \arg\max_{\theta \in \Theta} \mathrm{EI}(\theta;\, \hat{y}, \hat{s})$, where $\hat{s}$ is the predictive standard deviation of the surrogate model.
    • Evaluate $\mu(\theta^*)$ via OOB error and update the history.
  4. Aggregate the best 5% of all evaluated hyperparameter sets (by performance), average their values (rounding mtry and min.node.size to integers), and adopt this averaged setting as the final recommendation.
  5. Retrain "ranger" on the full dataset with the recommended hyperparameters, returning the fitted model.

This process is fully parallelizable for large datasets and exploits the efficiency of OOB estimation.
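
For readers who want to see the moving parts, the following is a minimal, self-contained sketch of the same SMBO loop built directly on mlrMBO and ranger. It is not the package's internal code: the iris data, parameter bounds, seed, and forest size are illustrative assumptions.

library(mlrMBO)        # attaches smoof, ParamHelpers, and mlr
library(ranger)

# Objective: OOB Brier score of a ranger probability forest for a given setting
obj = makeSingleObjectiveFunction(
  name = "ranger OOB Brier",
  fn = function(x) {
    rf = ranger(Species ~ ., data = iris,
                num.trees       = 500,
                mtry            = x$mtry,
                sample.fraction = x$sample.fraction,
                min.node.size   = x$min.node.size,
                probability     = TRUE)
    rf$prediction.error          # OOB Brier score for probability forests
  },
  par.set = makeParamSet(
    makeIntegerParam("mtry", lower = 1, upper = 4),
    makeNumericParam("sample.fraction", lower = 0.2, upper = 0.9),
    makeIntegerParam("min.node.size", lower = 1, upper = 30)
  ),
  has.simple.signature = FALSE,  # fn receives a named list of parameter values
  minimize = TRUE
)

surrogate = makeLearner("regr.km", predict.type = "se")           # Kriging with predictive SD for EI

ctrl = makeMBOControl()
ctrl = setMBOControlInfill(ctrl, crit = makeMBOInfillCritEI())    # Expected Improvement
ctrl = setMBOControlTermination(ctrl, iters = 70)                 # sequential SMBO steps

design = generateDesign(n = 30, par.set = getParamSet(obj))       # warm-up design (LHS)

set.seed(42)
run = mbo(obj, design = design, learner = surrogate, control = ctrl)
run$x   # best single configuration found
run$y   # its OOB Brier score

Note that tuneRanger itself does not simply return the single best configuration: as described above, it averages the best 5% of all evaluated settings before refitting the final forest.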

4. Implementation and API Usage

Installation is performed via:

install.packages("tuneRanger")

A typical workflow for classification tasks is:

library(mlr)
library(tuneRanger)
task = makeClassifTask(data=your.data, target="y")
estimateTimeTuneRanger(task, num.threads=4)
set.seed(123)
res = tuneRanger(task,
                 measure=list(multiclass.brier),
                 num.trees=500,
                 num.threads=4,
                 iters.warmup=30,
                 iters=70)
res$recommended.pars    # Recommended mtry, sample.fraction, min.node.size
res$y                   # Best OOB performance
res$exec.time           # Runtime
ranger.model = res$model
predict(ranger.model, newdata=…)

Integration as an "mlr" learner allows direct use within "mlr" resampling and benchmarking:

lrn = makeLearner("classif.tuneRanger", num.trees=1000, predict.type="prob")
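
The wrapped learner then behaves like any other mlr learner, e.g. inside resampling; a brief sketch (the 5-fold cross-validation scheme and measures are illustrative choices):

rdesc = makeResampleDesc("CV", iters = 5)                    # outer 5-fold cross-validation
r = resample(lrn, task, rdesc, measures = list(mmce, auc))   # tuning runs inside each fold
r$aggr                                                       # aggregated performance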

5. Benchmark Comparison and Empirical Results

A comprehensive benchmark evaluated tuneRanger and four alternative tuners on 39 binary classification datasets from the OpenML100 suite (datasets without missing values). All tests used 2,000 trees per forest, parallelized across 10 CPU cores. The main comparators included:

  • tuneRanger (multiple measures: MMCE, AUC, Brier, log-loss)
  • mlrHyperopt (SMBO via mlrMBO, tuning mtry & node size, 25 iters, 10-fold CV)
  • caret (grid on mtry, 3 points, 25 bootstraps)
  • tuneRF (greedy OOB, mtry only)
  • ranger default (no tuning)

The main findings were:

| Method | MMCE (avg) | AUC (avg) | Mean Train Time (s) |
|---|---|---|---|
| ranger (default) | ~0.105 | ~0.913 | ~4 |
| tuneRanger (MMCE/AUC) | ~0.092 | ~0.920 | 823–967 |
| mlrHyperopt | ~0.093 | ~0.920 | ~2,713 |
| caret | ~0.097 | ~0.919 | ~1,216 |
| tuneRF | ~0.094 | ~0.917 | ~863 |

tuneRanger ranked 1st or 2nd in MMCE and AUC, outperforming caret and tuneRF both in accuracy and consistency. mlrHyperopt produced similar accuracy but was substantially slower, attributable to using 10-fold cross-validation rather than OOB estimation. Brier score and log-loss trends mirrored MMCE and AUC.

A plausible implication is that OOB-based SMBO, as implemented in tuneRanger, offers an efficient trade-off between accuracy and computational cost.

6. Authors’ Recommendations and Best Practices

Key recommendations from Probst, Wright, and Boulesteix include:

  • Use ≥1,000 (preferably ≥2,000) trees to stabilize predictions and variable-importance metrics.
  • Focus tuning on mtry, sample.fraction, and min.node.size; other hyperparameters have limited impact.
  • Prefer OOB evaluation for tuning speed; use cross-validation only if potential OOB bias is suspected (e.g., very small n, high p, or balanced classes).
  • Retain default tuning parameters (30 initial + 70 SMBO steps; Kriging + EI) for robust results, except under critical time constraints.
  • For the fastest, mtry-only tuning, tuneRF (greedy OOB) is ~1.5× faster, but cannot tune the additional parameters.
  • Visualize OOB prediction curves to ensure convergence and inspect SMBO logs to monitor search behavior.
  • When handling imbalanced data, use AUC, Brier score, or log-loss as the tuning criterion instead of error rate (see the sketch after this list).
  • For variable-importance analyses, employ larger forests (≥5,000 trees) to achieve stable measures; adjust num.trees in tuneRanger accordingly.
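
For instance, switching the tuning criterion from error rate to AUC for an imbalanced binary task only requires changing the measure argument; an illustrative call reusing the task from Section 4:

res.auc = tuneRanger(task,
                     measure = list(auc),   # optimize OOB AUC instead of error rate
                     num.trees = 2000,      # larger forest, as recommended above
                     num.threads = 4)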

7. Context, Limitations, and Extensions

tuneRanger operationalizes SMBO as a standardized, OOB-based hyperparameter optimization procedure for Random Forests in R, offering a practical balance of accuracy and speed. The major limitation is its reliance on the OOB error, which may be biased in specific settings (very small sample size, extremely high p, or balanced classes). The package can be extended to tune additional hyperparameters via the tune.parameters argument if application context warrants.

The significance of tuneRanger lies in its empirical demonstration of OOB-based SMBO's effectiveness and efficiency, its tight integration with "mlr"/"ranger," and its adoption of reproducible, best-practice protocols for hyperparameter optimization in Random Forests (Probst et al., 2018).

References

Probst, P., Wright, M. N., & Boulesteix, A.-L. (2018). Hyperparameters and Tuning Strategies for Random Forest. arXiv:1804.03515.
