randomForestSRC Package Overview
- randomForestSRC is a unified R package that implements ensemble random forests for survival, regression, and classification analysis.
- It adapts Breiman’s nonparametric tree ensembles for right-censored data with integrated error estimation and variable selection techniques.
- The package provides practical tools such as OOB error metrics, variable importance (VIMP) measures, minimal depth analysis, and interactive visualizations.
The randomForestSRC package is a unified random forest implementation in R supporting survival, regression, and classification analysis based on ensemble decision tree methodology. Originating from Breiman’s nonparametric tree ensembles, with extensions for right-censored time-to-event data by Ishwaran and Kogalur, randomForestSRC provides a single functional and algorithmic framework for a variety of supervised learning scenarios, with extensive tools for error estimation, variable importance, interpretability, and missing data (Ehrlinger, 2016).
1. Core Functionality
The central function rfsrc() grows random forests for three response families:
- "surv": Time-to-event outcomes, using the Surv(time, status) response format.
- "regr": Continuous regression.
- "class": Categorical classification.
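A minimal sketch of one call per response family (assumes randomForestSRC is installed; the veteran data ships with the package, while iris and airquality come with base R):

```r
# One rfsrc() call per response family; rfsrc() detects the family
# from the left-hand side of the formula.
library(randomForestSRC)

# Survival: right-censored time-to-event outcome
data(veteran, package = "randomForestSRC")
fit.surv <- rfsrc(Surv(time, status) ~ ., data = veteran)

# Regression: continuous outcome
fit.regr <- rfsrc(Ozone ~ ., data = na.omit(airquality))

# Classification: categorical outcome
fit.class <- rfsrc(Species ~ ., data = iris)
```

Each returned object prints a family-appropriate summary (OOB error rate, sample sizes, forest settings) via print().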
Default parameterization is set by the detected response type. Defaults include:
- Split Rule: “logrank” for survival, “gini” for classification, “mse” (mean squared error) for regression.
- mtry: $\sqrt{p}$ for survival/classification, $p/3$ for regression (where $p$ is the number of covariates).
- ntree: 1000.
- nodesize: 1 (classification/regression), 3 (survival).
- nsplit (survival): 10, i.e., up to 10 randomly selected split points per candidate covariate.
- samptype: “swr” (sampling with replacement; bagging) or “swor” (sampling without replacement).
- na.action: “na.impute” (adaptive in-tree imputation) or “na.omit”.
Users are encouraged to tune:
- ntree until OOB error stabilizes (typically 500–2000).
- mtry: Lower values promote tree heterogeneity (diversity), higher values yield lower individual tree bias.
- nodesize: Smaller values permit deeper trees (greater variance, lower bias); for large data, moderate increases (5–10) may improve generalization.
- splitrule in survival forests can be “logrank,” “logrankscore,” or “random.”
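A tuning pass over these knobs might look like the following sketch (the settings are illustrative, not recommendations; assumes randomForestSRC and its bundled veteran data):

```r
# Illustrative tuning of the main rfsrc() hyperparameters.
library(randomForestSRC)
data(veteran, package = "randomForestSRC")

fit <- rfsrc(Surv(time, status) ~ ., data = veteran,
             ntree     = 1000,       # grow until the OOB error flattens
             mtry      = 3,          # fewer candidates -> more diverse trees
             nodesize  = 5,          # larger nodes -> shallower trees
             nsplit    = 10,         # random split points per covariate
             splitrule = "logrank")

plot(fit)   # OOB error rate as a function of the number of trees
```

The package also provides a tune() helper that searches over mtry and nodesize using OOB error.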
2. Algorithmic Foundations
randomForestSRC employs the classical random forest approach:
- Bootstrap & Aggregation: For $b = 1, \dots, B$:
  - Draw a bootstrap sample $\mathcal{D}_b$ (size $n$) from the full data $\mathcal{D}$ and fit tree $T_b$.
  - The OOB (out-of-bag) set, $\mathcal{O}_b = \mathcal{D} \setminus \mathcal{D}_b$, comprises the roughly 37% of cases not drawn into $\mathcal{D}_b$.
- Prediction:
  - Regression: OOB ensemble average over the trees for which case $i$ is out of bag,
$$\hat{y}^{\,\text{OOB}}(x_i) = \frac{1}{|\{b : i \in \mathcal{O}_b\}|} \sum_{b \,:\, i \in \mathcal{O}_b} T_b(x_i).$$
  - Classification: Majority vote from OOB predictions.
  - Survival: Forest survival estimate is the ensemble of terminal node Kaplan–Meier curves.
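The bootstrap/OOB bookkeeping above can be sketched in a few lines of base R; per tree, the expected OOB fraction is $(1 - 1/n)^n \approx e^{-1} \approx 0.368$:

```r
# Base-R sketch of per-tree bootstrap sampling and OOB set construction.
set.seed(42)
n <- 1000   # number of cases
B <- 200    # number of trees

oob_frac <- replicate(B, {
  inbag <- sample(n, n, replace = TRUE)   # bootstrap sample D_b
  oob   <- setdiff(seq_len(n), inbag)     # OOB set O_b = D \ D_b
  length(oob) / n
})

mean(oob_frac)   # close to exp(-1), about 0.368
```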
Splitting Criteria:
- Survival: Maximize the two-sample log-rank statistic for a candidate split into left and right daughters $L$ and $R$,
$$L(s) = \frac{\sum_{j=1}^{m} \left( d_{j,L} - Y_{j,L}\,\frac{d_j}{Y_j} \right)}{\sqrt{\sum_{j=1}^{m} \frac{Y_{j,L}}{Y_j}\left(1 - \frac{Y_{j,L}}{Y_j}\right)\left(\frac{Y_j - d_j}{Y_j - 1}\right) d_j}},$$
where $d_j$ ($d_{j,L}$) and $Y_j$ ($Y_{j,L}$) are the number of events and the number at risk at the $j$-th distinct event time overall (and within the left daughter), maximizing separation of event times post-split.
- Regression: Split yielding the largest impurity reduction, i.e., the greatest decrease in within-node mean squared error.
- Classification: Gini impurity or entropy reduction.
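As an illustration (not the package's internal C implementation), the standardized log-rank statistic can be computed directly from the definition above and cross-checked against survival::survdiff():

```r
# Standardized two-sample log-rank statistic for a candidate split.
logrank_stat <- function(time, status, in_left) {
  tj <- sort(unique(time[status == 1]))           # distinct event times
  num <- 0; den <- 0
  for (t in tj) {
    Y  <- sum(time >= t)                          # at risk overall
    Y1 <- sum(time >= t & in_left)                # at risk in left daughter
    d  <- sum(time == t & status == 1)            # events overall
    d1 <- sum(time == t & status == 1 & in_left)  # events in left daughter
    num <- num + (d1 - Y1 * d / Y)
    if (Y > 1)
      den <- den + (Y1 / Y) * (1 - Y1 / Y) * ((Y - d) / (Y - 1)) * d
  }
  num / sqrt(den)
}

# Tiny toy split: two cases per daughter, all events observed
time   <- c(1, 3, 2, 4)
status <- c(1, 1, 1, 1)
left   <- c(TRUE, TRUE, FALSE, FALSE)
L <- logrank_stat(time, status, left)
```

The squared statistic $L^2$ agrees with the chi-square reported by survival::survdiff() for a two-group comparison.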
Survival Functions:
- Kaplan–Meier within terminal node $h$:
$$\hat{S}_h(t) = \prod_{t_j \le t} \left( 1 - \frac{d_{j,h}}{Y_{j,h}} \right).$$
- Forest ensemble, averaging over the terminal node $h_b(x)$ containing $x$ in each tree:
$$\hat{S}(t \mid x) = \frac{1}{B} \sum_{b=1}^{B} \hat{S}_{h_b(x)}(t).$$
- The cumulative hazard via Nelson–Aalen is analogous: $\hat{H}_h(t) = \sum_{t_j \le t} d_{j,h}/Y_{j,h}$.
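The two node-level estimators above reduce to a short base-R loop (an illustration, not the package's internals):

```r
# Node-level Kaplan-Meier and Nelson-Aalen estimators at event times.
km_na <- function(time, status) {
  tj <- sort(unique(time[status == 1]))      # distinct event times
  S <- numeric(length(tj)); H <- numeric(length(tj))
  s <- 1; h <- 0
  for (k in seq_along(tj)) {
    Y <- sum(time >= tj[k])                  # number at risk
    d <- sum(time == tj[k] & status == 1)    # number of events
    s <- s * (1 - d / Y)                     # Kaplan-Meier product
    h <- h + d / Y                           # Nelson-Aalen sum
    S[k] <- s; H[k] <- h
  }
  data.frame(time = tj, surv = S, chf = H)
}

# Events at t = 1, 2, 4; censoring at t = 3
est <- km_na(time = c(1, 2, 3, 4), status = c(1, 1, 0, 1))
# est$surv is 0.75, 0.50, 0.00 at event times 1, 2, 4
```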
3. Error Estimation and Performance Metrics
Classification and regression: OOB error as the proportion of misclassified OOB samples (classification) or the OOB mean squared error (regression); tracked per tree via the $err.rate component.
Survival:
- Integrated Brier Score at time $t$:
$$\mathrm{BS}(t) = \frac{1}{n} \sum_{i=1}^{n} W_i(t)\left( \mathbb{1}\{T_i > t\} - \hat{S}(t \mid x_i) \right)^2,$$
with $W_i(t)$ as inverse-probability-of-censoring weights.
- Concordance index (C-index):
$$C = \frac{\#\{\text{concordant comparable pairs}\}}{\#\{\text{comparable pairs}\}},$$
quantifying agreement between predicted and observed event orderings; the survival OOB error rate is reported as $1 - C$.
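Harrell's C-index admits a direct (if quadratic-time) base-R sketch; a pair is comparable when the earlier time is an observed event, and tied risks count one half:

```r
# Harrell's C-index over comparable pairs.
cindex <- function(time, status, risk) {
  conc <- 0; comp <- 0
  n <- length(time)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (time[i] < time[j] && status[i] == 1) {  # comparable pair
      comp <- comp + 1
      if (risk[i] > risk[j])       conc <- conc + 1     # concordant
      else if (risk[i] == risk[j]) conc <- conc + 0.5   # tied risk
    }
  }
  conc / comp
}

# Perfectly ordered risks give C = 1
cindex(time = c(1, 2, 3), status = c(1, 1, 1), risk = c(3, 2, 1))  # 1
```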
4. Variable Importance and Interpretability
Permutation Variable Importance (VIMP):
- For each variable $v$, permute its OOB values in each tree and record the resulting increase in OOB error.
- $\mathrm{VIMP}(v) = \mathrm{Err}^{\,\text{OOB}}_{\text{permuted}}(v) - \mathrm{Err}^{\,\text{OOB}}$.
- Values are stored in $importance within model objects and can be visualized.
- Minimal Depth:
- $d_b(v)$: depth of the rootmost split on variable $v$ in tree $b$ (root = 0).
- Minimal depth = average of $d_b(v)$ over all trees.
- Variables with small minimal depth generally have the greatest predictive relevance.
var.select() computes minimal depths and an analytic mean-depth threshold to guide variable selection.
- Interaction Assessment:
find.interaction() returns a $p \times p$ matrix indexed by variable pairs indicating candidate interactive effects, based on paired minimal depth.
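The permutation-VIMP idea can be demonstrated without a forest at all; the sketch below uses a linear model and a holdout set in place of trees and OOB data (a hypothetical setup, for illustration only):

```r
# Conceptual permutation-importance sketch: permute one covariate and
# measure the increase in holdout error.
set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 2 * x1 + rnorm(n)              # only x1 is predictive
dat   <- data.frame(y, x1, x2)
train <- dat[1:250, ]; test <- dat[251:500, ]

fit      <- lm(y ~ x1 + x2, data = train)
base_err <- mean((test$y - predict(fit, test))^2)

perm_vimp <- function(var) {
  perm <- test
  perm[[var]] <- sample(perm[[var]])                     # permute one column
  mean((perm$y - predict(fit, perm))^2) - base_err       # error increase
}

perm_vimp("x1")   # large positive: x1 matters
perm_vimp("x2")   # near zero: x2 is noise
```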
5. Handling Right-Censored Data and Missingness
- Censoring: Input via
Surv(time, status)response, with status 1 for event, 0 for censoring. - Split handling: Log-rank statistic and node-specific Kaplan–Meier estimators enable robust nonparametric management of right-censoring.
- Missing Data: The adaptive “na.impute” option imputes missing values at each node using draws from the in-node non-missing data during the split search, with final values filled by OOB aggregation.
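A brief sketch of both imputation routes (assumes randomForestSRC; here we punch random holes into the bundled veteran data for illustration):

```r
# Forest-based handling of missing covariate values.
library(randomForestSRC)
data(veteran, package = "randomForestSRC")

vet <- veteran
set.seed(7)
vet$karno[sample(nrow(vet), 10)] <- NA     # inject missingness

# Missing covariates are imputed adaptively while growing the forest
fit <- rfsrc(Surv(time, status) ~ ., data = vet, na.action = "na.impute")

# Standalone imputation of a data set via the impute() front end
vet.imputed <- impute(data = vet)
```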
6. Model Object Structure and Methodology
Objects returned by rfsrc() possess a standardized S3 structure:
- Primary slots: n, mtry, nodesize.
- Forest representation: per-tree node counts (ndbigtree), node data, split variables, and split points.
- Predictions and error rates: predicted.oob, survival.oob, chf (cumulative hazard), time.interest (unique event times), and err.rate.
- Variable selection and interaction: Integrated methods include var.select() and find.interaction() for in-depth model interrogation.
Methods for visualization and exploration include print.rfsrc(), plot.rfsrc(), variable importance and depth plots, and heatmaps of interaction metrics.
7. Representative R Usage and Workflow Examples
The canonical workflow involves growing a forest with rfsrc(), inspecting OOB error with print() and plot(), ranking variables with vimp() or var.select(), and predicting on new data with predict().
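A concrete version of this workflow, assuming randomForestSRC is installed, might look like:

```r
# Representative end-to-end survival workflow.
library(randomForestSRC)
data(veteran, package = "randomForestSRC")

# 1. Grow the survival forest
fit <- rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 1000)

# 2. Inspect OOB error and convergence
print(fit)
plot(fit)

# 3. Variable importance and minimal-depth selection
vi <- vimp(fit)
vs <- var.select(object = fit)

# 4. Pairwise interaction screening
fi <- find.interaction(fit)

# 5. Predict on new data (the training data here, for illustration)
pred <- predict(fit, newdata = veteran)
```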
The table below summarizes key object components:
| Component | Description | Applies to |
| --- | --- | --- |
| $err.rate | OOB error by tree | All |
| $survival | n × T matrix of ensemble survival estimates over the T unique event times | Survival |
| $forest | Underlying trees: node data, splits, memberships | All |
| var.select() | Computes minimal depth, analytic depth threshold | All |
| find.interaction() | Pairwise minimal depth interactions | All |

The randomForestSRC package thus operationalizes nonparametric ensemble learning with rigorous error control and interpretability, set within a unified R framework for categorical, continuous, and survival outcomes (Ehrlinger, 2016).
References
- Ehrlinger, J. (2016). ggRandomForests: Exploring Random Forest Survival. arXiv preprint arXiv:1612.08974.