randomForestSRC is a unified R package that implements ensemble random forests for survival, regression, and classification analysis.
It adapts Breiman’s nonparametric tree ensembles for right-censored data with integrated error estimation and variable selection techniques.
The package provides practical tools such as OOB error metrics, variable importance (VIMP) measures, minimal depth analysis, and interactive visualizations.
The randomForestSRC package is a unified random forest implementation in R supporting survival, regression, and classification analysis based on ensemble decision tree methodology. Originating from Breiman’s nonparametric tree ensembles, with extensions for right-censored time-to-event data by Ishwaran and Kogalur, randomForestSRC provides a single functional and algorithmic framework for a variety of supervised learning scenarios, with extensive tools for error estimation, variable importance, interpretability, and missing data (Ehrlinger, 2016).
1. Core Functionality
The central function rfsrc() grows random forests for three response families:
"surv": Time-to-event outcomes, using the Surv(time,status) format.
"regr": Continuous regression.
"class": Categorical classification.
Default parameterization is set by the detected response type. Defaults include:
Split Rule: “logrank” for survival, “gini” for classification, “mse” (mean squared error) or “md” for regression.
mtry: √p for survival/classification, p/3 for regression (rounded to an integer), where p is the number of candidate predictors.
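Node splitting by response family:
- Survival: Log-rank split rule, maximizing the separation of event times between daughter nodes after the split.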
- Regression: Split yielding the largest reduction $\Delta I = I_{\text{parent}} - (I_{\text{left}} + I_{\text{right}})$ in node impurity $I=\sum_i (y_i-\bar{y})^2$.
- Classification: Gini impurity or entropy reduction.
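The regression rule above can be illustrated with a short, language-agnostic sketch in Python. It is not the package's implementation: the function names `sse` and `best_split` are hypothetical, a single predictor is assumed, and every observed value is tried as a threshold, whereas a real forest searches only a random subset of mtry variables at each node.

```python
def sse(values):
    """Node impurity: sum of squared deviations from the node mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, impurity_reduction) for the best split x <= threshold."""
    parent = sse(y)
    best = (None, 0.0)
    # Skip the largest value: splitting there would leave the right node empty.
    for threshold in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        gain = parent - sse(left) - sse(right)
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Toy data (made up for illustration): two well-separated groups.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
threshold, gain = best_split(x, y)
print(threshold, round(gain, 6))
```

On this toy data the selected threshold separates the low-response group from the high-response group, which is where the within-node sums of squares are smallest.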
Survival Functions:
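Kaplan–Meier within terminal nodes:
$$\hat S_{b,j}(t)=\prod_{u\le t}\left(1-\frac{d_{b,j}(u)}{Y_{b,j}(u)}\right)$$
where $d_{b,j}(u)$ is the number of events and $Y_{b,j}(u)$ the number at risk at time u in terminal node j of tree b.
Forest ensemble:
$$\hat S_{\mathrm{RF}}(t\mid x_i)=\frac{1}{B}\sum_{b=1}^{B}\hat S_{b,j_b}(t)$$
where $j_b$ is the terminal node of tree b that contains $x_i$.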
- Cumulative hazard via Nelson–Aalen is analogous.
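As a rough numerical illustration of the two estimators above, the following Python sketch computes a Kaplan–Meier estimate within each terminal node and averages the estimates across trees. The function names and the two toy terminal nodes are hypothetical; the sketch assumes we already know which terminal node a case falls into in each tree, and it is not the package's code.

```python
def kaplan_meier(times, status, t):
    """Kaplan-Meier survival estimate at time t.

    times: observed times; status: 1 = event, 0 = censored.
    """
    surv = 1.0
    # Distinct event times up to and including t.
    event_times = sorted({u for u, s in zip(times, status) if s == 1 and u <= t})
    for u in event_times:
        at_risk = sum(1 for v in times if v >= u)  # Y(u)
        deaths = sum(1 for v, s in zip(times, status) if v == u and s == 1)  # d(u)
        surv *= 1.0 - deaths / at_risk
    return surv


def forest_survival(terminal_nodes, t):
    """Average the per-tree terminal-node estimates for one case."""
    estimates = [kaplan_meier(times, status, t) for times, status in terminal_nodes]
    return sum(estimates) / len(estimates)


# Hypothetical terminal nodes containing a case x_i in B = 2 trees,
# given as (times, status) pairs.
node_tree1 = ([2, 4, 4, 7], [1, 1, 0, 1])
node_tree2 = ([3, 5, 8], [1, 0, 1])
s = forest_survival([node_tree1, node_tree2], t=5)
print(round(s, 4))
```

Censored observations contribute to the at-risk counts until their censoring time but never trigger a drop in the survival curve, which is how right-censoring enters the estimate.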
3. Error Estimation and Performance Metrics
- Classification: OOB error as the proportion of misclassified OOB samples; tracked via err.rate.
- Regression: OOB mean squared error:

$$\mathrm{OOB\_MSE} = \frac{1}{n}\sum_i\left(y_i-\hat{f}_{\mathrm{OOB}}(x_i)\right)^2$$

- Survival:
  - Brier score at time t, which is integrated over time to give the Integrated Brier Score:

$$BS(t) = \frac{1}{n} \sum_{i=1}^n w_i(t)\left(I\{T_i>t\} - \hat S_{\mathrm{RF}}(t\mid x_i)\right)^2$$

    with $w_i(t)$ as inverse-probability-of-censoring weights.
  - Concordance index (C-index): the proportion of comparable pairs of cases whose predicted ordering agrees with their observed ordering, quantifying agreement between predicted and observed event orderings.
4. Variable Importance and Interpretability
- Permutation Variable Importance (VIMP):
  - For variable v, permute its OOB values in each tree and measure the resulting increase in OOB error.
  - $\mathrm{VIMP}(v) = \mathrm{Err}_{\text{permuted}}(v) - \mathrm{Err}_{\text{original}}$.
  - Values are stored in importance within model objects and can be visualized.
Minimal Depth:
$\mathrm{depth}_b(v)$: level of the split on v closest to the root in tree b (root = 0).
Minimal depth of v: average of $\mathrm{depth}_b(v)$ over all trees.
Variables with small minimal depth generally have greatest predictive relevance.
var.select() computes minimal depths and an analytic mean-depth threshold to guide variable selection.
Interaction Assessment: find.interaction() returns a p×p matrix indexed by variable pairs indicating candidate interactive effects, based on minimal depth.
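The minimal depth definition can be made concrete with a small Python sketch on toy trees. Everything here is an assumption made for illustration: trees are nested dictionaries with hypothetical variable names, and trees in which a variable never splits are simply skipped when averaging, which may differ from how the package treats such trees.

```python
def first_split_depth(tree, variable, depth=0):
    """Depth of the split on `variable` closest to the root (root = 0), or None."""
    if tree is None:  # terminal node
        return None
    if tree["var"] == variable:
        return depth
    found = [
        d
        for d in (
            first_split_depth(tree["left"], variable, depth + 1),
            first_split_depth(tree["right"], variable, depth + 1),
        )
        if d is not None
    ]
    return min(found) if found else None


def minimal_depth(forest, variable):
    """Average first-split depth over the trees that split on `variable`."""
    depths = [first_split_depth(tree, variable) for tree in forest]
    depths = [d for d in depths if d is not None]
    return sum(depths) / len(depths) if depths else None


def leaf_split(var):
    """Helper: a split node whose two children are terminal nodes."""
    return {"var": var, "left": None, "right": None}


# Three toy trees with hypothetical variables "age", "karno", and "trt".
forest = [
    {"var": "age", "left": leaf_split("karno"), "right": None},
    {"var": "karno", "left": leaf_split("age"), "right": leaf_split("age")},
    {"var": "age", "left": None,
     "right": {"var": "trt", "left": leaf_split("karno"), "right": None}},
]
for v in ("age", "karno", "trt"):
    print(v, minimal_depth(forest, v))
```

In this toy forest "age" splits at or near the root in every tree and therefore gets the smallest minimal depth, matching the intuition that influential variables tend to be chosen early.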
5. Handling Right-Censored Data and Missingness
Censoring: Input via Surv(time, status) response, with status 1 for event, 0 for censoring.
Split handling: Log-rank statistic and node-specific Kaplan–Meier estimators enable robust nonparametric management of right-censoring.
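Missing Data: With na.action = "na.impute", missing entries are imputed adaptively while the trees are grown. During the split search at each node, missing values are filled with random draws from the non-missing in-node data, and the final imputed values are obtained by aggregating over OOB data.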
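Right-censoring also shapes how predictions are scored. A minimal Python sketch of a Harrell-style concordance index is given below as an illustration: a pair of cases is treated as comparable only when the case with the shorter observed time experienced the event, so censored cases never count as the earlier member of a pair. The function name, the toy data, and the handling of tied predictions (counted as one half) are assumptions, and the package's own computation may differ in its treatment of ties.

```python
def concordance_index(times, status, risk):
    """Harrell-style C-index; a higher risk score should mean an earlier event.

    times: observed times; status: 1 = event, 0 = censored;
    risk: predicted risk scores (for example, ensemble mortality).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: i has the shorter time and i is an observed event.
            if times[i] < times[j] and status[i] == 1:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Toy data (made up): the second case is censored at time 4.
times = [2, 4, 6, 8]
status = [1, 0, 1, 1]
risk = [0.9, 0.3, 0.5, 0.6]
c = concordance_index(times, status, risk)
print(c)
```

Here four pairs are comparable and three of them are ordered correctly by the risk scores; the censored case at time 4 only ever appears as the later member of a pair.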
6. Model Object Structure and Methodology
Objects returned by rfsrc() possess a standardized S3 structure, including components such as err.rate and importance.
The randomForestSRC package thus operationalizes nonparametric ensemble learning, with rigorous error estimation and interpretability tools, within a unified R framework for categorical, continuous, and survival outcomes (Ehrlinger, 2016).