Mechanism for random forest outperformance as covariate correlation increases

Explain why increasing the constant pairwise correlation ρ among covariates can cause random forests with split randomization (mtry < p) to outperform bagging (mtry = p) in out-of-sample mean-squared error, and identify the mechanism responsible for this effect. The setting is a multivariate normal design X ~ N(0, Σρ), where Σρ has off-diagonal entries equal to ρ, with the MARS regression Y = 10 sin(π X1 X2) + 20 (X3 − 0.05)^2 + 10 X4 + 5 X5 + ε.
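To make the setup concrete, here is a minimal sketch of the data-generating process in standard-library Python. It is not the authors' code. It assumes unit variances on the diagonal of Σρ and 0 ≤ ρ < 1, so each covariate can be written as a shared factor plus independent noise, X_j = sqrt(ρ) Z0 + sqrt(1 − ρ) Z_j. The function names, the default p = 5, and the noise standard deviation of 1 are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def sample_equicorrelated(n, p, rho, rng):
    """Draw n rows with N(0, 1) margins and constant pairwise correlation rho.

    Assumes 0 <= rho < 1 and unit variances (assumptions of this sketch).
    Uses X_j = sqrt(rho) * Z0 + sqrt(1 - rho) * Z_j with independent
    standard normals, so Var(X_j) = 1 and Cov(X_j, X_k) = rho for j != k.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("this sketch assumes 0 <= rho < 1")
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    rows = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)  # factor shared by all covariates in this row
        rows.append([a * z0 + b * rng.gauss(0.0, 1.0) for _ in range(p)])
    return rows


def mars_signal(x):
    """Noise-free MARS regression function as stated in the problem."""
    return (10.0 * math.sin(math.pi * x[0] * x[1])
            + 20.0 * (x[2] - 0.05) ** 2
            + 10.0 * x[3]
            + 5.0 * x[4])


def simulate(n, rho, p=5, noise_sd=1.0, seed=0):
    """Return (X, y); p, noise_sd and seed are hypothetical defaults."""
    if p < 5:
        raise ValueError("the MARS signal uses the first five covariates")
    rng = random.Random(seed)
    X = sample_equicorrelated(n, p, rho, rng)
    y = [mars_signal(x) + rng.gauss(0.0, noise_sd) for x in X]
    return X, y


def sample_correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


if __name__ == "__main__":
    X, y = simulate(n=2000, rho=0.9)
    x1 = [row[0] for row in X]
    x2 = [row[1] for row in X]
    print("empirical corr(X1, X2):", round(sample_correlation(x1, x2), 3))
```

Data generated this way could then be passed to any tree-ensemble implementation, fitting one model with mtry = p (bagging) and one with mtry < p (random forest) and comparing test-set MSE across a grid of ρ values.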

Background

The authors analyze the impact of mutual covariate correlation (constant pairwise correlation ρ) on the performance of bagging versus random forests. They find that for both methods, correlation tends to reduce MSE mainly through bias reduction, and that the relative advantage of randomization grows with ρ. In some cases (e.g., ρ = 0.9 in the normal-covariate MARS setting), random forests even outperform bagging.

Despite documenting the empirical pattern, the authors state explicitly that they cannot explain why increasing ρ leads to forest outperformance in these settings. A mechanistic or theoretical account would clarify how correlation interacts with split randomization and tree construction to change bias and variance contributions.

References

Why increasing ρ can result in forests even outperforming bagging, as happens for ρ=0.9 in this example, is something we cannot explain.

When do Random Forests work? (2504.12860 - Revelas et al., 17 Apr 2025) in Section 4.3 (Correlated Covariates)