Existence of a moderate-SNR DGP with substantial random forest advantage over bagging
Determine whether there exists a regression data-generating process (DGP) with moderate signal-to-noise ratio, SNR = 1 in the normalized framework where Var(f(X)) = 1, for which a random forest with split randomization (mtry < p) improves relative out-of-sample mean-squared error by substantially more than 5% over bagging (mtry = p). If such a DGP exists, characterize its properties precisely.
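To make the quantities in the problem statement concrete, here is a minimal sketch using only the Python standard library. It assumes the common convention SNR = Var(f(X)) / Var(noise) and defines relative improvement as (MSE_bagging - MSE_forest) / MSE_bagging; neither formula is quoted from the source. The helper names (`noise_variance`, `relative_improvement`, `simulate`), the toy linear DGP, and the MSE values in the last lines are hypothetical and serve only as illustration. The toy DGP is not claimed to be one on which forests beat bagging.

```python
import random
import statistics


def noise_variance(snr, signal_variance=1.0):
    """Noise variance implied by SNR = Var(f(X)) / Var(noise) (assumed convention)."""
    if snr <= 0:
        raise ValueError("snr must be positive")
    return signal_variance / snr


def relative_improvement(mse_bagging, mse_forest):
    """Fraction by which the forest's test MSE falls below bagging's (assumed definition)."""
    if mse_bagging <= 0:
        raise ValueError("mse_bagging must be positive")
    return (mse_bagging - mse_forest) / mse_bagging


def simulate(n, p, snr, seed=0):
    """Toy linear DGP, illustrative only.

    f(X) = (X_1 + ... + X_p) / sqrt(p) with X_j iid N(0, 1), so Var(f(X)) = 1.
    Returns a list of (x, f(x), y) triples with y = f(x) + noise.
    """
    rng = random.Random(seed)
    sigma = noise_variance(snr) ** 0.5
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        f = sum(x) / p ** 0.5
        rows.append((x, f, f + rng.gauss(0.0, sigma)))
    return rows


# Check that the simulated data has the intended SNR (approximately 1).
data = simulate(n=20000, p=10, snr=1.0)
signal_var = statistics.pvariance([f for _, f, _ in data])
noise_var = statistics.pvariance([y - f for _, f, y in data])
empirical_snr = signal_var / noise_var
print("empirical SNR:", round(empirical_snr, 2))

# Hypothetical test MSEs, not results from any experiment.
gain = relative_improvement(mse_bagging=1.00, mse_forest=0.93)
print("relative improvement:", round(gain, 3), "exceeds 5%:", gain > 0.05)
```

In an actual experiment, the two MSE values would come from fitting the same forest implementation twice on held-out data, once with mtry = p and once with mtry < p, and averaging over repeated draws from the DGP.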
References
When making this observation, a question we asked was the following: for moderate SNR, can we find a DGP for which forest outperforms bagging by much more than 5%? This would be useful to better understand the advantages of randomization. However, we did not succeed in finding such an example.
Source: When do Random Forests work? (2504.12860 - Revelas et al., 17 Apr 2025), Section 3 (Literature Replications: Review of Previous Findings), final paragraph