Sample size requirements for machine learning versus regression to achieve comparable stability
Determine whether tree-based machine learning methods, such as random forests, require substantially larger development sample sizes than penalised or unpenalised logistic regression to achieve comparable stability of individual-level risk estimates, and quantify the extent of any sample size differences.
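One possible way to probe this question empirically is a small resampling study: draw repeated development samples of a given size, refit each model, and record how much the predicted risk for the same individuals varies across refits. The sketch below is illustrative only and is not the method of the cited paper. The data-generating model, the sample sizes, the hand-rolled bagged trees (a simplified stand-in for a random forest, with no feature subsampling), and every function name are assumptions made for this example.

```python
import math
import random


def simulate(n, rng):
    """Draw n individuals from an assumed logistic model with two predictors."""
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    y = [1 if rng.random() < 1 / (1 + math.exp(-(-1 + x1 + 0.5 * x2))) else 0
         for x1, x2 in X]
    return X, y


def fit_logistic(X, y, iters=200, lr=0.5):
    """Unpenalised logistic regression fitted by plain gradient ascent."""
    b = [0.0, 0.0, 0.0]
    n = len(y)
    for _ in range(iters):
        g = [0.0, 0.0, 0.0]
        for (x1, x2), yi in zip(X, y):
            r = yi - 1 / (1 + math.exp(-(b[0] + b[1] * x1 + b[2] * x2)))
            g[0] += r
            g[1] += r * x1
            g[2] += r * x2
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return lambda x: 1 / (1 + math.exp(-(b[0] + b[1] * x[0] + b[2] * x[1])))


def fit_tree(X, y, min_leaf=5):
    """Regression tree on the 0/1 outcome; each leaf predicts its mean outcome."""
    n, sum_y = len(y), sum(y)
    mean = sum_y / n
    best = None  # (score, feature index, threshold)
    if n >= 2 * min_leaf and 0 < mean < 1:
        for j in (0, 1):
            order = sorted(range(n), key=lambda i: X[i][j])
            left_sum = 0
            for k in range(1, n):
                left_sum += y[order[k - 1]]
                lo, hi = X[order[k - 1]][j], X[order[k]][j]
                if k < min_leaf or n - k < min_leaf or lo == hi:
                    continue
                # Maximising this score is equivalent to minimising the SSE.
                score = left_sum ** 2 / k + (sum_y - left_sum) ** 2 / (n - k)
                if best is None or score > best[0]:
                    best = (score, j, (lo + hi) / 2)
    if best is None:
        return lambda x: mean
    _, j, t = best
    left_idx = [i for i in range(n) if X[i][j] <= t]
    right_idx = [i for i in range(n) if X[i][j] > t]
    left = fit_tree([X[i] for i in left_idx], [y[i] for i in left_idx], min_leaf)
    right = fit_tree([X[i] for i in right_idx], [y[i] for i in right_idx], min_leaf)
    return lambda x: left(x) if x[j] <= t else right(x)


def fit_bagged_trees(X, y, rng, n_trees=10):
    """Average of trees grown on bootstrap resamples (stand-in for a random forest)."""
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


def instability(fit, n, rng, test_X, reps=15):
    """Mean, over test individuals, of the SD of predicted risk across refits."""
    preds = []
    for _ in range(reps):
        X, y = simulate(n, rng)
        model = fit(X, y)
        preds.append([model(x) for x in test_X])
    sds = []
    for col in zip(*preds):
        m = sum(col) / reps
        sds.append(math.sqrt(sum((p - m) ** 2 for p in col) / (reps - 1)))
    return sum(sds) / len(sds)


rng = random.Random(0)
test_X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
results = {}
for n in (100, 300):  # assumed development sample sizes
    results[("logistic", n)] = instability(fit_logistic, n, rng, test_X)
    results[("bagged trees", n)] = instability(
        lambda X, y: fit_bagged_trees(X, y, rng), n, rng, test_X)
for key, value in results.items():
    print(key, round(value, 3))
```

Reading the printed values across sample sizes gives a rough sense of how much larger n would need to be for one method to match the instability of the other. A serious investigation would use a real random forest implementation, realistic predictors and event rates, and many more repetitions.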
References
Sample size for other machine learning approaches, such as tree-based methods, may need substantially higher sample sizes to achieve the same level of stability compared to (penalised) regression approaches. Further research is needed to substantiate this, but an initial investigation is provided in supplementary material S5 for our two examples.
Source: "A decomposition of Fisher's information to inform sample size for developing fair and precise clinical prediction models -- part 1: binary outcomes" (2407.09293 - Riley et al., 12 Jul 2024), Section 6 (Discussion); see also Supplementary Material S5