
Prediction Error Estimation in Random Forests (2309.00736v4)

Published 1 Sep 2023 in stat.ML and cs.LG

Abstract: In this paper, error estimates of classification Random Forests are quantitatively assessed. Building on the theoretical framework of Bates et al. (2023), the true error rate and the expected error rate are investigated, theoretically and empirically, across a variety of error estimation methods common to Random Forests. We show that in the classification case, Random Forests' estimates of prediction error are closer on average to the true error rate than to the average prediction error. This is the opposite of the findings of Bates et al. (2023), which were obtained for logistic regression. We further show that this result holds across different error estimation strategies such as cross-validation, bagging, and data splitting.
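
In the Bates et al. (2023) framework, the true (conditional) error of a fitted model f̂ is Err_XY = E[ loss(Y0, f̂(X0)) | training data ], while the expected error Err = E[Err_XY] additionally averages over training sets; the abstract's claim is that common Random Forest error estimates track Err_XY more closely than Err. The three estimation strategies it names can be computed side by side, as in the sketch below. This is an illustrative sketch, not the paper's code: the synthetic dataset, the scikit-learn estimator, the number of trees, the fold count, and the split fraction are all assumed choices.

    # Illustrative sketch (assumed setup, not the paper's experiments):
    # three common error estimation strategies for a classification
    # Random Forest, computed on the same synthetic dataset.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score, train_test_split

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # Bagging / out-of-bag estimate: each tree is scored on the samples
    # left out of its bootstrap resample.
    rf_oob = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
    rf_oob.fit(X, y)
    oob_error = 1.0 - rf_oob.oob_score_

    # 10-fold cross-validation estimate of the error rate.
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    cv_error = 1.0 - cross_val_score(rf, X, y, cv=10).mean()

    # Data-splitting estimate: hold out 30% of the data as a test set.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    rf_split = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
    split_error = 1.0 - rf_split.score(X_te, y_te)

    print(f"OOB error:        {oob_error:.3f}")
    print(f"10-fold CV error: {cv_error:.3f}")
    print(f"Split error:      {split_error:.3f}")

Which of Err_XY or Err each of these estimates tracks across repeated draws of the training set is exactly the question the paper studies empirically.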

References (16)
  1. Bates, S., Hastie, T., and Tibshirani, R. (2023). Cross-validation: what does it estimate and how well does it do it? Journal of the American Statistical Association.
  2. Breiman, L. (2001). Random forests. Machine Learning, 45:5–32.
  3. Bylander, T. (2002). Estimating generalization error on two-class datasets using out-of-bag estimates. Machine Learning, 48(1-3):287–297.
  4. Faraway, J. J. (2014). Does data splitting improve prediction? Statistics and Computing, 26(1–2):49–60.
  5. Genuer, R., Poggi, J.-M., and Tuleau, C. (2008). Random forests: some methodological insights.
  6. An application of random forests to a genome-wide association dataset: Methodological considerations & new findings. BMC Genetics, 11:49.
  7. Random forests for genetic association studies. Statistical Applications in Genetics and Molecular Biology, 10(1).
  8. Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA.
  9. James, G., Witten, D., Hastie, T., and Tibshirani, R. An Introduction to Statistical Learning: with Applications in R. Springer.
  10. Janitza, S. and Hornung, R. (2018). On the overestimation of random forest's out-of-bag error. PLOS ONE, 13(8):1–31.
  11. Kaggle (2017). The state of data science & machine learning.
  12. Mentch, L. and Hooker, G. (2016). Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. Journal of Machine Learning Research, 17(26):1–41.
  13. Mitchell, M. (2011). Bias of the random forest out-of-bag (OOB) error for certain input parameters. Open Journal of Statistics, 1:205–211.
  14. Confidence intervals for the generalisation error of random forests.
  15. Yousef, W. A. (2019). A leisurely look at versions and variants of the cross validation estimator.
  16. Out-of-bag estimation of the optimal hyperparameter in subbag ensemble method. Communications in Statistics - Simulation and Computation, 39(10):1877–1892.
