A replica analysis of under-bagging (2404.09779v3)

Published 15 Apr 2024 in stat.ML, cond-mat.dis-nn, cond-mat.stat-mech, and cs.LG

Abstract: Under-bagging (UB), which combines under-sampling and bagging, is a popular ensemble learning method for training classifiers on imbalanced data. Using bagging to reduce the increased variance caused by the reduction in sample size due to under-sampling is a natural approach. However, it has recently been pointed out that in generalized linear models, naive bagging, which does not consider the class imbalance structure, and ridge regularization can produce the same results. It is therefore not obvious whether it is better to use UB, which requires a computational cost proportional to the number of under-sampled data sets, when training linear models. In this situation, we heuristically derive sharp asymptotics of UB and use them to compare UB with several other popular methods for learning from imbalanced data, in the scenario where a linear classifier is trained on two-component mixture data. The methods compared include the under-sampling (US) method, which trains a model on a single realization of the under-sampled data, and the simple weighting (SW) method, which trains a model with a weighted loss on the entire data set. It is shown that the performance of UB improves as the size of the majority class increases while the size of the minority class is kept fixed, even when the class imbalance is large, and especially when the minority class is small. This is in contrast to US, whose performance is almost independent of the majority class size. In this sense, bagging and simple regularization differ as methods for reducing the variance increased by under-sampling. On the other hand, the performance of SW with optimally tuned weighting coefficients is almost equal to that of UB, indicating that the combination of reweighting and regularization may be similar to UB.
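The three estimators compared in the abstract can be made concrete with a short sketch. The snippet below is an illustrative assumption, not the paper's code: it fits scikit-learn logistic regressions to a synthetic two-component Gaussian mixture and contrasts a single under-sampled fit (US), coefficient-averaged under-bagging (UB), and a class-weighted fit on the full data (SW). The data generator make_mixture, the signal strength delta, the ridge strength C, and the number of bags B are all hypothetical choices.

```python
# Minimal sketch (assumed setup, not the paper's implementation) of the
# US / UB / SW estimators for a linear classifier on a two-component mixture.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_mixture(n_minority, n_majority, d=50, delta=1.0):
    """Two-component Gaussian mixture: minority labeled +1, majority labeled -1."""
    mu = delta * np.ones(d) / np.sqrt(d)
    X_min = rng.normal(size=(n_minority, d)) + mu
    X_maj = rng.normal(size=(n_majority, d)) - mu
    X = np.vstack([X_min, X_maj])
    y = np.concatenate([np.ones(n_minority), -np.ones(n_majority)])
    return X, y

def undersample(X, y):
    """Keep every minority point and an equal-size random subset of the majority."""
    minority = np.where(y == 1)[0]
    majority = np.where(y == -1)[0]
    keep = rng.choice(majority, size=minority.size, replace=False)
    idx = np.concatenate([minority, keep])
    return X[idx], y[idx]

X, y = make_mixture(n_minority=50, n_majority=2000)

# US: a single under-sampled realization, one fit.
Xs, ys = undersample(X, y)
us = LogisticRegression(C=1.0, max_iter=1000).fit(Xs, ys)

# UB: average the linear rules obtained from many independent under-samplings.
B = 100
coefs, intercepts = [], []
for _ in range(B):
    Xb, yb = undersample(X, y)
    clf = LogisticRegression(C=1.0, max_iter=1000).fit(Xb, yb)
    coefs.append(clf.coef_.ravel())
    intercepts.append(clf.intercept_[0])
ub_coef, ub_intercept = np.mean(coefs, axis=0), np.mean(intercepts)
# Predict with the averaged linear rule: sign(x @ ub_coef + ub_intercept).

# SW: one weighted fit on the entire data set; class_weight="balanced"
# reweights each class inversely to its frequency (the paper additionally
# considers tuning the weighting coefficients).
sw = LogisticRegression(C=1.0, max_iter=1000, class_weight="balanced").fit(X, y)
```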
