Sharp error bounds for imbalanced classification: how many examples in the minority class? (2310.14826v2)

Published 23 Oct 2023 in stat.ML and cs.LG

Abstract: When dealing with imbalanced classification data, reweighting the loss function is a standard procedure that equilibrates the true positive and true negative rates within the risk measure. Despite significant theoretical work in this area, existing results do not adequately address a main challenge of the imbalanced classification framework: the negligible size of one class relative to the full sample size, and the resulting need to rescale the risk function by a probability tending to zero. To address this gap, we present two novel contributions in the setting where the rare-class probability approaches zero: (1) a non-asymptotic fast-rate probability bound for constrained balanced empirical risk minimization, and (2) a consistent upper bound for balanced nearest neighbor estimates. Our findings provide a clearer understanding of the benefits of class-weighting in realistic settings, opening new avenues for further research in this field.
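To make the class-weighting idea concrete, here is a minimal sketch of a balanced empirical risk: each class's error rate is weighted by the inverse of its frequency, so the criterion averages the false negative and false positive rates instead of the plain misclassification rate. This is an illustrative example of the general technique the abstract describes, not the authors' exact estimator or bounds.

```python
import numpy as np

def balanced_empirical_risk(y_true, y_pred):
    """Balanced 0-1 risk: the average of the per-class error rates.

    Reweighting each class by the inverse of its empirical frequency
    equilibrates the true positive and true negative rates -- the kind of
    class-weighting analysed in the paper (hypothetical sketch).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == 1
    fnr = np.mean(y_pred[pos] != 1)   # miss rate on the rare class
    fpr = np.mean(y_pred[~pos] != 0)  # error rate on the majority class
    return 0.5 * (fnr + fpr)

# Heavily imbalanced toy data: the trivial "always predict 0" classifier
# has a tiny unweighted error rate, but its balanced risk is 0.5.
rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.01).astype(int)  # rare-class probability ~ 0.01
y_hat = np.zeros_like(y)                     # majority-class classifier
print(np.mean(y_hat != y))                   # small plain risk (~ 0.01)
print(balanced_empirical_risk(y, y_hat))     # 0.5: balanced risk exposes it
```

The example shows why rescaling matters in the regime the paper studies: as the rare-class probability tends to zero, the unweighted risk of the degenerate classifier also tends to zero, while the balanced risk stays bounded away from it.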
