A Universal Growth Rate for Learning with Smooth Surrogate Losses (2405.05968v2)

Published 9 May 2024 in cs.LG and stat.ML

Abstract: This paper presents a comprehensive analysis of the growth rate of $H$-consistency bounds (and excess error bounds) for various surrogate losses used in classification. We prove a square-root growth rate near zero for smooth margin-based surrogate losses in binary classification, providing both upper and lower bounds under mild assumptions. This result also translates to excess error bounds. Our lower bound requires weaker conditions than those in previous work for excess error bounds, and our upper bound is entirely novel. Moreover, we extend this analysis to multi-class classification with a series of novel results, demonstrating a universal square-root growth rate for smooth comp-sum and constrained losses, covering common choices for training neural networks in multi-class classification. Given this universal rate, we turn to the question of choosing among different surrogate losses. We first examine how $H$-consistency bounds vary across surrogates based on the number of classes. Next, ignoring constants and focusing on behavior near zero, we identify minimizability gaps as the key differentiating factor in these bounds. We therefore analyze these gaps thoroughly to guide surrogate loss selection, covering comparisons across different comp-sum losses, conditions where gaps become zero, and general conditions leading to small gaps. Additionally, we demonstrate the key role of minimizability gaps in comparing excess error bounds and $H$-consistency bounds.
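As an illustrative sketch (the notation below is ours and is not quoted from the paper), $H$-consistency bounds of this kind relate the estimation error of the target zero-one loss to that of a surrogate loss $\Phi$ through a concave transform $\Gamma$ and the minimizability gaps $\mathcal{M}$:

$$\mathcal{E}_{\ell_{0-1}}(h) - \mathcal{E}^{*}_{\ell_{0-1}}(\mathcal{H}) + \mathcal{M}_{\ell_{0-1}}(\mathcal{H}) \;\le\; \Gamma\big(\mathcal{E}_{\Phi}(h) - \mathcal{E}^{*}_{\Phi}(\mathcal{H}) + \mathcal{M}_{\Phi}(\mathcal{H})\big).$$

The universal growth rate result can then be read as $\Gamma(t) = \Theta(\sqrt{t})$ as $t \to 0$ for smooth margin-based, comp-sum, and constrained surrogates, so that, ignoring constants, the bounds differ near zero essentially through the minimizability gaps $\mathcal{M}_{\Phi}(\mathcal{H})$.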
