Theoretically Grounded Loss Functions and Algorithms for Score-Based Multi-Class Abstention (2310.14770v2)

Published 23 Oct 2023 in cs.LG and stat.ML

Abstract: Learning with abstention is a key scenario where the learner can abstain from making a prediction at some cost. In this paper, we analyze the score-based formulation of learning with abstention in the multi-class classification setting. We introduce new families of surrogate losses for the abstention loss function, which include the state-of-the-art surrogate losses in the single-stage setting and a novel family of loss functions in the two-stage setting. We prove strong non-asymptotic and hypothesis set-specific consistency guarantees for these surrogate losses, which upper-bound the estimation error of the abstention loss function in terms of the estimation error of the surrogate loss. Our bounds can help compare different score-based surrogates and guide the design of novel abstention algorithms by minimizing the proposed surrogate losses. We experimentally evaluate our new algorithms on the CIFAR-10, CIFAR-100, and SVHN datasets and demonstrate the practical significance of our new surrogate losses and two-stage abstention algorithms. Our results also show that the relative performance of the state-of-the-art score-based surrogate losses can vary across datasets.
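
The abstract describes the score-based setting only at a high level. The minimal sketch below (our illustration, not code from the paper) shows the standard ingredients that description assumes: a scoring function over the n classes plus an explicit abstain category, the target abstention loss that charges 1 for a wrong committed prediction and a fixed cost c for abstaining, and Chow's classical probability-threshold rule for cost-c abstention. The function names, the example cost c = 0.3, and the PyTorch framing are our assumptions; the new surrogate loss families introduced in the paper are not reproduced here.

```python
import torch

def abstention_loss(scores: torch.Tensor, targets: torch.Tensor, c: float = 0.3) -> torch.Tensor:
    """Target abstention loss: cost 1 for a wrong committed prediction, cost c for abstaining.

    scores:  (batch, n_classes + 1) raw scores; the last column is the abstain category.
    targets: (batch,) ground-truth labels in [0, n_classes).
    """
    preds = scores.argmax(dim=1)
    abstain = preds == scores.shape[1] - 1      # the abstain category was selected
    wrong = (preds != targets) & ~abstain       # committed to a class and got it wrong
    return (wrong.float() + c * abstain.float()).mean()

def chow_rule(class_probs: torch.Tensor, c: float = 0.3) -> torch.Tensor:
    """Chow's rule (Bayes-optimal for cost-c abstention): abstain when the largest
    conditional class probability falls below 1 - c; otherwise predict that class.

    class_probs: (batch, n_classes) rows summing to 1. Returns class index, or -1 to abstain.
    """
    max_p, argmax_p = class_probs.max(dim=1)
    return torch.where(max_p >= 1.0 - c, argmax_p, torch.full_like(argmax_p, -1))

if __name__ == "__main__":
    torch.manual_seed(0)
    scores = torch.randn(8, 11)                  # e.g. 10 classes + 1 abstain column
    targets = torch.randint(0, 10, (8,))
    print("abstention loss:", abstention_loss(scores, targets).item())
    print("Chow decisions:", chow_rule(torch.softmax(scores[:, :10], dim=1)))
```

In this framing, the paper's surrogate losses stand in for the non-differentiable target loss during training, and its consistency bounds relate the estimation error of the target abstention loss to that of the surrogate being minimized.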
