Top-$k$ Classification and Cardinality-Aware Prediction (2403.19625v1)

Published 28 Mar 2024 in cs.LG and stat.ML

Abstract: We present a detailed study of top-$k$ classification, the task of predicting the $k$ most probable classes for an input, extending beyond single-class prediction. We demonstrate that several prevalent surrogate loss functions in multi-class classification, such as comp-sum and constrained losses, are supported by $H$-consistency bounds with respect to the top-$k$ loss. These bounds guarantee consistency in relation to the hypothesis set $H$, providing stronger guarantees than Bayes-consistency due to their non-asymptotic and hypothesis-set specific nature. To address the trade-off between accuracy and cardinality $k$, we further introduce cardinality-aware loss functions through instance-dependent cost-sensitive learning. For these functions, we derive cost-sensitive comp-sum and constrained surrogate losses, establishing their $H$-consistency bounds and Bayes-consistency. Minimizing these losses leads to new cardinality-aware algorithms for top-$k$ classification. We report the results of extensive experiments on CIFAR-100, ImageNet, CIFAR-10, and SVHN datasets demonstrating the effectiveness and benefit of these algorithms.
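To make the quantities in the abstract concrete, the snippet below is a minimal NumPy sketch of the top-$k$ (zero-one) loss and of an instance-dependent, cardinality-penalized cost of the kind a cardinality-aware algorithm trades off. The logarithmic cardinality penalty and the names `candidate_ks` and `lam` are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def top_k_loss(scores, label, k):
    """Top-k zero-one loss: 1 if the true label is not among the k
    highest-scoring classes, else 0. `scores` is a 1-D array of class scores."""
    top_k_classes = np.argsort(scores)[-k:]
    return 0.0 if label in top_k_classes else 1.0

def cardinality_aware_cost(scores, label, candidate_ks, lam=0.05):
    """Illustrative instance-dependent cost: for each candidate cardinality k,
    pay the top-k loss plus a penalty lam * log(k) that grows with the number
    of predicted classes (hypothetical penalty form, for illustration only)."""
    return {k: top_k_loss(scores, label, k) + lam * np.log(k)
            for k in candidate_ks}

# Example: scores for 5 classes, true label 2, candidate cardinalities 1..4.
scores = np.array([0.1, 0.3, 0.25, 0.2, 0.15])
print(cardinality_aware_cost(scores, label=2, candidate_ks=[1, 2, 3, 4]))
```

Minimizing such a cost over candidate cardinalities captures, in simplified form, the accuracy-versus-cardinality trade-off that the paper's cost-sensitive surrogate losses are designed to learn.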
