Trading off Consistency and Dimensionality of Convex Surrogates for the Mode (2402.10818v2)

Published 16 Feb 2024 in cs.LG and stat.ML

Abstract: In multiclass classification over $n$ outcomes, the outcomes must be embedded into the reals with dimension at least $n-1$ in order to design a consistent surrogate loss that leads to the "correct" classification, regardless of the data distribution. For large $n$, such as in information retrieval and structured prediction tasks, optimizing a surrogate in $n-1$ dimensions is often intractable. We investigate ways to trade off surrogate loss dimension, the number of problem instances, and restricting the region of consistency in the simplex for multiclass classification. Following past work, we examine an intuitive embedding procedure that maps outcomes into the vertices of convex polytopes in a low-dimensional surrogate space. We show that full-dimensional subsets of the simplex exist around each point mass distribution for which consistency holds, but also, with less than $n-1$ dimensions, there exist distributions for which a phenomenon called hallucination occurs, which is when the optimal report under the surrogate loss is an outcome with zero probability. Looking towards application, we derive a result to check if consistency holds under a given polytope embedding and low-noise assumption, providing insight into when to use a particular embedding. We provide examples of embedding $n = 2^{d}$ outcomes into the $d$-dimensional unit cube and $n = d!$ outcomes into the $d$-dimensional permutahedron under low-noise assumptions. Finally, we demonstrate that with multiple problem instances, we can learn the mode with $\frac{n}{2}$ dimensions over the whole simplex.

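The unit-cube embedding mentioned in the abstract can be sketched concretely. The snippet below is not from the paper: it substitutes a plain squared-loss surrogate, whose Bayes-optimal report is the embedded mean, for the paper's polyhedral surrogates, and all function names are illustrative assumptions. It embeds $n = 2^{d}$ outcomes at the vertices of the $d$-dimensional unit cube and decodes a surrogate report to the nearest vertex, showing that decoding recovers the mode for a distribution close to a point mass but can fail on a noisier distribution, the kind of restricted region of consistency the paper analyzes.

```python
import itertools
import numpy as np

# Illustrative sketch only: the paper studies polyhedral surrogates built from
# polytope embeddings; here a squared-loss surrogate (whose optimal report is
# the embedded mean) is used just to make the low-dimensional embedding and its
# failure mode concrete. Names and the specific distributions are assumptions.

def cube_embedding(d):
    """Embed n = 2**d outcomes at the vertices of the unit cube [0, 1]^d."""
    return np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)

def decode(report, vertices):
    """Map a d-dimensional surrogate report to the nearest embedded outcome."""
    return int(np.argmin(np.linalg.norm(vertices - report, axis=1)))

d = 2
V = cube_embedding(d)  # 4 outcomes represented in only 2 surrogate dimensions

# Low-noise distribution (near a point mass on outcome 0): decoding recovers the mode.
p_low_noise = np.array([0.85, 0.05, 0.05, 0.05])
assert decode(p_low_noise @ V, V) == np.argmax(p_low_noise)

# Noisier distribution: the optimal report decodes to outcome 2, but the mode is
# outcome 0, so this 2-dimensional surrogate is not consistent for the mode over
# the whole simplex.
p_noisy = np.array([0.4, 0.0, 0.3, 0.3])
print(decode(p_noisy @ V, V), np.argmax(p_noisy))  # prints: 2 0
```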