Trading off Consistency and Dimensionality of Convex Surrogates for the Mode (2402.10818v2)
Abstract: In multiclass classification over $n$ outcomes, the outcomes must be embedded into the reals with dimension at least $n-1$ in order to design a consistent surrogate loss that leads to the "correct" classification, regardless of the data distribution. For large $n$, such as in information retrieval and structured prediction tasks, optimizing a surrogate in $n-1$ dimensions is often intractable. We investigate ways to trade off surrogate loss dimension, the number of problem instances, and the region of the simplex on which consistency is restricted for multiclass classification. Following past work, we examine an intuitive embedding procedure that maps outcomes into the vertices of convex polytopes in a low-dimensional surrogate space. We show that full-dimensional subsets of the simplex exist around each point-mass distribution for which consistency holds, but also that, with fewer than $n-1$ dimensions, there exist distributions for which a phenomenon called hallucination occurs: the optimal report under the surrogate loss is an outcome with zero probability. Looking towards application, we derive a result to check whether consistency holds under a given polytope embedding and low-noise assumption, providing insight into when to use a particular embedding. We provide examples of embedding $n = 2^d$ outcomes into the $d$-dimensional unit cube and $n = d!$ outcomes into the $d$-dimensional permutahedron under low-noise assumptions. Finally, we demonstrate that with multiple problem instances, we can learn the mode with $\frac{n}{2}$ dimensions over the whole simplex.
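The unit-cube embedding mentioned above can be sketched concretely: outcome $i$ of $n = 2^d$ outcomes is identified with the binary representation of $i$, a vertex of the $d$-dimensional cube, and a surrogate report in $\mathbb{R}^d$ is decoded back to an outcome via a nearest-vertex link. This is a minimal illustration of the embedding idea, not the paper's exact surrogate or link construction; the function names here are illustrative.

```python
def cube_embedding(d):
    """Map n = 2**d outcomes to vertices of the d-dimensional unit cube.

    Outcome i is embedded as the tuple of bits of i (a cube vertex),
    so 2**d outcomes live in only d surrogate dimensions.
    """
    return {i: tuple((i >> k) & 1 for k in range(d)) for i in range(2 ** d)}


def decode(report, embedding):
    """Nearest-vertex link sketch: map a surrogate report in R^d to the
    outcome whose cube vertex minimizes squared Euclidean distance."""
    return min(
        embedding,
        key=lambda i: sum((r - v) ** 2 for r, v in zip(report, embedding[i])),
    )


emb = cube_embedding(3)               # 8 outcomes embedded in 3 dimensions
print(emb[5])                         # (1, 0, 1)
print(decode((0.9, 0.1, 0.8), emb))   # 5
```

Under the paper's low-noise assumptions, decoding a minimizer of a suitable convex surrogate in this $d$-dimensional space recovers the mode; outside those regions, hallucination can select a zero-probability outcome.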