List Sample Compression and Uniform Convergence (2403.10889v1)
Abstract: List learning is a variant of supervised classification where the learner outputs multiple plausible labels for each instance rather than just one. We investigate classical principles related to generalization within the context of list learning. Our primary goal is to determine whether classical principles from the PAC setting retain their applicability in the domain of list PAC learning. We focus on uniform convergence (the basis of Empirical Risk Minimization) and on sample compression (a powerful manifestation of Occam's Razor). In classical PAC learning, both uniform convergence and sample compression satisfy a form of `completeness': whenever a class is learnable, it can also be learned by a learning rule that adheres to these principles. We ask whether the same completeness holds in the list learning setting. We show that uniform convergence remains equivalent to learnability in the list PAC learning setting. In contrast, our findings reveal a surprising picture for sample compression: we prove that when the label space is $Y = \{0,1,2\}$, there are 2-list-learnable classes that cannot be compressed. This refutes the list version of the sample compression conjecture of Littlestone and Warmuth (1986). We prove an even stronger impossibility result: there are 2-list-learnable classes that cannot be compressed even when the reconstructed function may output lists of arbitrarily large size. We prove a similar result for (1-list) PAC learnable classes when the label space is unbounded, generalizing a recent result of Pabbaraju (arXiv:2308.06424).
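To make the sample compression principle in the abstract concrete, below is a minimal Python sketch (our own illustration, not a construction from the paper) of the classic size-one compression scheme for one-dimensional thresholds $h_t(x) = 1$ iff $x \ge t$: the compressor keeps only the smallest positively labeled example, and the reconstructor returns the threshold it induces. The paper's negative results say that for certain 2-list-learnable classes, no analogous bounded-size scheme exists.

```python
# Minimal sketch (illustrative, not from the paper) of a sample compression
# scheme for 1-D thresholds h_t(x) = 1 iff x >= t, on realizable samples.

def compress(sample):
    """Keep at most one labeled example: the smallest positive point."""
    positives = [x for x, y in sample if y == 1]
    return [(min(positives), 1)] if positives else []

def reconstruct(compressed):
    """Rebuild a threshold hypothesis from the compressed subsample."""
    if not compressed:
        return lambda x: 0  # no positive retained: the all-zeros hypothesis
    t = compressed[0][0]
    return lambda x: 1 if x >= t else 0

# On any realizable sample, the reconstructed hypothesis is consistent
# with every example, even though only one example was retained.
sample = [(0.1, 0), (0.2, 0), (0.5, 1), (0.9, 1)]
h = reconstruct(compress(sample))
assert all(h(x) == y for x, y in sample)
```

A k-list analogue would let `reconstruct` return a function mapping each point to a set of at most k labels; per the abstract, even allowing arbitrarily large lists at reconstruction does not restore compressibility for some 2-list-learnable classes over $Y = \{0,1,2\}$.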
- Noga Alon, Steve Hanneke, Ron Holzman, and Shay Moran. A theory of PAC learnability of partial concept classes. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 658–671, 2021. URL https://api.semanticscholar.org/CorpusID:236087943.
- Idan Attias, Aryeh Kontorovich, and Yishay Mansour. Improved generalization bounds for adversarially robust learning. Journal of Machine Learning Research, 23:175:1–175:31, 2022. URL http://jmlr.org/papers/v23/20-1353.html.
- Nataly Brukhim, Daniel Carmon, Irit Dinur, Shay Moran, and Amir Yehudayoff. A characterization of multiclass learnability. In 63rd IEEE Annual Symposium on Foundations of Computer Science (FOCS), 2022. ISBN 9781665455190. 10.1109/FOCS54457.2022.00093.
- Nataly Brukhim, Steve Hanneke, and Shay Moran. Improper multiclass boosting. In Gergely Neu and Lorenzo Rosasco, editors, The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, 12-15 July 2023, Bangalore, India, volume 195 of Proceedings of Machine Learning Research, pages 5433–5452. PMLR, 2023. URL https://proceedings.mlr.press/v195/brukhim23a.html.
- Moses Charikar and Chirag Pabbaraju. A characterization of list learnability. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing (STOC), 2023. URL https://api.semanticscholar.org/CorpusID:253420830.
- Tsun-Ming Cheung, Hamed Hatami, Pooya Hatami, and Kaave Hosseini. Online learning and disambiguations of partial concept classes. In Kousha Etessami, Uriel Feige, and Gabriele Puppis, editors, 50th International Colloquium on Automata, Languages, and Programming, ICALP 2023, July 10-14, 2023, Paderborn, Germany, volume 261 of LIPIcs, pages 42:1–42:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. 10.4230/LIPICS.ICALP.2023.42. URL https://doi.org/10.4230/LIPIcs.ICALP.2023.42.
- Amit Daniely, Sivan Sabato, Shai Ben-David, and Shai Shalev-Shwartz. Multiclass learnability and the ERM principle. Journal of Machine Learning Research, 16(12):2377–2404, 2015.
- Ofir David, Shay Moran, and Amir Yehudayoff. Supervised learning through the lens of compression. In Daniel D. Lee, Masashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett, editors, Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 2784–2792, 2016. URL https://proceedings.neurips.cc/paper/2016/hash/59f51fd6937412b7e56ded1ea2470c25-Abstract.html.
- Hamed Hatami, Kaave Hosseini, and Xiang Meng. A Borsuk-Ulam lower bound for sign-rank and its applications. In Barna Saha and Rocco A. Servedio, editors, Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 463–471. ACM, 2023. 10.1145/3564246.3585210. URL https://doi.org/10.1145/3564246.3585210.
- Maksim Lapin, Matthias Hein, and Bernt Schiele. Loss functions for top-k error: Analysis and insights. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1468–1477, 2016. URL https://api.semanticscholar.org/CorpusID:9657031.
- Nick Littlestone and Manfred K. Warmuth. Relating data compression and learnability. Unpublished manuscript, 1986.
- Philip M. Long. On agnostic learning with {0, *, 1}-valued and real-valued hypotheses. In David P. Helmbold and Robert C. Williamson, editors, Computational Learning Theory, 14th Annual Conference on Computational Learning Theory, COLT 2001 and 5th European Conference on Computational Learning Theory, EuroCOLT 2001, Amsterdam, The Netherlands, July 16-19, 2001, Proceedings, volume 2111 of Lecture Notes in Computer Science, pages 289–302. Springer, 2001. 10.1007/3-540-44581-1_19. URL https://doi.org/10.1007/3-540-44581-1_19.
- Shay Moran and Amir Yehudayoff. Sample compression schemes for VC classes. Journal of the ACM, 63(3):1–10, 2016.
- Shay Moran, Ohad Sharon, Iska Tsubari, and Sivan Yosebashvili. List online classification. In Gergely Neu and Lorenzo Rosasco, editors, The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, 12-15 July 2023, Bangalore, India, volume 195 of Proceedings of Machine Learning Research, pages 1885–1913. PMLR, 2023. URL https://proceedings.mlr.press/v195/moran23a.html.
- Chirag Pabbaraju. Multiclass learnability does not imply sample compression. arXiv preprint arXiv:2308.06424, 2023.
- Denis Pankratov. Direct sum questions in classical communication complexity. Master’s thesis, University of Chicago, 2012.
- Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, USA, 2014. ISBN 1107057132.
- Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, and Karthik Sridharan. Learnability, stability and uniform convergence. Journal of Machine Learning Research, 11(90):2635–2670, 2010. URL http://jmlr.org/papers/v11/shalev-shwartz10a.html.
- Avi Wigderson. Mathematics and Computation: A Theory Revolutionizing Technology and Science. Princeton University Press, 2019. ISBN 9780691189130. URL https://books.google.co.il/books?id=-WCqDwAAQBAJ.
- Top-k multi-class SVM using multiple features. Information Sciences, 432:479–494, 2018. ISSN 0020-0255. 10.1016/j.ins.2017.08.004. URL https://www.sciencedirect.com/science/article/pii/S0020025517308642.