
Multiclass Learnability Does Not Imply Sample Compression (2308.06424v2)

Published 12 Aug 2023 in cs.LG and stat.ML

Abstract: A hypothesis class admits a sample compression scheme if, for every sample labeled by a hypothesis from the class, it is possible to retain only a small subsample from which the labels on the entire sample can be inferred. The size of the compression scheme is an upper bound on the size of the subsample produced. Every learnable binary hypothesis class (which must necessarily have finite VC dimension) admits a sample compression scheme whose size is a finite function of its VC dimension alone, independent of the sample size. For multiclass hypothesis classes, the analog of the VC dimension is the DS dimension. We show that the analogous statement fails for multiclass hypothesis classes: not every learnable multiclass hypothesis class (which must necessarily have finite DS dimension) admits a sample compression scheme whose size is a finite function of its DS dimension.
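
For readers unfamiliar with the object in question, here is one standard formalization of a sample compression scheme, in the style of Littlestone and Warmuth; the notation below is a common convention and is not taken verbatim from this paper. A compression scheme of size $k$ for a class $\mathcal{H} \subseteq \mathcal{Y}^{\mathcal{X}}$ is a pair of maps $(\kappa, \rho)$ such that

\[
\kappa(S) \subseteq S, \qquad |\kappa(S)| \le k \qquad \text{for every } \mathcal{H}\text{-realizable labeled sample } S \subseteq \mathcal{X} \times \mathcal{Y},
\]
\[
\rho : (\mathcal{X} \times \mathcal{Y})^{\le k} \to \mathcal{Y}^{\mathcal{X}}, \qquad \rho(\kappa(S))(x) = y \quad \text{for all } (x, y) \in S.
\]

That is, $\kappa$ compresses any realizable sample down to at most $k$ of its own labeled points, and $\rho$ reconstructs from those points a predictor that is correct on the entire sample. In this notation, the abstract's negative result says that no uniform bound $k = f(\text{DS dimension})$ exists across learnable multiclass classes, in contrast to the binary case, where $k$ can be bounded by a function of the VC dimension alone.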
