
Multiclass Online Learning and Uniform Convergence (2303.17716v2)

Published 30 Mar 2023 in cs.LG and stat.ML

Abstract: We study multiclass classification in the agnostic adversarial online learning setting. As our main result, we prove that any multiclass concept class is agnostically learnable if and only if its Littlestone dimension is finite. This solves an open problem studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011, 2015), who handled the case when the number of classes (or labels) is bounded. We also prove a separation between online learnability and online uniform convergence by exhibiting an easy-to-learn class whose sequential Rademacher complexity is unbounded. Our learning algorithm uses the multiplicative weights algorithm, with a set of experts defined by executions of the Standard Optimal Algorithm on subsequences of length equal to the Littlestone dimension. We argue that the best expert has regret at most the Littlestone dimension relative to the best concept in the class. This differs from the well-known covering technique of Ben-David, Pál, and Shalev-Shwartz (2009) for binary classification, where the best expert has regret zero.
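The aggregation step the abstract describes can be illustrated with a minimal sketch of the multiplicative weights (randomized weighted majority) algorithm. This is not the paper's construction: here the experts are hypothetical precomputed prediction sequences, whereas the paper builds its experts from executions of the Standard Optimal Algorithm on subsequences; only the weighting and sampling logic is shown.

```python
import math
import random

def multiplicative_weights(experts, stream, eta=0.5):
    """Aggregate a fixed set of experts with multiplicative weights.

    experts: list of lists; experts[i][t] is expert i's label for round t.
    stream:  list of true labels y_t, revealed after each prediction.
    Returns (learner's mistake count, best expert's mistake count).
    Illustrative sketch only; in the paper the experts are SOA executions,
    not precomputed sequences.
    """
    weights = [1.0] * len(experts)
    mistakes = 0
    for t, y in enumerate(stream):
        # sample an expert with probability proportional to its weight
        total = sum(weights)
        r, acc, pick = random.random() * total, 0.0, 0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                pick = i
                break
        if experts[pick][t] != y:
            mistakes += 1
        # multiplicatively penalize every expert that erred this round
        for i, e in enumerate(experts):
            if e[t] != y:
                weights[i] *= math.exp(-eta)
    best = min(sum(e[t] != y for t, y in enumerate(stream)) for e in experts)
    return mistakes, best
```

In the binary covering technique of Ben-David, Pál, and Shalev-Shwartz (2009), the best expert would make zero mistakes; the paper's point is that for unbounded label sets one can only guarantee the best expert's regret is at most the Littlestone dimension, which the same weighting scheme still tolerates.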

References (24)
  1. Adversarial laws of large numbers and optimal regret in online classification. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, 2021a.
  2. Adversarial laws of large numbers and optimal regret in online classification. arXiv:2101.09054, 2021b.
  3. Characterizations of learnability for classes of {0,…,n}-valued functions. Journal of Computer and System Sciences, 50:74–86, 1995.
  4. Agnostic online learning. In Proceedings of the 22nd Conference on Learning Theory, 2009.
  5. A characterization of multiclass learnability. In Proceedings of the 63rd Annual IEEE Symposium on Foundations of Computer Science, 2022.
  6. N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
  7. A. Daniely and S. Shalev-Shwartz. Optimal learners for multiclass problems. In Proceedings of the 27th Conference on Learning Theory, 2014.
  8. Multiclass learnability and the ERM principle. In Proceedings of the 24th Conference on Learning Theory, 2011.
  9. Multiclass learnability and the ERM principle. Journal of Machine Learning Research, 16(12):2377–2404, 2015.
  10. On statistical learning via the lens of compression. In Advances in Neural Information Processing Systems 29, 2016.
  11. N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285–318, 1988.
  12. N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.
  13. B. K. Natarajan. On learning sets and functions. Machine Learning, 4:67–97, 1989.
  14. Balas K. Natarajan. Some results on learning. Unpublished manuscript, 1988.
  15. Two new frameworks for learning. In ICML, pages 402–415, 1988.
  16. Online learning: Random averages, combinatorial parameters, and learnability. In Advances in Neural Information Processing Systems 23, 2010.
  17. Online learning via sequential complexities. Journal of Machine Learning Research, 16(1):155–186, 2015a.
  18. Sequential complexities and uniform martingale laws of large numbers. Probability Theory and Related Fields, 161:111–153, 2015b.
  19. M. Talagrand. Sharper bounds for gaussian and empirical processes. The Annals of Probability, 22:28–76, 1994.
  20. A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
  21. V. Vapnik and A. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974.
  22. M. Vidyasagar. Learning and Generalization with Applications to Neural Networks. Springer-Verlag, 2nd edition, 2003.
  23. V. Vovk. Aggregating strategies. In Proceedings of the 3rd Annual Workshop on Computational Learning Theory, 1990.
  24. V. Vovk. Universal forecasting algorithms. Information and Computation, 96(2):245–277, 1992.
