Statistical curriculum learning: An elimination algorithm achieving an oracle risk (2402.13366v1)

Published 20 Feb 2024 in cs.LG and stat.ML

Abstract: We consider a statistical version of curriculum learning (CL) in a parametric prediction setting. The learner is required to estimate a target parameter vector, and can adaptively collect samples from either the target model, or other source models that are similar to the target model, but less noisy. We consider three types of learners, depending on the level of side-information they receive. The first two, referred to as strong/weak-oracle learners, receive high/low degrees of information about the models, and use these to learn. The third, a fully adaptive learner, estimates the target parameter vector without any prior information. In the single source case, we propose an elimination learning method, whose risk matches that of a strong-oracle learner. In the multiple source case, we advocate that the risk of the weak-oracle learner is a realistic benchmark for the risk of adaptive learners. We develop an adaptive multiple elimination-rounds CL algorithm, and characterize instance-dependent conditions for its risk to match that of the weak-oracle learner. We consider instance-dependent minimax lower bounds, and discuss the challenges associated with defining the class of instances for the bound. We derive two minimax lower bounds, and determine the conditions under which the performance of the weak-oracle learner is minimax optimal.
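To make the elimination idea concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: it uses a scalar Gaussian location model, a single source, and a crude standard-error threshold, whereas the paper treats vector-valued parameters, multiple sources, multiple elimination rounds, and principled instance-dependent thresholds. All names, parameter values, and the threshold constant below are illustrative assumptions. The sketch shows the core tradeoff: the source is less noisy but possibly biased, so the learner probes both models, keeps the source only while its estimate is statistically indistinguishable from the target's, and otherwise eliminates it and falls back to target-only sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Gaussian location setting (illustrative values, not from the paper).
theta_target = 1.0   # unknown target parameter the learner must estimate
theta_source = 1.05  # source parameter: similar to the target, also unknown
sigma_target = 2.0   # target samples are noisy ...
sigma_source = 0.5   # ... source samples are much less noisy

budget = 400         # total number of adaptively collected samples


def sample(theta, sigma, n):
    """Draw n noisy observations of a model with mean theta."""
    return theta + sigma * rng.standard_normal(n)


# Probe phase: spend part of the budget on each model.
n_probe = budget // 4
x_target = sample(theta_target, sigma_target, n_probe)
x_source = sample(theta_source, sigma_source, n_probe)

# Elimination test: is the source's estimate statistically
# indistinguishable from the target's? The 3x standard-error threshold
# is an ad hoc stand-in for the paper's instance-dependent conditions.
gap = abs(x_source.mean() - x_target.mean())
threshold = 3.0 * np.sqrt((sigma_target**2 + sigma_source**2) / n_probe)

n_rest = budget - 2 * n_probe
if gap <= threshold:
    # Source survives: exploit its lower noise for the remaining budget,
    # accepting a small bias in exchange for a large variance reduction.
    x_rest = sample(theta_source, sigma_source, n_rest)
    estimate = np.concatenate([x_source, x_rest]).mean()
else:
    # Source eliminated: estimate from target samples alone.
    x_rest = sample(theta_target, sigma_target, n_rest)
    estimate = np.concatenate([x_target, x_rest]).mean()

print(f"estimate = {estimate:.3f} (true target = {theta_target})")
```

When the source is close enough to the target, the learner's risk is driven by the source's smaller noise, mimicking how the paper's elimination method matches the oracle benchmarks; when the source is too far off, the test eliminates it and the learner pays only the target-model rate.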

