Rethinking Teacher-Student Curriculum Learning through the Cooperative Mechanics of Experience (2404.03084v2)
Abstract: Teacher-Student Curriculum Learning (TSCL) is a curriculum learning framework that draws inspiration from human cultural transmission and learning. It involves a teacher algorithm shaping the learning process of a learner algorithm by exposing it to controlled experiences. Despite its success, understanding the conditions under which TSCL is effective remains challenging. In this paper, we propose a data-centric perspective to analyze the underlying mechanics of the teacher-student interactions in TSCL. We leverage cooperative game theory to describe how the composition of the set of experiences presented by the teacher to the learner, as well as their order, influences the performance of the curriculum found by TSCL approaches. To do so, we demonstrate that for every TSCL problem an equivalent cooperative game exists, and that several key components of the TSCL framework can be reinterpreted using game-theoretic principles. Through experiments covering supervised learning, reinforcement learning, and classical games, we estimate the cooperative values of experiences and use value-proportional curriculum mechanisms to construct curricula, even in cases where TSCL struggles. The framework and experimental setup we present in this work represent a novel foundation for a deeper exploration of TSCL, shedding light on its underlying mechanisms and providing insights into its broader applicability in machine learning.
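To make the idea of "cooperative values of experiences" and a "value-proportional curriculum" concrete, the following is a minimal sketch and not the paper's implementation. It assumes the Shapley value as the cooperative value, a hypothetical three-experience toy game whose characteristic function `SCORES` is invented for illustration, and a simple normalization rule (negative values clipped to zero) for turning values into sampling probabilities. It also ignores the order of experiences, which the abstract says matters as well.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += v(with_p) - v(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


def value_proportional(values):
    """Map values to a sampling distribution (assumption: clip negatives to 0)."""
    clipped = {p: max(x, 0.0) for p, x in values.items()}
    total = sum(clipped.values())
    if total == 0.0:
        # Fall back to uniform sampling when no experience has positive value.
        return {p: 1.0 / len(values) for p in values}
    return {p: x / total for p, x in clipped.items()}


# Hypothetical toy game: learner performance after training on each subset
# of experiences. "a" and "b" are complementary; "c" never helps.
SCORES = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.2,
    frozenset({"b"}): 0.2,
    frozenset({"c"}): 0.0,
    frozenset({"a", "b"}): 0.6,
    frozenset({"a", "c"}): 0.2,
    frozenset({"b", "c"}): 0.2,
    frozenset({"a", "b", "c"}): 0.6,
}

values = shapley_values(["a", "b", "c"], lambda s: SCORES[frozenset(s)])
curriculum = value_proportional(values)
```

In this toy game the two complementary experiences share the total value equally and the unhelpful one receives zero, so a value-proportional teacher would sample only `a` and `b`. Exact enumeration scales factorially with the number of experiences, so anything beyond a handful would need a sampling-based estimate.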