A Definition of Continual Reinforcement Learning (arXiv:2307.11046v2)
Abstract: In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a restricted view of learning as finding a solution, rather than treating learning as endless adaptation. In contrast, continual reinforcement learning refers to the setting in which the best agents never stop learning. Despite the importance of continual reinforcement learning, the community lacks a simple definition of the problem that highlights its commitments and makes its primary concepts precise and clear. To this end, this paper is dedicated to carefully defining the continual reinforcement learning problem. We formalize the notion of agents that "never stop learning" through a new mathematical language for analyzing and cataloging agents. Using this new language, we define a continual learning agent as one that can be understood as carrying out an implicit search process indefinitely, and continual reinforcement learning as the setting in which the best agents are all continual learning agents. We provide two motivating examples, illustrating that traditional views of multi-task reinforcement learning and continual supervised learning are special cases of our definition. Collectively, these definitions and perspectives formalize many intuitive concepts at the heart of learning, and open new research pathways surrounding continual learning agents.
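The abstract's two definitions can be illustrated with a toy sketch. This is not the paper's formalism. It assumes, hypothetically, that an agent can be summarized by a learning rule that selects one of a finite set of labeled base behaviors at each time step, and that "carrying out an implicit search process indefinitely" appears as the rule continuing to switch among those behaviors. No finite trace can show that an agent never stops, so `looks_continual` checks only whether a switch occurred within a final window. The names `last_switch`, `looks_continual`, and `best_are_all_continual`, along with the example rules and scores, are illustrative assumptions.

```python
from typing import Callable, Dict, Tuple

# Hypothetical model (not the paper's): a learning rule maps a time step,
# standing in for the agent's full history, to the label of the base
# behavior it currently follows.
LearningRule = Callable[[int], str]


def last_switch(rule: LearningRule, horizon: int) -> int:
    """Return the last step in [1, horizon) at which the rule changed base behavior, or 0 if it never changed."""
    last = 0
    prev = rule(0)
    for t in range(1, horizon):
        cur = rule(t)
        if cur != prev:
            last = t
        prev = cur
    return last


def looks_continual(rule: LearningRule, horizon: int, window: int) -> bool:
    """Finite-horizon proxy for 'never stops searching': the rule still
    switched base behavior within the final `window` steps."""
    return last_switch(rule, horizon) >= horizon - window


def best_are_all_continual(
    agents: Dict[str, Tuple[LearningRule, float]], horizon: int, window: int
) -> bool:
    """Proxy for the continual-RL condition: every top-scoring agent looks continual."""
    best = max(score for _, score in agents.values())
    return all(
        looks_continual(rule, horizon, window)
        for rule, score in agents.values()
        if score == best
    )


def converging(t: int) -> str:
    """Settles on base behavior "B" from step 10 onward, so its search stops."""
    return "A" if t < 10 else "B"


def cycling(t: int) -> str:
    """Alternates base behaviors every 50 steps, so its search never stops."""
    return "A" if (t // 50) % 2 == 0 else "B"


if __name__ == "__main__":
    print(last_switch(converging, 1000))  # 10
    print(last_switch(cycling, 1000))  # 950
    print(looks_continual(converging, 1000, 100))  # False
    print(looks_continual(cycling, 1000, 100))  # True
    # Hypothetical scores: the cycling rule is the best agent here.
    scores = {"converging": (converging, 0.5), "cycling": (cycling, 0.9)}
    print(best_are_all_continual(scores, 1000, 100))  # True
```

With a horizon of 1000 and a window of 100, the converging rule's last switch (step 10) falls outside the window while the cycling rule's last switch (step 950) falls inside it, so only the cycling rule passes the proxy. When the cycling rule is also the top scorer, the toy "best agents are all continual learners" condition holds.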