Prediction and Control in Continual Reinforcement Learning (2312.11669v1)
Abstract: Temporal difference (TD) learning is often used to update the estimate of the value function, which RL agents use to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components that are updated at different timescales: a permanent value function, which holds general knowledge that persists over time, and a transient value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning, and we draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach significantly improves performance on both prediction and control problems.
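To make the two-timescale idea concrete, here is a minimal tabular prediction sketch. It assumes the overall estimate is the sum of the two components, a TD(0) update on the transient part, and a simple consolidate-then-reset rule for the permanent part; the class name, hyperparameters, and consolidation schedule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

class PermanentTransientValues:
    """Sketch of a permanent/transient value decomposition (tabular).

    The combined estimate is V(s) = V_perm(s) + V_trans(s). The transient
    component adapts quickly via TD errors on the combined estimate; the
    permanent component is consolidated slowly, e.g. at task boundaries.
    All names and schedules here are illustrative assumptions.
    """

    def __init__(self, n_states, alpha_trans=0.1, alpha_perm=0.05, gamma=0.99):
        self.v_perm = np.zeros(n_states)   # slow: persistent general knowledge
        self.v_trans = np.zeros(n_states)  # fast: task-specific correction
        self.alpha_trans = alpha_trans
        self.alpha_perm = alpha_perm
        self.gamma = gamma

    def value(self, s):
        # Predictions come from the sum of both components.
        return self.v_perm[s] + self.v_trans[s]

    def td_update(self, s, r, s_next, done):
        # Fast timescale: TD(0) error computed on the combined estimate,
        # but only the transient component is adjusted.
        target = r + (0.0 if done else self.gamma * self.value(s_next))
        self.v_trans[s] += self.alpha_trans * (target - self.value(s))

    def consolidate(self):
        # Slow timescale (assumed rule): move the permanent component toward
        # the combined estimate, then reset the transient component so it is
        # free to adapt to the next situation.
        self.v_perm += self.alpha_perm * self.v_trans
        self.v_trans[:] = 0.0
```

A typical usage pattern under these assumptions would call `td_update` after every transition and `consolidate` whenever the environment is believed to have changed (or on a fixed schedule), so that knowledge shared across tasks accumulates in `v_perm` while `v_trans` tracks the current situation.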