Exploring the Promise and Limits of Real-Time Recurrent Learning (2305.19044v3)
Abstract: Real-time recurrent learning (RTRL) for sequence-processing recurrent neural networks (RNNs) offers certain conceptual advantages over backpropagation through time (BPTT). RTRL requires neither caching past activations nor truncating context, and enables online learning. However, RTRL's time and space complexity make it impractical. To overcome this problem, most recent work on RTRL focuses on approximation theories, while experiments are often limited to diagnostic settings. Here we explore the practical promise of RTRL in more realistic settings. We study actor-critic methods that combine RTRL and policy gradients, and test them in several subsets of DMLab-30, ProcGen, and Atari-2600 environments. On DMLab memory tasks, our system trained on fewer than 1.2 B environmental frames is competitive with or outperforms well-known IMPALA and R2D2 baselines trained on 10 B frames. To scale to such challenging tasks, we focus on certain well-known neural architectures with element-wise recurrence, allowing for tractable RTRL without approximation. Importantly, we also discuss rarely addressed limitations of RTRL in real-world applications, such as its complexity in the multi-layer case.
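The key point of the abstract, that element-wise recurrence makes exact RTRL tractable, can be illustrated with a minimal NumPy sketch. This is a hypothetical diagonal-recurrence cell for illustration, not the paper's actual architecture, and the names (`rtrl_grads`, `s_u`, `s_W`) are ours. Because each hidden unit feeds back only into itself, the sensitivities of the state with respect to the parameters stay the same size as the parameters, instead of the cubic-size Jacobian a fully recurrent layer would need:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h = 3, 4  # tiny sizes for illustration

# Element-wise ("diagonal") recurrent cell: h_t = tanh(u * h_{t-1} + W @ x_t).
u = rng.normal(scale=0.5, size=n_h)          # per-unit recurrent gains
W = rng.normal(scale=0.5, size=(n_h, n_in))  # input projection

def rtrl_grads(xs, target):
    """Run the sequence forward once, carrying exact RTRL sensitivities,
    and return (loss, dL/du, dL/dW) for the final squared error.
    No past activations are cached and no context is truncated."""
    h = np.zeros(n_h)
    s_u = np.zeros(n_h)          # s_u[i]   = d h[i] / d u[i]
    s_W = np.zeros((n_h, n_in))  # s_W[i,j] = d h[i] / d W[i,j]
    for x in xs:
        a = u * h + W @ x
        fprime = 1.0 - np.tanh(a) ** 2
        # RTRL forward recursions (exact; use h_{t-1} before updating h):
        s_u = fprime * (h + u * s_u)
        s_W = fprime[:, None] * (x[None, :] + u[:, None] * s_W)
        h = np.tanh(a)
    err = h - target
    loss = 0.5 * np.sum(err ** 2)
    return loss, err * s_u, err[:, None] * s_W
```

Memory here is O(n_h · n_in), i.e., proportional to the parameter count and independent of sequence length, which is what enables the online, untruncated learning the abstract describes; a fully recurrent layer would instead need an O(n_h² · n_in) sensitivity tensor.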
- DeepMind Lab. Preprint arXiv:1612.03801, 2016.
- Biologically inspired alternatives to backpropagation through time for learning in recurrent neural nets. Preprint arXiv:1901.09049, 2019.
- The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, 2013.
- Optimal Kronecker-sum approximation of real time recurrent learning. In Proc. Int. Conf. on Machine Learning (ICML), volume 97, pp. 604–613, Long Beach, CA, USA, June 2019.
- Quasi-recurrent neural networks. In Int. Conf. on Learning Representations (ICLR), Toulon, France, April 2017.
- Leveraging procedural generation to benchmark reinforcement learning. In Proc. Int. Conf. on Machine Learning (ICML), pp. 2048–2056, Virtual only, July 2020.
- Jeffrey L Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.
- IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In Proc. Int. Conf. on Machine Learning (ICML), pp. 1406–1415, Stockholm, Sweden, July 2018.
- Addressing some limitations of Transformers with feedback memory. Preprint arXiv:2002.09402, 2020.
- Michael Gherrity. A learning algorithm for analog, fully recurrent neural networks. In Proc. Int. Joint Conf. on Neural Networks (IJCNN), volume 643, 1989.
- IndyLSTMs: Independently recurrent LSTMs. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 3352–3356, Barcelona, Spain, May 2020.
- BPS: A learning algorithm for capturing the dynamic nature of speech. In Proc. Int. Joint Conf. on Neural Networks (IJCNN), pp. 417–423, Washington, DC, USA, 1989.
- LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10):2222–2232, 2016.
- Diagonal state spaces are as effective as structured state spaces. In Proc. Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, November 2022.
- Deep recurrent Q-learning for partially observable MDPs. In AAAI Fall Symposia, pp. 29–37, Arlington, VA, USA, November 2015.
- Multi-task deep reinforcement learning with PopArt. In Proc. AAAI Conf. on Artificial Intelligence, pp. 3796–3803, Honolulu, HI, USA, January 2019.
- Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- Going beyond linear transformers with recurrent fast weight programmers. In Proc. Advances in Neural Information Processing Systems (NeurIPS), Virtual only, December 2021.
- Practical computational power of linear transformers and their recurrent and self-referential extensions. In Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), Sentosa, Singapore, 2023.
- Scalable online recurrent learning using columnar neural networks. Preprint arXiv:2103.05787, 2021.
- Scalable real-time recurrent learning using columnar-constructive networks. Journal of Machine Learning Research, 24, 2023.
- Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134, 1998.
- Recurrent experience replay in distributed reinforcement learning. In Int. Conf. on Learning Representations (ICLR), New Orleans, LA, USA, May 2019.
- Transformers are RNNs: Fast autoregressive transformers with linear attention. In Proc. Int. Conf. on Machine Learning (ICML), Virtual only, July 2020.
- Actor-critic algorithms. In Proc. Advances in Neural Information Processing Systems (NIPS), pp. 1008–1014, Denver, CO, USA, November 1999.
- Dynamic evaluation of neural sequence models. In Proc. Int. Conf. on Machine Learning (ICML), pp. 2771–2780, Stockholm, Sweden, July 2018.
- Connected recognition with a recurrent network. Speech Communication, 9(1):41–48, 1990.
- Torchbeast: A PyTorch platform for distributed RL. Preprint arXiv:1910.03552, 2019.
- Simple recurrent units for highly parallelizable recurrence. In Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp. 4470–4481, Brussels, Belgium, November 2018.
- Real-time recurrent reinforcement learning. Preprint arXiv:2311.04830, 2023.
- Independently recurrent neural network (IndRNN): Building a longer and deeper RNN. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 5457–5466, Salt Lake City, UT, USA, June 2018.
- A multilayer real-time recurrent learning algorithm for improved convergence. In Proc. Int. Conf. on Artificial Neural Networks (ICANN), pp. 445–450, Lausanne, Switzerland, October 1997.
- Practical real time recurrent learning with a sparse approximation. In Int. Conf. on Learning Representations (ICLR), Virtual only, May 2021.
- A formal hierarchy of RNN architectures. In Proc. Association for Computational Linguistics (ACL), pp. 443–459, Virtual only, July 2020.
- Recurrent neural network based language model. In Proc. Interspeech, pp. 1045–1048, Makuhari, Japan, September 2010.
- Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
- Asynchronous methods for deep reinforcement learning. In Proc. Int. Conf. on Machine Learning (ICML), pp. 1928–1937, New York City, NY, USA, June 2016.
- Towards interpretable reinforcement learning using attention augmented agents. In Proc. Advances in Neural Information Processing Systems (NeurIPS), pp. 12329–12338, Vancouver, Canada, December 2019.
- Michael C. Mozer. A focused backpropagation algorithm for temporal pattern recognition. Complex Systems, 3(4):349–381, 1989.
- Michael C. Mozer. Induction of multiscale temporal structure. In Proc. Advances in Neural Information Processing Systems (NIPS), pp. 275–282, Denver, CO, USA, December 1991.
- Approximating real-time recurrent learning with random Kronecker factors. In Proc. Advances in Neural Information Processing Systems (NeurIPS), pp. 6594–6603, Montréal, Canada, December 2018.
- Resurrecting recurrent neural networks for long sequences. In Proc. Int. Conf. on Machine Learning (ICML), Honolulu, HI, USA, July 2023.
- History compression via language models in reinforcement learning. In Proc. Int. Conf. on Machine Learning (ICML), Baltimore, MD, USA, July 2022.
- Stabilizing Transformers for reinforcement learning. In Proc. Int. Conf. on Machine Learning (ICML), pp. 7487–7498, Virtual only, July 2020.
- Barak A Pearlmutter. Learning state space trajectories in recurrent neural networks. Neural Computation, 1(2):263–269, 1989.
- Anthony J Robinson. Dynamic error propagation networks. PhD thesis, University of Cambridge, 1989.
- The utility driven dynamic error propagation network, volume 1. University of Cambridge, Department of Engineering, 1987.
- Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
- Linear Transformers are secretly fast weight programmers. In Proc. Int. Conf. on Machine Learning (ICML), Virtual only, July 2021.
- Jürgen Schmidhuber. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments. In Proc. Int. Joint Conf. on Neural Networks (IJCNN), pp. 253–258, San Diego, CA, USA, June 1990.
- Jürgen Schmidhuber. Learning to control fast-weight memories: An alternative to recurrent nets. Technical Report FKI-147-91, Institut für Informatik, Technische Universität München, March 1991.
- Jürgen Schmidhuber. A fixed size storage O(n^3) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation, 4(2):243–248, 1992.
- Learning by directional gradient descent. In Int. Conf. on Learning Representations (ICLR), Virtual only, April 2022.
- Reinforcement learning: An introduction. MIT Press, 1998.
- Policy gradient methods for reinforcement learning with function approximation. In Proc. Advances in Neural Information Processing Systems (NIPS), pp. 1057–1063, Denver, CO, USA, November 1999.
- Unbiased online recurrent optimization. In Int. Conf. on Learning Representations (ICLR), Vancouver, Canada, April 2018.
- Paul J Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.
- Recurrent policy gradients. Logic Journal of the IGPL, 18(2):620–634, 2010.
- Ronald J Williams. Complexity of exact gradient computation algorithms for recurrent neural networks. Technical report, NU-CCS-89-27, Northeastern University, College of Computer Science, 1989.
- An efficient gradient-based algorithm for online training of recurrent network trajectories. Neural Computation, 2(4):490–501, 1990.
- Experimental analysis of the real-time recurrent learning algorithm. Connection Science, 1(1):87–111, 1989a.
- A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280, 1989b.
- Gradient-based learning algorithms for recurrent networks and their computational complexity. Backpropagation: Theory, architectures, and applications, 433:17, 1995.
- Online learning of long-range dependencies. In Proc. Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, December 2023.
- Kazuki Irie
- Anand Gopalakrishnan
- Jürgen Schmidhuber