A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks (2410.22391v3)
Abstract: In recent years, there has been a trend in the field of Reinforcement Learning (RL) towards large action models trained offline on large-scale datasets via sequence modeling. Existing models are primarily based on the Transformer architecture, which yields powerful agents. However, due to slow inference times, Transformer-based approaches are impractical for real-time applications such as robotics. Recently, modern recurrent architectures such as xLSTM and Mamba have been proposed that exhibit parallelization benefits during training similar to the Transformer architecture while offering fast inference. In this work, we study the aptitude of these modern recurrent architectures for large action models. Consequently, we propose a Large Recurrent Action Model (LRAM) with an xLSTM at its core that comes with linear-time inference complexity and natural sequence-length extrapolation abilities. Experiments on 432 tasks from 6 domains show that LRAM compares favorably to Transformers in terms of performance and speed.
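To make the inference-speed argument concrete, the sketch below contrasts a generic recurrent update (fixed-size state, constant cost per step) with a causal self-attention step (growing key/value cache, per-step cost that scales with the number of previous steps). This is a minimal conceptual illustration, not the paper's LRAM or xLSTM implementation; all shapes, function names, and the simple tanh recurrence are illustrative assumptions.

```python
# Conceptual sketch (not the paper's implementation): why recurrent models give
# constant-cost per-step inference while self-attention cost grows with context.
import torch

d = 64  # hidden/state size (hypothetical)

def recurrent_step(state, x, W_h, W_x):
    # Generic gated-recurrence-style update: the new state depends only on the
    # previous state and the current input, so each step costs O(d^2)
    # regardless of how many steps came before (linear total time, O(1) memory).
    return torch.tanh(state @ W_h + x @ W_x)

def attention_step(kv_cache, q, k, v):
    # Causal self-attention step: the query attends over the whole cache of
    # past keys/values, so step t costs O(t * d) and the cache grows with t.
    keys, values = kv_cache
    keys = torch.cat([keys, k[None]], dim=0)
    values = torch.cat([values, v[None]], dim=0)
    scores = torch.softmax(keys @ q / d**0.5, dim=0)
    return (keys, values), scores @ values

# Rolling out T steps: the recurrent state stays fixed-size, the KV cache does not.
W_h, W_x = torch.randn(d, d) / d**0.5, torch.randn(d, d) / d**0.5
state, cache = torch.zeros(d), (torch.zeros(0, d), torch.zeros(0, d))
for t in range(256):
    x = torch.randn(d)
    state = recurrent_step(state, x, W_h, W_x)
    cache, _ = attention_step(cache, x, x, x)
print(state.shape, cache[0].shape)  # torch.Size([64]) vs torch.Size([256, 64])
```

Under these assumptions, per-step latency for the recurrent model stays flat as the interaction horizon grows, which is the property the abstract refers to as linear-time inference and what makes such models attractive for real-time control loops.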