Recurrent Action Transformer with Memory (2306.09459v5)
Abstract: The use of transformers in offline reinforcement learning has recently become a rapidly developing area. This is due to their ability to treat the agent's trajectory in the environment as a sequence, thereby reducing the policy learning problem to sequence modeling. In environments where the agent's decisions depend on past events (POMDPs), it is essential to capture both the event itself and the decision point within the model's context. However, the quadratic complexity of the attention mechanism limits how far the context can be extended. One solution to this problem is to extend transformers with memory mechanisms. This paper proposes the Recurrent Action Transformer with Memory (RATE), a novel model architecture that incorporates a recurrent memory mechanism designed to regulate information retention. To evaluate our model, we conducted extensive experiments on memory-intensive environments (ViZDoom-Two-Colors, T-Maze, Memory Maze, Minigrid-Memory), classic Atari games, and MuJoCo control environments. The results show that using memory can significantly improve performance in memory-intensive environments while maintaining or improving results in classic environments. We believe that our results will stimulate research on memory mechanisms for transformers applicable to offline reinforcement learning.
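To make the idea concrete, below is a minimal, simplified sketch of the kind of architecture the abstract describes: a Decision-Transformer-style backbone that processes a trajectory segment by segment while carrying a small set of recurrent memory tokens between segments. This is not the authors' implementation; the class name `RecurrentMemoryPolicy`, the token layout, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RecurrentMemoryPolicy(nn.Module):
    """Toy RATE-style policy (hypothetical sketch): a Decision-Transformer-like
    backbone augmented with memory tokens carried across trajectory segments."""

    def __init__(self, state_dim, act_dim, d_model=128, n_mem=4, n_layers=2, n_heads=4):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)             # return-to-go embedding
        self.embed_state = nn.Linear(state_dim, d_model)   # continuous states (e.g. MuJoCo)
        self.embed_action = nn.Linear(act_dim, d_model)    # continuous actions
        self.memory_init = nn.Parameter(0.02 * torch.randn(1, n_mem, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)
        self.n_mem = n_mem

    def init_memory(self, batch_size):
        # Learnable initial memory, broadcast over the batch.
        return self.memory_init.expand(batch_size, -1, -1)

    def forward(self, rtg, states, actions, memory):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim),
        # memory: (B, n_mem, d_model). Positional/timestep embeddings omitted for brevity.
        B, T, _ = states.shape
        # Interleave (return-to-go, state, action) tokens as in Decision Transformer.
        tokens = torch.stack(
            (self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)),
            dim=2,
        ).reshape(B, 3 * T, -1)
        # Read copy of memory at the front, write copy at the end (RMT-style layout).
        x = torch.cat([memory, tokens, memory], dim=1)
        L = x.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf"), device=x.device), diagonal=1)
        h = self.backbone(x, mask=causal)
        new_memory = h[:, -self.n_mem:]                    # updated memory for the next segment
        state_h = h[:, self.n_mem:self.n_mem + 3 * T][:, 1::3]  # hidden states at state slots
        return self.predict_action(state_h), new_memory


# Schematic use: process a long trajectory segment by segment, carrying memory forward.
model = RecurrentMemoryPolicy(state_dim=17, act_dim=6)
memory = model.init_memory(batch_size=8)
for _ in range(3):  # three dummy segments of 20 steps each
    rtg = torch.randn(8, 20, 1); s = torch.randn(8, 20, 17); a = torch.randn(8, 20, 6)
    pred_actions, memory = model(rtg, s, a, memory)
    memory = memory.detach()  # truncate gradients across segments (one common choice)
```

The key point this sketch illustrates is that the effective context is no longer bounded by the attention window of a single segment: whatever the write-side memory tokens summarize from one segment is available to the next, while the per-segment attention cost stays fixed.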