Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? (2405.12094v2)
Abstract: Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline reinforcement learning (offline RL). Yet they pose challenges due to their substantial parameter counts and limited scalability, which is particularly critical in sequential decision-making scenarios with constrained resources, such as robots and drones with limited computational power. Mamba, a promising new linear-time sequence model, matches Transformer performance while using substantially fewer parameters on long sequences. Since it remains unclear whether Mamba is compatible with trajectory optimization, this work conducts comprehensive experiments to explore the potential of Decision Mamba (dubbed DeMa) in offline RL from the perspectives of data structure and essential components, yielding the following insights: (1) Long sequences impose a significant computational burden without improving performance, since DeMa's attention over the sequence decays approximately exponentially; consequently, we adopt a Transformer-like DeMa rather than an RNN-like DeMa. (2) Among DeMa's components, we identify the hidden attention mechanism as the key factor in its success; it also works well with other residual structures and does not require position embeddings. Extensive evaluations demonstrate that our specially designed DeMa is compatible with trajectory optimization and surpasses previous methods, outperforming Decision Transformer (DT) in Atari while using 30% fewer parameters, and exceeding DT in MuJoCo with only a quarter of the parameters.
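To make the "approximately exponential" decay concrete, the sketch below shows a minimal selective state-space recurrence in the spirit of Mamba, run as a linear-time scan. This is an illustrative toy, not the paper's implementation; `selective_ssm_scan` and the weight matrices `W_a`, `W_b`, `W_c` are hypothetical names introduced here, and the gating choices are simplifying assumptions.

```python
import numpy as np

def selective_ssm_scan(x, W_a, W_b, W_c):
    """Toy selective SSM: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t . h_t.

    x: (T, d) input sequence; W_a, W_b, W_c: (d, d) projection weights.
    a_t, b_t, c_t depend on the current input ("selective"), and the whole
    sequence is processed in a single O(T) pass.
    """
    T, d = x.shape
    h = np.zeros(d)          # hidden state, carried across time steps
    ys = np.empty(T)
    for t in range(T):
        a = 1.0 / (1.0 + np.exp(-(x[t] @ W_a)))  # per-step decay in (0, 1)
        b = x[t] @ W_b                            # input-dependent write gate
        c = x[t] @ W_c                            # input-dependent readout
        h = a * h + b * x[t]                      # linear recurrence
        ys[t] = c @ h
    return ys
```

Because each decay `a` lies in (0, 1), the contribution of the first input to the final state is scaled by the product of all intermediate decays, which shrinks roughly exponentially with sequence length. This is consistent with the abstract's observation that very long contexts add cost without adding useful signal.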
Authors: Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Shengchao Hu, Mengzhu Wang, Shouling Ji, Jincai Huang, Li Shen