MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces (2402.12845v1)

Published 20 Feb 2024 in cs.AI and cs.GT

Abstract: Drawing on the intuition that aligning different modalities to the same semantic embedding space allows models to understand states and actions more easily, we propose a new perspective on the offline reinforcement learning (RL) challenge. More concretely, we transform it into a supervised learning task by integrating multimodal and pre-trained LLMs. Our approach incorporates state information derived from images and action-related data obtained from text, thereby bolstering RL training performance and promoting long-term strategic thinking. We emphasize the contextual understanding of language and demonstrate how decision-making in RL can benefit from aligning state and action representations with language representations. Our method significantly outperforms current baselines, as evidenced by evaluations on Atari and OpenAI Gym environments, advancing offline RL performance and efficiency while providing a novel perspective on the problem. Our code and data are available at https://github.com/Zheng0428/MORE_.
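To make the shared-semantic-space idea concrete, the sketch below shows one way image-state features and text-action features could be projected into a common embedding space and trained with a supervised objective on logged trajectories. Everything here (module names, feature dimensions, the CLIP-style contrastive loss) is an illustrative assumption, not the architecture described in the paper.

```python
# Minimal sketch of shared-semantic-space alignment for offline RL.
# All module names, dimensions, and the contrastive-loss choice are
# illustrative assumptions, not the method from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSpaceAligner(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, shared_dim=256, n_actions=18):
        super().__init__()
        # Stand-ins for features from pre-trained encoders (e.g. a vision
        # backbone for image states, an LLM for textual action descriptions).
        self.state_proj = nn.Linear(img_dim, shared_dim)    # image-state features -> shared space
        self.action_proj = nn.Linear(txt_dim, shared_dim)   # text-action features -> shared space
        self.policy_head = nn.Linear(shared_dim, n_actions) # supervised action prediction
        self.temperature = nn.Parameter(torch.tensor(0.07))

    def forward(self, state_feats, action_feats):
        s = F.normalize(self.state_proj(state_feats), dim=-1)
        a = F.normalize(self.action_proj(action_feats), dim=-1)
        return s, a

    def alignment_loss(self, s, a):
        # Symmetric InfoNCE: matching (state, action-description) pairs are
        # pulled together in the shared space; mismatched pairs pushed apart.
        logits = s @ a.t() / self.temperature
        targets = torch.arange(s.size(0), device=s.device)
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

# Offline RL as supervised learning: one gradient step on a logged batch.
model = SharedSpaceAligner()
state_feats = torch.randn(32, 512)     # features of image observations
action_feats = torch.randn(32, 768)    # features of action descriptions
actions = torch.randint(0, 18, (32,))  # logged discrete actions (e.g. Atari)

s, a = model(state_feats, action_feats)
loss = model.alignment_loss(s, a) + F.cross_entropy(model.policy_head(s), actions)
loss.backward()
```

Framing training this way keeps it purely supervised on logged data, which is one plausible reading of the abstract's claim of transforming offline RL into a supervised learning task; the paper's actual encoders and loss may differ.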

Authors (6)
  1. Tianyu Zheng (28 papers)
  2. Ge Zhang (170 papers)
  3. Xingwei Qu (30 papers)
  4. Ming Kuang (1 paper)
  5. Stephen W. Huang (9 papers)
  6. Zhaofeng He (31 papers)
Citations (1)