Effective Multimodal Reinforcement Learning with Modality Alignment and Importance Enhancement (2302.09318v1)

Published 18 Feb 2023 in cs.LG and cs.AI

Abstract: Many real-world applications require an agent to make robust and deliberate decisions with multimodal information (e.g., robots with multi-sensory inputs). However, training such an agent via reinforcement learning (RL) is very challenging due to the heterogeneity and dynamic importance of different modalities. Specifically, we observe that these issues make it difficult for conventional RL methods to learn a useful state representation in end-to-end training with multimodal information. To address this, we propose a novel multimodal RL approach that performs multimodal alignment and importance enhancement according to, respectively, the similarity of the modalities and their importance for the RL task. By doing so, we are able to learn an effective state representation and consequently improve the RL training process. We test our approach on several multimodal RL domains, showing that it outperforms state-of-the-art methods in terms of learning speed and policy quality.
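The abstract does not say how alignment or importance enhancement is computed. As a purely illustrative sketch, and not the authors' method, the two ideas can be pictured as (a) an alignment term that penalizes dissimilarity between per-modality embeddings, here assumed to be the mean pairwise cosine distance, and (b) a fusion step that weights each modality embedding by a softmax over importance scores. The names `alignment_loss`, `fuse`, and `softmax`, the cosine measure, and the softmax weighting are all hypothetical choices; in a real agent the embeddings and scores would come from learned encoders trained alongside the RL objective.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def alignment_loss(embeddings):
    """Hypothetical alignment term: mean (1 - cosine) over all modality pairs.

    Lower values mean the modality embeddings point in more similar directions.
    """
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two modality embeddings")
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(1.0 - cosine(embeddings[i], embeddings[j]) for i, j in pairs) / len(pairs)


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(embeddings, importance_scores):
    """Hypothetical importance-weighted fusion of modality embeddings.

    Each embedding is scaled by softmax(importance_scores) and the results are
    summed element-wise, so more important modalities dominate the state.
    """
    weights = softmax(importance_scores)
    dim = len(embeddings[0])
    return [sum(w * e[k] for w, e in zip(weights, embeddings)) for k in range(dim)]
```

For example, two orthogonal embeddings such as `[1.0, 0.0]` (say, vision) and `[0.0, 1.0]` (say, touch) give an alignment loss of 1.0, and equal importance scores fuse them into their average `[0.5, 0.5]`; raising one modality's score shifts the fused state toward that modality.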

Authors (5)
  1. Jinming Ma (5 papers)
  2. Feng Wu (198 papers)
  3. Yingfeng Chen (30 papers)
  4. Xianpeng Ji (2 papers)
  5. Yu Ding (70 papers)
Citations (3)