
Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog (1805.03257v1)

Published 8 May 2018 in cs.CL

Abstract: Creating an intelligent conversational system that understands vision and language is one of the ultimate goals in AI (Winograd, 1972). Extensive research has focused on vision-to-language generation; however, limited research has touched on combining these two modalities in a goal-driven dialog context. We propose a multimodal hierarchical reinforcement learning framework that dynamically integrates vision and language for task-oriented visual dialog. The framework jointly learns the multimodal dialog state representation and the hierarchical dialog policy to improve both dialog task success and efficiency. We also propose a new technique, state adaptation, to integrate context awareness in the dialog state representation. We evaluate the proposed framework and the state adaptation technique in an image guessing game and achieve promising results.

Authors (3)
  1. Jiaping Zhang (3 papers)
  2. Tiancheng Zhao (48 papers)
  3. Zhou Yu (206 papers)
Citations (37)