Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Building Task-Oriented Visual Dialog Systems Through Alternative Optimization Between Dialog Policy and Language Generation (1909.05365v2)

Published 6 Sep 2019 in cs.CL, cs.AI, and cs.LG

Abstract: Reinforcement learning (RL) is an effective approach to learn an optimal dialog policy for task-oriented visual dialog systems. A common practice is to apply RL on a neural sequence-to-sequence (seq2seq) framework with the action space being the output vocabulary in the decoder. However, it is difficult to design a reward function that can achieve a balance between learning an effective policy and generating a natural dialog response. This paper proposes a novel framework that alternatively trains a RL policy for image guessing and a supervised seq2seq model to improve dialog generation quality. We evaluate our framework on the GuessWhich task and the framework achieves the state-of-the-art performance in both task completion and dialog quality.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Mingyang Zhou (27 papers)
  2. Josh Arnold (4 papers)
  3. Zhou Yu (206 papers)
Citations (11)