
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog (1902.00579v2)

Published 1 Feb 2019 in cs.CV and cs.CL

Abstract: This paper presents a new model for visual dialog, the Recurrent Dual Attention Network (ReDAN), which uses multi-step reasoning to answer a series of questions about an image. In each question-answering turn of a dialog, ReDAN infers the answer progressively through multiple reasoning steps. In each step, the semantic representation of the question is updated based on the image and the previous dialog history, and the recurrently refined representation is used for further reasoning in the subsequent step. On the VisDial v1.0 dataset, the proposed ReDAN model achieves a new state-of-the-art NDCG score of 64.47%. Visualization of the reasoning process further demonstrates that ReDAN can locate context-relevant visual and textual clues via iterative refinement, which can lead to the correct answer step by step.
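
The multi-step refinement described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the dot-product attention, the averaging update, and the names `attend` and `multi_step_reasoning` are assumptions chosen for brevity, whereas the model in the paper uses learned components. The sketch only shows the control flow: a question vector attends over image-region features and dialog-history features, is updated from both, and the refined vector drives the next step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Dot-product attention (an assumption, not the paper's scoring function).

    Returns the attention-weighted sum of `items` and the weights themselves.
    """
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    context = [sum(w * item[d] for w, item in zip(weights, items))
               for d in range(dim)]
    return context, weights


def multi_step_reasoning(question, image_regions, history, num_steps=3):
    """Refine the question vector over several steps (hypothetical helper).

    Each step attends over image regions and dialog history with the current
    query, then replaces the query with a plain average of the query and the
    two attended contexts. The averaging stands in for a learned update.
    """
    query = list(question)
    trace = []  # per-step (visual weights, textual weights) for inspection
    for _ in range(num_steps):
        visual_ctx, visual_w = attend(query, image_regions)
        text_ctx, text_w = attend(query, history)
        query = [(q + v + t) / 3.0
                 for q, v, t in zip(query, visual_ctx, text_ctx)]
        trace.append((visual_w, text_w))
    return query, trace


if __name__ == "__main__":
    # Toy 2-dimensional features; real features would come from encoders.
    regions = [[1.0, 0.0], [0.0, 1.0]]
    turns = [[0.5, 0.5], [0.0, 1.0]]
    refined, trace = multi_step_reasoning([1.0, 0.0], regions, turns)
    for step, (visual_w, text_w) in enumerate(trace, start=1):
        print(step, visual_w, text_w)
```

Keeping the per-step attention weights in `trace` loosely mirrors the kind of step-by-step visualization the abstract mentions, since it lets one inspect which regions and dialog turns receive weight at each reasoning step.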

Authors (6)
  1. Zhe Gan (135 papers)
  2. Yu Cheng (354 papers)
  3. Ahmed El Kholy (4 papers)
  4. Linjie Li (89 papers)
  5. Jingjing Liu (139 papers)
  6. Jianfeng Gao (344 papers)
Citations (103)