Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue (2106.15550v1)

Published 29 Jun 2021 in cs.CV

Abstract: Building an interactive artificial intelligence that can ask questions about the real world is one of the biggest challenges for vision and language problems. In particular, goal-oriented visual dialogue, where the aim of the agent is to seek information by asking questions during a turn-taking dialogue, has been gaining scholarly attention recently. While several existing models based on the GuessWhat?! dataset have been proposed, the Questioner typically asks simple category-based questions or absolute spatial questions. This might be problematic for complex scenes where the objects share attributes or in cases where descriptive questions are required to distinguish objects. In this paper, we propose a novel Questioner architecture, called Unified Questioner Transformer (UniQer), for descriptive question generation with referring expressions. In addition, we build a goal-oriented visual dialogue task called CLEVR Ask. It synthesizes complex scenes that require the Questioner to generate descriptive questions. We train our model with two variants of CLEVR Ask datasets. The results of the quantitative and qualitative evaluations show that UniQer outperforms the baseline.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Shoya Matsumori (5 papers)
  2. Kosuke Shingyouchi (2 papers)
  3. Yuki Abe (8 papers)
  4. Yosuke Fukuchi (11 papers)
  5. Komei Sugiura (40 papers)
  6. Michita Imai (8 papers)
Citations (15)