Inverse Visual Question Answering with Multi-Level Attentions (1909.07583v2)

Published 17 Sep 2019 in cs.CV, cs.AI, cs.CL, and cs.LG

Abstract: In this paper, we propose a novel deep multi-level attention model to address inverse visual question answering. The proposed model generates regional visual and semantic features at the object level and then enhances them with the answer cue by using attention mechanisms. Two levels of multiple attentions are employed in the model, including the dual attention at the partial question encoding step and the dynamic attention at the next question word generation step. We evaluate the proposed model on the VQA V1 dataset. It demonstrates state-of-the-art performance in terms of multiple commonly used metrics.
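The abstract's central idea of enhancing regional features with an answer cue via attention can be illustrated with a small sketch. The paper's actual architecture is not reproduced here; the function below is a hypothetical, minimal version of answer-cued soft attention over object-level region features (all names and shapes are illustrative assumptions):

```python
import numpy as np

def answer_cued_attention(regions, answer_cue):
    """Attend over object-level region features using an answer embedding.

    regions:    (k, d) array of k regional visual/semantic features
    answer_cue: (d,) answer embedding acting as the attention query
    Returns the attention weights over regions and the attended feature.
    """
    scores = regions @ answer_cue                    # (k,) relevance of each region
    scores = scores - scores.max()                   # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over regions
    attended = weights @ regions                     # (d,) answer-aware summary
    return weights, attended

# Toy usage: 4 regions with 8-dimensional features
rng = np.random.default_rng(0)
V = rng.standard_normal((4, 8))
a = rng.standard_normal(8)
w, ctx = answer_cued_attention(V, a)
```

In the full model, the authors layer two such attention stages (dual attention during partial-question encoding and dynamic attention at each word-generation step); the sketch above shows only the basic cue-conditioned weighting shared by both.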

Authors (2)
  1. Yaser Alwattar (1 paper)
  2. Yuhong Guo (52 papers)
Citations (1)