
Zero-shot Visual Question Answering using Knowledge Graph (2107.05348v4)

Published 12 Jul 2021 in cs.AI

Abstract: Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with separate components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when any component performs poorly, which leads to error propagation and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue: in real-world applications, many answers may never have appeared during training (i.e., unseen answers). To bridge these gaps, in this paper, we propose a Zero-shot VQA algorithm that uses knowledge graphs and a mask-based learning mechanism to better incorporate external knowledge, and we present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method achieves state-of-the-art performance in Zero-shot VQA with unseen answers, while also dramatically augmenting existing end-to-end models on the normal F-VQA task.

Authors (6)
  1. Zhuo Chen (319 papers)
  2. Jiaoyan Chen (85 papers)
  3. Yuxia Geng (22 papers)
  4. Jeff Z. Pan (78 papers)
  5. Zonggang Yuan (8 papers)
  6. Huajun Chen (198 papers)
Citations (62)