Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing (2004.03755v2)

Published 8 Apr 2020 in cs.CL, cs.AI, and cs.CV

Abstract: Visual Question Answering (VQA) systems are tasked with answering natural language questions about a presented image. Traditional VQA datasets typically contain questions about the spatial information of objects, object attributes, or the general scene. Recently, researchers have recognized the need to improve the balance of such datasets to reduce a system's dependence on memorized linguistic features and statistical biases while improving visual understanding. However, it is unclear whether any latent patterns exist that can quantify and explain the failures of these systems. As an initial step towards better quantifying the performance of VQA models, we use a taxonomy of Knowledge Gaps (KGs) to tag questions with one or more types of KGs. Each Knowledge Gap (KG) describes the reasoning abilities needed to arrive at a resolution. After identifying the KGs for each question, we examine the skew in the distribution of questions across KGs. We then introduce a targeted question generation model to reduce this skew, which allows us to generate new types of questions for an image. These new questions can be added to existing VQA datasets to increase question diversity and reduce the skew.
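To make the skew analysis concrete, the sketch below tags questions with KG labels and scores the balance of the resulting distribution. This is a minimal illustration, not the paper's method: the KG names and the use of normalized entropy as the skew measure are assumptions for demonstration, since the abstract does not specify how skew is computed.

```python
from collections import Counter
import math

# Hypothetical KG tags; the paper defines its own taxonomy of Knowledge Gaps.
# A question may carry more than one KG tag.
questions = [
    {"q": "What color is the ball?",        "kgs": ["attribute"]},
    {"q": "Is the cup left of the plate?",  "kgs": ["spatial"]},
    {"q": "What color is the car?",         "kgs": ["attribute"]},
    {"q": "How many dogs are there?",       "kgs": ["counting"]},
]

# Count questions per KG type across the dataset.
counts = Counter(kg for item in questions for kg in item["kgs"])
total = sum(counts.values())

# Normalized entropy as one possible skew measure: 1.0 means a perfectly
# uniform KG distribution; lower values mean a more skewed dataset that a
# targeted question generator could rebalance.
probs = [c / total for c in counts.values()]
entropy = -sum(p * math.log2(p) for p in probs)
max_entropy = math.log2(len(counts))

print(f"KG distribution: {dict(counts)}")
print(f"normalized entropy: {entropy / max_entropy:.3f}")
```

Under this measure, generating new questions for under-represented KG types (e.g., "counting" above) pushes the normalized entropy toward 1.0, which is the rebalancing effect the abstract describes.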

Authors (6)
  1. Goonmeet Bajaj (8 papers)
  2. Bortik Bandyopadhyay (9 papers)
  3. Daniel Schmidt (39 papers)
  4. Pranav Maneriker (12 papers)
  5. Christopher Myers (1 paper)
  6. Srinivasan Parthasarathy (76 papers)