
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge (2305.18842v1)

Published 30 May 2023 in cs.CL, cs.AI, and cs.CV

Abstract: The open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs using world knowledge. Recently, pre-trained language models (PLMs) such as GPT-3 have been applied to the task and shown to be powerful sources of world knowledge. However, these methods suffer from two problems. The first is low knowledge coverage caused by PLM bias, the tendency to generate certain tokens over others regardless of prompt changes. The second is a high dependency on PLM quality: only models using GPT-3 achieve the best results. To address these challenges, we propose RASO, a new VQA pipeline that, for the first time, deploys a generate-then-select strategy guided by world knowledge. Rather than following the de facto standard of training a multi-modal model that directly generates the VQA answer, RASO first uses a PLM to generate all possible answers and then trains a lightweight answer selection model to pick the correct one. As our analysis shows, RASO expands knowledge coverage relative to the in-domain training data by a large margin. We provide extensive experiments showing the effectiveness of our pipeline, which advances the state of the art on OK-VQA by 4.1% without additional computation cost. Code and models are released at http://cogcomp.org/page/publication_view/1010
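The generate-then-select idea described in the abstract can be illustrated with a minimal Python sketch. This is not RASO's released code. Every name here (`Example`, `generate_candidates`, `select_answer`, `knowledge_coverage`, `toy_generator`, `toy_scorer`) is hypothetical. The PLM is replaced by a fixed stub, the trained selector by a word-overlap heuristic, and the image by a text caption. "Knowledge coverage" is assumed to mean the fraction of questions whose gold answer appears among the generated candidates.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Example:
    question: str
    image_caption: str  # hypothetical text stand-in for the visual input


# Hypothetical interfaces: a generator maps (example, k) to raw answer strings,
# and a scorer maps (example, candidate) to a relevance score.
Generator = Callable[[Example, int], Sequence[str]]
Scorer = Callable[[Example, str], float]


def generate_candidates(example: Example, generator: Generator, k: int = 5) -> List[str]:
    """Stage 1 (generate): collect up to k normalized, de-duplicated answers."""
    candidates: List[str] = []
    for raw in generator(example, k):
        answer = raw.strip().lower()
        if answer and answer not in candidates:
            candidates.append(answer)
    return candidates[:k]


def select_answer(example: Example, candidates: Sequence[str], scorer: Scorer) -> Optional[str]:
    """Stage 2 (select): return the highest-scoring candidate, or None if empty."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: scorer(example, c))


def knowledge_coverage(
    examples: Sequence[Example], gold_answers: Sequence[str], generator: Generator, k: int = 5
) -> float:
    """Assumed definition: fraction of questions whose gold answer is a candidate."""
    if not examples:
        return 0.0
    hits = sum(
        gold.lower() in generate_candidates(ex, generator, k)
        for ex, gold in zip(examples, gold_answers)
    )
    return hits / len(examples)


def toy_generator(example: Example, k: int) -> Sequence[str]:
    # Placeholder for a prompted PLM: returns a fixed, noisy answer list.
    return ["Fire Truck", "ambulance", "fire truck", "bus"][:k]


def toy_scorer(example: Example, candidate: str) -> float:
    # Placeholder for a trained selector: counts candidate words found in the caption.
    caption_words = set(example.image_caption.lower().split())
    return float(sum(word in caption_words for word in candidate.split()))


if __name__ == "__main__":
    ex = Example("What vehicle is shown?", "a red fire truck parked on the street")
    cands = generate_candidates(ex, toy_generator)
    print(cands)  # ['fire truck', 'ambulance', 'bus']
    print(select_answer(ex, cands, toy_scorer))  # fire truck
    print(knowledge_coverage([ex], ["fire truck"], toy_generator))  # 1.0
```

In this sketch the selector only ranks what the generator proposes. Accuracy is therefore capped by coverage in stage 1, which is why the abstract focuses on how much knowledge the generation step brings in.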

Authors (13)
  1. Xingyu Fu (22 papers)
  2. Sheng Zhang (212 papers)
  3. Gukyeong Kwon (14 papers)
  4. Pramuditha Perera (23 papers)
  5. Henghui Zhu (24 papers)
  6. Yuhao Zhang (107 papers)
  7. Alexander Hanbo Li (17 papers)
  8. William Yang Wang (254 papers)
  9. Zhiguo Wang (100 papers)
  10. Vittorio Castelli (24 papers)
  11. Patrick Ng (29 papers)
  12. Dan Roth (222 papers)
  13. Bing Xiang (74 papers)
Citations (15)