
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models (2305.14985v1)

Published 24 May 2023 in cs.CV and cs.CL

Abstract: The field of vision-and-language (VL) understanding has made unprecedented progress with end-to-end large pre-trained VL models (VLMs). However, they still fall short in zero-shot reasoning tasks that require multi-step inferencing. To achieve this goal, previous works resort to a divide-and-conquer pipeline. In this paper, we argue that previous efforts have several inherent shortcomings: 1) They rely on domain-specific sub-question decomposing models. 2) They force models to predict the final answer even if the sub-questions or sub-answers provide insufficient information. We address these limitations via IdealGPT, a framework that iteratively decomposes VL reasoning using LLMs. Specifically, IdealGPT utilizes an LLM to generate sub-questions, a VLM to provide corresponding sub-answers, and another LLM to reason to achieve the final answer. These three modules perform the divide-and-conquer procedure iteratively until the model is confident about the final answer to the main question. We evaluate IdealGPT on multiple challenging VL reasoning tasks under a zero-shot setting. In particular, our IdealGPT outperforms the best existing GPT-4-like models by an absolute 10% on VCR and 15% on SNLI-VE. Code is available at https://github.com/Hxyou/IdealGPT
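The loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `iterative_decompose`, `questioner`, `answerer`, `reasoner`, and `Result` are hypothetical, the three modules are passed in as plain callables, the reasoner is assumed to return `None` when it is not yet confident, and the `max_rounds` cap is an assumption added so the loop always terminates. The toy callables at the bottom stand in for the real LLM and VLM calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

# All names here are hypothetical and are not taken from the IdealGPT codebase.
SubQA = Tuple[str, str]  # (sub-question, sub-answer)


@dataclass
class Result:
    answer: Optional[str]  # None if the reasoner never became confident
    rounds: int            # number of rounds actually run
    evidence: List[SubQA] = field(default_factory=list)


def iterative_decompose(
    main_question: str,
    image: Any,
    questioner: Callable[[str, List[SubQA]], List[str]],
    answerer: Callable[[Any, str], str],
    reasoner: Callable[[str, List[SubQA]], Optional[str]],
    max_rounds: int = 4,  # assumed cap; the abstract does not specify one
) -> Result:
    evidence: List[SubQA] = []
    for round_idx in range(1, max_rounds + 1):
        # 1) An LLM proposes sub-questions, given what is known so far.
        sub_questions = questioner(main_question, evidence)
        # 2) A VLM answers each sub-question by looking at the image.
        for sub_question in sub_questions:
            evidence.append((sub_question, answerer(image, sub_question)))
        # 3) Another LLM either commits to a final answer or signals
        #    (here: by returning None) that the evidence is insufficient.
        answer = reasoner(main_question, evidence)
        if answer is not None:
            return Result(answer, round_idx, evidence)
    return Result(None, max_rounds, evidence)


# Toy stand-ins for the three modules, used only to exercise the loop.
def toy_questioner(main_question: str, evidence: List[SubQA]) -> List[str]:
    return [f"sub-question {len(evidence) + 1}"]


def toy_answerer(image: Any, sub_question: str) -> str:
    return f"answer to {sub_question}"


def toy_reasoner(main_question: str, evidence: List[SubQA]) -> Optional[str]:
    # Pretend two pieces of evidence are enough to be confident.
    return "yes" if len(evidence) >= 2 else None


result = iterative_decompose(
    "Is the person happy?", None, toy_questioner, toy_answerer, toy_reasoner
)
print(result.answer, result.rounds)
```

Returning `None` rather than a forced guess mirrors the second shortcoming the abstract raises: the reasoner is allowed to decline to answer when the sub-answers are insufficient, which triggers another round of decomposition.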

Authors (8)
  1. Haoxuan You (33 papers)
  2. Rui Sun (105 papers)
  3. Zhecan Wang (18 papers)
  4. Long Chen (395 papers)
  5. Gengyu Wang (5 papers)
  6. Hammad A. Ayyubi (8 papers)
  7. Kai-Wei Chang (292 papers)
  8. Shih-Fu Chang (131 papers)
Citations (39)