Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering (2309.14389v1)

Published 25 Sep 2023 in cs.CV and cs.AI

Abstract: Recent document question answering models consist of two key components: a vision encoder, which captures layout and visual elements in images, and an LLM, which helps contextualize questions to the image and supplements them with external world knowledge to generate accurate answers. However, the relative contributions of the vision encoder and the LLM in these tasks remain unclear. This is especially interesting given the effectiveness of instruction-tuned LLMs, which exhibit remarkable adaptability to new tasks. To this end, we explore the following aspects in this work: (1) the efficacy of an LLM-only approach on document question answering tasks; (2) strategies for serializing textual information within document images and feeding it directly to an instruction-tuned LLM, thus bypassing the need for an explicit vision encoder; and (3) a thorough quantitative analysis of the feasibility of such an approach. Our comprehensive analysis encompasses six diverse benchmark datasets and utilizes LLMs of varying scales. Our findings reveal that a strategy relying exclusively on the LLM yields results that are on par with or closely approach state-of-the-art performance across a range of datasets. We posit that this evaluation framework will serve as a guiding resource for selecting appropriate datasets for future research that emphasizes the fundamental importance of layout and image content information.
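
To make the idea of serializing document text for an LLM concrete, the following is a minimal sketch of one possible strategy, not necessarily any of the strategies evaluated in the paper. It assumes word-level OCR output with bounding boxes is already available; the `Word` class, `serialize_words`, `build_prompt`, the `line_tol` threshold, and the prompt wording are all hypothetical names and choices introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR token with its bounding box (hypothetical input format)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    h: float  # box height


def serialize_words(words, line_tol=0.5):
    """Group words into text lines by vertical center, then read left to right.

    A word joins the current line when its vertical center is within
    line_tol * height of that line's first word; otherwise it starts a new line.
    """
    lines = []
    for w in sorted(words, key=lambda w: (w.y + w.h / 2, w.x)):
        cy = w.y + w.h / 2
        if lines and abs(cy - lines[-1]["cy"]) <= line_tol * w.h:
            lines[-1]["words"].append(w)
        else:
            lines.append({"cy": cy, "words": [w]})
    return "\n".join(
        " ".join(w.text for w in sorted(line["words"], key=lambda w: w.x))
        for line in lines
    )


def build_prompt(document_text, question):
    """Wrap the serialized text and the question in an instruction-style prompt."""
    return (
        "Answer the question using only the document below.\n\n"
        f"Document:\n{document_text}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    words = [
        Word("Total:", 10, 50, 10),
        Word("Invoice", 10, 10, 10),
        Word("$42", 80, 51, 10),
        Word("#123", 70, 9, 10),
    ]
    print(build_prompt(serialize_words(words), "What is the total?"))
```

The resulting prompt string would then be passed to whichever instruction-tuned LLM is under study. This simple top-to-bottom, left-to-right ordering ignores multi-column layouts and tables, which is the kind of layout information a vision encoder might otherwise capture.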

Authors (4)
  1. Nidhi Hegde (15 papers)
  2. Sujoy Paul (25 papers)
  3. Gagan Madan (10 papers)
  4. Gaurav Aggarwal (27 papers)
Citations (7)