
PDFVQA: A New Dataset for Real-World VQA on PDF Documents (2304.06447v5)

Published 13 Apr 2023 in cs.CV and cs.CL

Abstract: Document-based Visual Question Answering (VQA) examines how well models understand document images when posed natural-language questions. We propose a new document-based VQA dataset, PDF-VQA, to examine document understanding comprehensively from several aspects: document element recognition, document layout structural understanding, contextual understanding, and key information extraction. Current document understanding is limited to a single document page; our PDF-VQA dataset extends it to a new scale, with questions asked over full documents spanning multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchical structural relationships between different document elements to improve document structural understanding. Performance is compared against several baselines across different question types and tasks. (The full dataset will be released after paper acceptance.)
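The abstract does not say how the document-element graph is built. The following is a minimal, hypothetical sketch, not the authors' model, of one way to encode the two relationship types it names. Everything in it is an assumption for illustration: the `Element` record, the `build_graph` function, linking each element to its `k` nearest neighbours on the same page as the "spatial" relation, and a `parent` field (for example, a section title) as the "hierarchical" relation.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Element:
    eid: int                                   # hypothetical element id
    label: str                                 # e.g. "title", "paragraph", "table"
    bbox: tuple[float, float, float, float]    # (x0, y0, x1, y1) in page coordinates
    page: int                                  # page index within the document
    parent: Optional[int] = None               # hypothetical: id of the enclosing element

def center(bbox):
    """Center point of a bounding box."""
    x0, y0, x1, y1 = bbox
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def relative_direction(a, b):
    """Coarse direction of b relative to a (y grows downward)."""
    (ax, ay), (bx, by) = center(a.bbox), center(b.bbox)
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "below" if dy > 0 else "above"

def build_graph(elements, k=2):
    """Return (src, dst, relation) edges with spatial and hierarchical relations."""
    edges = []
    # Spatial edges: each element links to its k nearest neighbours on the same page.
    for a in elements:
        same_page = [b for b in elements if b.page == a.page and b.eid != a.eid]
        same_page.sort(key=lambda b: math.dist(center(a.bbox), center(b.bbox)))
        for b in same_page[:k]:
            edges.append((a.eid, b.eid, "spatial:" + relative_direction(a, b)))
    # Hierarchical edges: parent -> child; these may cross page boundaries.
    for e in elements:
        if e.parent is not None:
            edges.append((e.parent, e.eid, "hierarchical:child"))
    return edges

if __name__ == "__main__":
    # A toy two-page document: one title that parents elements on both pages.
    doc = [
        Element(0, "title", (50, 50, 550, 80), page=0),
        Element(1, "paragraph", (50, 100, 550, 300), page=0, parent=0),
        Element(2, "table", (50, 320, 550, 500), page=0, parent=0),
        Element(3, "paragraph", (50, 50, 550, 200), page=1, parent=0),
    ]
    for edge in build_graph(doc, k=1):
        print(edge)
```

In this sketch, spatial edges stay within a page while hierarchical edges can link elements on different pages. That is one simple way to represent the multi-page setting the abstract describes. A real model would also attach learned node features and edge types and pass them to a graph neural network.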

Authors (4)
  1. Yihao Ding (16 papers)
  2. Siwen Luo (14 papers)
  3. Hyunsuk Chung (6 papers)
  4. Soyeon Caren Han (48 papers)
Citations (16)