EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA (2108.09717v8)

Published 22 Aug 2021 in cs.CV and cs.MM

Abstract: The open-ended question answering task of Text-VQA often requires reading and reasoning about rarely seen or completely unseen scene-text content of an image. We address this zero-shot nature of the problem by proposing the generalized use of external knowledge to augment our understanding of the scene text. We design a framework to extract, validate, and reason with knowledge using a standard multimodal transformer for vision language understanding tasks. Through empirical evidence and qualitative results, we demonstrate how external knowledge can highlight instance-only cues and thus help deal with training data bias, improve answer entity type correctness, and detect multiword named entities. We generate results comparable to the state-of-the-art on three publicly available datasets, under the constraints of similar upstream OCR systems and training data.

Authors (3)
  1. Arka Ujjal Dey (6 papers)
  2. Ernest Valveny (28 papers)
  3. Gaurav Harit (8 papers)
Citations (1)
