On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering (2002.10215v2)

Published 24 Feb 2020 in cs.CV

Abstract: Visual Question Answering (VQA) methods have made incredible progress, but they fail to generalize. This shows in their vulnerability to learning coincidental correlations in the data rather than deeper relations between image content and ideas expressed in language. We present a dataset that takes a step towards addressing this problem: it contains questions expressed in two languages, and it comes with an evaluation process that co-opts a well-understood image-based metric to reflect the method's ability to reason. Measuring reasoning directly encourages generalization by penalizing answers that are coincidentally correct. The dataset reflects the scene-text version of the VQA problem, and the reasoning evaluation can be seen as a text-based version of a referring expression challenge. Experiments and analysis are provided that show the value of the dataset.
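The abstract does not name the image-based metric it co-opts, so the following is only a minimal sketch under stated assumptions. It assumes the metric is bounding-box intersection over union (IoU) with a 0.5 threshold, and that an answer matches when it is equal to the ground truth after case and whitespace normalisation. Under this rule a prediction counts only when both the answer text and its supporting evidence region are right, which is one way of penalizing answers that are coincidentally correct. The names `Box`, `iou`, `evidence_correct` and `evidence_accuracy` are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box (x_min, y_min, x_max, y_max) in pixels (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Negative extents (no overlap) clamp to zero area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    inter_area = inter.area()
    union = a.area() + b.area() - inter_area
    return inter_area / union if union > 0 else 0.0


def evidence_correct(pred_answer: str, pred_box: Box,
                     gt_answer: str, gt_box: Box,
                     iou_threshold: float = 0.5) -> bool:
    """Assumed rule: normalised exact match AND evidence IoU >= iou_threshold."""
    answer_ok = pred_answer.strip().lower() == gt_answer.strip().lower()
    return answer_ok and iou(pred_box, gt_box) >= iou_threshold


def evidence_accuracy(samples: list[tuple[str, Box, str, Box]],
                      iou_threshold: float = 0.5) -> float:
    """Fraction of (pred_answer, pred_box, gt_answer, gt_box) samples scored correct."""
    if not samples:
        return 0.0
    hits = sum(evidence_correct(*s, iou_threshold=iou_threshold) for s in samples)
    return hits / len(samples)
```

For example, with a ground-truth box `Box(0, 0, 10, 10)`, a correct answer grounded on `Box(0, 0, 10, 5)` (IoU 0.5) is accepted. The same answer grounded on a non-overlapping box is rejected even though its text matches.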

Authors (9)
  1. Xinyu Wang (186 papers)
  2. Yuliang Liu (82 papers)
  3. Chunhua Shen (404 papers)
  4. Chun Chet Ng (6 papers)
  5. Canjie Luo (20 papers)
  6. Lianwen Jin (116 papers)
  7. Chee Seng Chan (50 papers)
  8. Anton van den Hengel (188 papers)
  9. Liangwei Wang (11 papers)
Citations (78)