What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility? (2210.14966v1)

Published 26 Oct 2022 in cs.CL, cs.AI, and cs.CV

Abstract: In visual question answering (VQA), a machine must answer a question given an associated image. Recently, accessibility researchers have explored whether VQA can be deployed in a real-world setting where users with visual impairments learn about their environment by capturing their visual surroundings and asking questions. However, most of the existing benchmarking datasets for VQA focus on machine "understanding," and it remains unclear how progress on those datasets corresponds to improvements in this real-world use case. We aim to answer this question by evaluating discrepancies between a machine "understanding" dataset (VQA-v2) and an accessibility dataset (VizWiz) across a variety of VQA models. Based on our findings, we discuss opportunities and challenges in VQA for accessibility and suggest directions for future work.
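
The comparison described in the abstract amounts to scoring the same models on both benchmarks and inspecting the gap. Below is a minimal sketch of that kind of evaluation, not the authors' code: it uses the standard VQA accuracy metric (a predicted answer scores min(#matching human answers / 3, 1)), and the model interface, dataset loaders, and example keys ('image', 'question', 'answers') are hypothetical placeholders.

```python
from typing import Callable, Dict, List


def vqa_accuracy(prediction: str, human_answers: List[str]) -> float:
    """Standard VQA accuracy: full credit if at least 3 of the annotators agree."""
    matches = sum(
        1 for a in human_answers
        if a.strip().lower() == prediction.strip().lower()
    )
    return min(matches / 3.0, 1.0)


def evaluate(model: Callable[[str, str], str], dataset: List[Dict]) -> float:
    """Average VQA accuracy of model(image_path, question) over a dataset.

    Each example is assumed to be a dict with 'image', 'question', and
    'answers' (a list of human-provided answers).
    """
    scores = [
        vqa_accuracy(model(ex["image"], ex["question"]), ex["answers"])
        for ex in dataset
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical usage: the difference between the two numbers is the kind of
# discrepancy between VQA-v2 and VizWiz that the paper investigates.
# acc_vqa_v2 = evaluate(my_model, load_vqa_v2_val())
# acc_vizwiz = evaluate(my_model, load_vizwiz_val())
# print(f"VQA-v2: {acc_vqa_v2:.3f}  VizWiz: {acc_vizwiz:.3f}")
```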

Authors (4)
  1. Yang Trista Cao (7 papers)
  2. Kyle Seelman (2 papers)
  3. Kyungjun Lee (11 papers)
  4. Hal Daumé III (76 papers)
Citations (5)
