
Video Question Answering with Phrases via Semantic Roles (2104.03762v1)

Published 8 Apr 2021 in cs.CV and cs.CL

Abstract: Video Question Answering (VidQA) evaluation metrics have been limited to a single-word answer or selecting a phrase from a fixed set of phrases. These metrics limit the VidQA models' application scenario. In this work, we leverage semantic roles derived from video descriptions to mask out certain phrases, to introduce VidQAP which poses VidQA as a fill-in-the-phrase task. To enable evaluation of answer phrases, we compute the relative improvement of the predicted answer compared to an empty string. To reduce the influence of language bias in VidQA datasets, we retrieve a video having a different answer for the same question. To facilitate research, we construct ActivityNet-SRL-QA and Charades-SRL-QA and benchmark them by extending three vision-language models. We further perform extensive analysis and ablative studies to guide future work.
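The relative-improvement idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `<Q>` slot marker, the helper names `fill` and `relative_score`, the token-overlap F1 metric (a stand-in for whatever phrase metric is actually used), and the normalisation by the reference's own score are all assumptions made for illustration.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1; a simple stand-in for a phrase-level metric (assumption)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def fill(template: str, phrase: str) -> str:
    """Put a phrase into the masked slot (hypothetical '<Q>' marker) and tidy whitespace."""
    return " ".join(template.replace("<Q>", phrase).split())


def relative_score(template: str, predicted: str, reference: str, metric=token_f1) -> float:
    """Improvement of the prediction over an empty-string answer, scaled by the
    improvement the reference itself achieves (assumed normalisation)."""
    full_reference = fill(template, reference)
    baseline = metric(fill(template, ""), full_reference)  # empty-string answer
    best = metric(full_reference, full_reference)           # perfect answer
    if best == baseline:
        return 0.0  # the slot contributes nothing measurable
    return (metric(fill(template, predicted), full_reference) - baseline) / (best - baseline)


template = "a person picks up <Q> from the floor"
for answer in ["a red cup", "a cup", "the dog", ""]:
    print(repr(answer), round(relative_score(template, answer, "a red cup"), 3))
```

Under these assumptions an empty answer scores 0 and the reference phrase scores 1, so the words shared by every filled-in sentence do not inflate the score; an answer that overlaps less with the reference than the empty string does can score below 0.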

Authors (3)
  1. Arka Sadhu (8 papers)
  2. Kan Chen (74 papers)
  3. Ram Nevatia (54 papers)
Citations (15)
