
Video Question Answering on Screencast Tutorials (2008.00544v1)

Published 2 Aug 2020 in cs.CL, cs.AI, cs.CV, and cs.LG

Abstract: This paper presents a new video question answering task on screencast tutorials. We introduce a dataset of question, answer, and context triples drawn from tutorial videos for a software application. Unlike other video question answering work, all the answers in our dataset are grounded to the domain knowledge base. A one-shot recognition algorithm is designed to extract visual cues, which helps improve video question answering performance. We also propose several baseline neural network architectures based on various aspects of the video contexts in the dataset. The experimental results demonstrate that our proposed models significantly improve question answering performance by incorporating multi-modal contexts and domain knowledge.
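The abstract does not say how its one-shot recognition algorithm works. As a rough illustration of the general idea only, the sketch below uses a common one-shot technique, nearest-neighbor matching against a single labeled exemplar per class. It assumes each visual cue (for example, a UI element seen in a screencast frame) has already been turned into a feature vector. The function names, the cue labels, and the 0.5 similarity threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def one_shot_classify(
    query: Sequence[float],
    exemplars: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off for rejecting unknown cues
) -> Optional[str]:
    """Return the label of the most similar exemplar, or None if no match is close enough.

    Each class is represented by exactly ONE exemplar vector (the one-shot setting).
    """
    best_label: Optional[str] = None
    best_score = -1.0
    for label, vec in exemplars.items():
        score = cosine(query, vec)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None
    return best_label


if __name__ == "__main__":
    # Hypothetical exemplars: one embedding per visual cue.
    exemplars = {
        "brush_tool": [1.0, 0.0, 0.0],
        "eraser_tool": [0.0, 1.0, 0.0],
    }
    print(one_shot_classify([0.9, 0.1, 0.0], exemplars))
    print(one_shot_classify([0.0, 0.0, 1.0], exemplars))
```

Because every class needs only one stored exemplar, a new cue can be added by embedding a single example instead of retraining a classifier. The rejection threshold keeps frames that resemble no known cue from being forced into a label.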

Authors (4)
  1. Wentian Zhao (14 papers)
  2. Seokhwan Kim (29 papers)
  3. Ning Xu (151 papers)
  4. Hailin Jin (53 papers)
Citations (10)
