
Video Event Extraction via Tracking Visual States of Arguments (2211.01781v2)

Published 3 Nov 2022 in cs.CV and cs.CL

Abstract: Video event extraction aims to detect salient events from a video and identify the arguments for each event as well as their semantic roles. Existing methods focus on capturing the overall visual scene of each frame, ignoring fine-grained argument-level information. Inspired by the definition of events as changes of states, we propose a novel framework to detect video events by tracking the changes in the visual states of all involved arguments, which are expected to provide the most informative evidence for the extraction of video events. In order to capture the visual state changes of arguments, we decompose them into changes in pixels within objects, displacements of objects, and interactions among multiple arguments. We further propose Object State Embedding, Object Motion-aware Embedding and Argument Interaction Embedding to encode and track these changes respectively. Experiments on various video event extraction tasks demonstrate significant improvements compared to state-of-the-art models. In particular, on verb classification, we achieve 3.49% absolute gains (19.53% relative gains) in F1@5 on Video Situation Recognition.
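The abstract reports both an absolute and a relative gain for verb classification, and the two figures together pin down the baseline score they were measured against. The sketch below works through that arithmetic. It assumes the relative gain is defined as the absolute gain divided by the baseline F1@5, which is the usual convention but is not stated in the abstract. The function name `implied_baseline` is illustrative and does not come from the paper.

```python
def implied_baseline(absolute_gain: float, relative_gain: float) -> float:
    """Return the baseline score b satisfying absolute_gain / b == relative_gain.

    absolute_gain is in percentage points; relative_gain is a fraction
    (for example 0.1953 for 19.53%). Assumes the conventional definition
    of relative gain, which the abstract does not spell out.
    """
    if relative_gain <= 0:
        raise ValueError("relative_gain must be positive")
    return absolute_gain / relative_gain


# Figures quoted in the abstract for F1@5 on Video Situation Recognition.
absolute_gain = 3.49    # percentage points
relative_gain = 0.1953  # 19.53% expressed as a fraction

baseline = implied_baseline(absolute_gain, relative_gain)
improved = baseline + absolute_gain

print(f"implied baseline F1@5: {baseline:.2f}%")
print(f"implied improved F1@5: {improved:.2f}%")
```

Under that assumption the baseline F1@5 comes out at about 17.87% and the improved score at about 21.36%. These are values inferred from the two quoted gains, not numbers reported in the abstract.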

Authors (6)
  1. Guang Yang
  2. Manling Li
  3. Jiajie Zhang
  4. Xudong Lin
  5. Shih-Fu Chang
  6. Heng Ji
Citations (9)
