Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition (2312.02226v1)

Published 4 Dec 2023 in cs.CV

Abstract: Open-vocabulary video action recognition is a promising direction that aims to recognize previously unseen actions drawn from an arbitrary set of categories. Existing methods typically adapt pretrained image-text models to the video domain, capitalizing on their inherent ability to generalize. A common thread among these methods is augmenting visual embeddings with temporal information to improve recognition of seen actions. However, they rely on standard, less informative action descriptions and therefore falter when confronted with novel actions. Drawing inspiration from human cognitive processes, we argue that augmenting text embeddings with human prior knowledge is pivotal for open-vocabulary video action recognition. To realize this, we combine video models with large language models (LLMs) to devise Action-conditioned Prompts. Specifically, we harness the knowledge in LLMs to produce, for each given action, a set of descriptive sentences containing distinctive features that identify it. Building on this foundation, we further introduce a multi-modal action knowledge alignment mechanism that aligns concepts in the video with the textual knowledge encapsulated in the prompts. Extensive experiments on various video benchmarks, covering zero-shot, few-shot, and base-to-novel generalization settings, demonstrate that our method not only achieves new state-of-the-art (SOTA) performance but also offers excellent interpretability.
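The core idea in the abstract, scoring a video against several LLM-written descriptions of each action rather than against a bare class name, can be illustrated with a small sketch. It assumes the video and the descriptor sentences have already been encoded into a shared embedding space by some CLIP-style encoder (not shown), and it aggregates each action's score by simply averaging cosine similarities over its descriptors. The averaging rule, the helper names (`score_actions`, `predict`), and the toy vectors are hypothetical choices for illustration; this is not the paper's multi-modal action knowledge alignment mechanism.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def score_actions(video_emb: Vector,
                  descriptor_embs: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Hypothetical aggregation: mean cosine similarity between the video
    embedding and every descriptor-sentence embedding of each action."""
    scores: Dict[str, float] = {}
    for action, embs in descriptor_embs.items():
        if not embs:
            raise ValueError(f"no descriptors for action {action!r}")
        scores[action] = sum(cosine(video_emb, e) for e in embs) / len(embs)
    return scores


def predict(video_emb: Vector,
            descriptor_embs: Dict[str, List[Vector]]) -> Tuple[str, Dict[str, float]]:
    """Return the highest-scoring action label together with all per-action scores."""
    scores = score_actions(video_emb, descriptor_embs)
    best = max(scores, key=lambda action: scores[action])
    return best, scores


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real encoder outputs (made up for illustration).
    video = [1.0, 0.2, 0.0]
    descriptors = {
        "playing guitar": [[0.9, 0.1, 0.0], [1.0, 0.3, 0.1]],
        "swimming": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
    }
    label, scores = predict(video, descriptors)
    print(label, scores)
```

Because the candidate set is just the keys of `descriptor_embs`, a new, unseen action can be added at inference time by supplying descriptor embeddings for it, which is the open-vocabulary property the abstract targets.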

Authors (9)
  1. Chengyou Jia (17 papers)
  2. Minnan Luo (61 papers)
  3. Xiaojun Chang (148 papers)
  4. Zhuohang Dang (12 papers)
  5. Mingfei Han (15 papers)
  6. Mengmeng Wang (73 papers)
  7. Guang Dai (38 papers)
  8. Sizhe Dang (4 papers)
  9. Jingdong Wang (236 papers)
Citations (1)