Temporal-Relational CrossTransformers for Few-Shot Action Recognition (2101.06184v3)

Published 15 Jan 2021 in cs.CV

Abstract: We propose a novel approach to few-shot action recognition, finding temporally-corresponding frame tuples between the query and videos in the support set. Distinct from previous few-shot works, we construct class prototypes using the CrossTransformer attention mechanism to observe relevant sub-sequences of all support videos, rather than using class averages or single best matches. Video representations are formed from ordered tuples of varying numbers of frames, which allows sub-sequences of actions at different speeds and temporal offsets to be compared. Our proposed Temporal-Relational CrossTransformers (TRX) achieve state-of-the-art results on few-shot splits of Kinetics, Something-Something V2 (SSv2), HMDB51 and UCF101. Importantly, our method outperforms prior work on SSv2 by a wide margin (12%) due to its ability to model temporal relations. A detailed ablation showcases the importance of matching to multiple support set videos and learning higher-order relational CrossTransformers.
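
The idea of a query-specific class prototype built by attending over ordered frame tuples can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, the learned query/key/value projections are assumed to be identity maps, tuple features are assumed to be simple concatenations of per-frame features, and the scaled dot-product softmax and squared Euclidean distance are illustrative choices.

```python
from itertools import combinations

import numpy as np


def tuple_representations(frames, cardinality=2):
    """Build one feature per ordered frame tuple.

    frames: array of shape (T, D), one feature vector per frame.
    Returns an array of shape (num_tuples, cardinality * D), where each row
    concatenates the features of frames i1 < i2 < ... (temporal order kept).
    """
    index_tuples = combinations(range(len(frames)), cardinality)
    return np.stack(
        [np.concatenate([frames[i] for i in idx]) for idx in index_tuples]
    )


def query_specific_prototype(query_tuples, support_tuples):
    """Attend from each query tuple over all support tuples of one class.

    Assumption: identity projections stand in for learned linear maps.
    Returns one prototype row per query tuple.
    """
    scale = np.sqrt(query_tuples.shape[1])
    scores = query_tuples @ support_tuples.T / scale
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over support tuples
    return weights @ support_tuples


def class_distance(query_frames, support_videos, cardinality=2):
    """Mean squared distance between query tuples and their prototypes.

    support_videos: list of (T, D) arrays, all from the same class. Tuples
    from every support video are pooled, so the prototype can draw on
    sub-sequences of several videos rather than a class average.
    """
    q = tuple_representations(query_frames, cardinality)
    s = np.concatenate(
        [tuple_representations(v, cardinality) for v in support_videos]
    )
    prototype = query_specific_prototype(q, s)
    return float(np.mean(np.sum((q - prototype) ** 2, axis=1)))


rng = np.random.default_rng(0)
query = rng.normal(size=(4, 3))                         # 4 frames, 3-dim features
support = [rng.normal(size=(4, 3)) for _ in range(2)]   # 2 support videos of one class
pair_features = tuple_representations(query, cardinality=2)
distance = class_distance(query, support, cardinality=2)
```

In a few-shot episode, the query would be assigned to the class with the smallest such distance; using several cardinalities (pairs, triples, and so on) and combining their distances would correspond to the "varying numbers of frames" mentioned in the abstract.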

Authors (5)
  1. Toby Perrett (18 papers)
  2. Alessandro Masullo (8 papers)
  3. Tilo Burghardt (42 papers)
  4. Majid Mirmehdi (47 papers)
  5. Dima Damen (83 papers)
Citations (131)
