SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos (2109.00829v1)

Published 2 Sep 2021 in cs.CV, cs.AI, and cs.LG

Abstract: Action anticipation in egocentric videos is a difficult task due to the inherently multi-modal nature of human actions. Additionally, some actions unfold faster or slower than others depending on the actor or the surrounding context, which can vary each time and lead to different predictions. Based on this idea, we build upon the RULSTM architecture, which is specifically designed for anticipating human actions, and propose a novel attention-based technique to evaluate, simultaneously, slow and fast features extracted from three different modalities, namely RGB, optical flow, and extracted objects. Two branches process information at different time scales, i.e., frame rates, and several fusion schemes are considered to improve prediction accuracy. We perform extensive experiments on the EpicKitchens-55 and EGTEA Gaze+ datasets, and demonstrate that our technique systematically improves the results of the RULSTM architecture for the Top-5 accuracy metric at different anticipation times.
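
The ideas named in the abstract (two branches at different frame rates, attention-weighted fusion, Top-5 accuracy) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `subsample`, `fuse`, and `top_k_accuracy`, the stride values, and the use of a plain softmax over per-branch attention logits are all assumptions made for illustration, and the actual RULSTM-based model and its fusion schemes are not reproduced here.

```python
import math


def subsample(frames, stride):
    """Keep every `stride`-th item, emulating a lower frame rate (hypothetical helper)."""
    return frames[::stride]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(branch_scores, attention_logits):
    """Weighted sum of per-branch class scores.

    Assumption: one scalar attention logit per branch, normalized with softmax.
    The paper considers several fusion schemes; this is only one simple variant.
    """
    weights = softmax(attention_logits)
    n_classes = len(branch_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, branch_scores))
        for c in range(n_classes)
    ]


def top_k_accuracy(score_rows, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(score_rows, labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Illustrative usage with made-up numbers (not data from the paper).
frames = list(range(8))
slow_frames = subsample(frames, 4)   # coarse time scale
fast_frames = subsample(frames, 1)   # fine time scale
fused = fuse([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # equal weights for both branches
```

With equal attention logits the two branches contribute equally; a learned model would instead produce the logits from the branch features, which is where an attention mechanism would come in.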

Authors (4)
  1. Nada Osman (4 papers)
  2. Guglielmo Camporese (8 papers)
  3. Pasquale Coscia (8 papers)
  4. Lamberto Ballan (32 papers)
Citations (18)
