Action Anticipation with Goal Consistency (2306.15045v1)
Published 26 Jun 2023 in cs.CV
Abstract: In this paper, we address the problem of short-term action anticipation, i.e., we want to predict an upcoming action one second before it happens. We propose to harness high-level intent information to anticipate actions that will take place in the future. To this end, we incorporate an additional goal prediction branch into our model and propose a consistency loss function that encourages the anticipated actions to conform to the high-level goal pursued in the video. In our experiments, we show the effectiveness of the proposed approach and demonstrate that our method achieves state-of-the-art results on two large-scale datasets: Assembly101 and COIN.
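The abstract does not give the exact form of the consistency loss, so the following is only a hypothetical sketch of the idea: a goal-consistency penalty that compares the goal distribution implied by the anticipated action scores against the output of the goal prediction branch. The mapping `action_to_goal` and the cross-entropy formulation are assumptions for illustration, not the paper's actual definition.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def goal_consistency_loss(action_logits, goal_logits, action_to_goal, eps=1e-8):
    """Hypothetical goal-consistency loss (illustrative only).

    action_logits : (num_actions,) scores from the anticipation head.
    goal_logits   : (num_goals,) scores from the goal prediction branch.
    action_to_goal: list mapping each action index to its goal index
                    (assumes each action belongs to exactly one goal).
    """
    p_action = softmax(action_logits)
    p_goal = softmax(goal_logits)
    # Marginalize the anticipated-action distribution into goal space.
    implied = np.zeros(goal_logits.shape[-1])
    for action, goal in enumerate(action_to_goal):
        implied[goal] += p_action[action]
    # Cross-entropy: penalize anticipated actions whose implied goal
    # disagrees with the goal branch's prediction.
    return -np.sum(implied * np.log(p_goal + eps))
```

For example, with `action_to_goal = [0, 0, 1, 1]`, an anticipated action belonging to goal 0 yields a small loss when the goal branch also predicts goal 0, and a larger loss when it predicts goal 1, so minimizing this term pushes the two heads toward agreement.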
- “Rolling-unrolling lstms for action anticipation from first-person video,” TPAMI 2020.
- “Multi-modal temporal convolutional network for anticipating actions in egocentric videos,” in CVPRW 2021.
- “Self-supervised learning for unintentional action prediction,” in DAGM GCPR 2022.
- “Rethinking learning approaches for long-term action anticipation,” in ECCV 2022.
- “MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition,” in CVPR 2022.
- “Anticipative Video Transformer,” in ICCV 2021.
- “Red: Reinforced encoder-decoder networks for action anticipation,” in BMVC 2017.
- “Anticipating visual representations from unlabeled video,” in CVPR 2016.
- “Forecasting human-object interaction: Joint prediction of motor attention and actions in first person video,” in ECCV 2020.
- “Forecasting action through contact representations from first person video,” TPAMI 2021.
- “Leveraging the present to anticipate the future in videos,” in CVPRW 2019.
- “Recurrent neural networks for driver activity anticipation via sensory-fusion architecture,” in ICRA 2016.
- “The epic-kitchens dataset: Collection, challenges and baselines,” TPAMI 2021.
- “Assembly101: A large-scale multi-view video dataset for understanding procedural activities,” in CVPR 2022.
- “In the eye of beholder: Joint learning of gaze and actions in first person video,” in ECCV 2018.
- “When will you do what? - anticipating temporal occurrences of activities,” in CVPR 2018.
- “Time-conditioned action anticipation in one shot,” in CVPR 2019.
- “Future transformer for long-term action anticipation,” in CVPR 2022.
- “Attention is all you need,” in NIPS 2017.
- “Temporal aggregate representations for long-range video understanding,” in ECCV 2020.
- “Non-local neural networks,” in CVPR 2018.
- “Real-time online video detection with temporal smoothing transformers,” in ECCV 2022.
- “Intention-based long-term human motion anticipation,” in 3DV 2021.
- “Intention-conditioned long-term human egocentric action anticipation,” in WACV 2023.
- “Action anticipation using latent goal learning,” in WACV 2022.
- “Coin: A large-scale dataset for comprehensive instructional video analysis,” in CVPR 2019.
- Olga Zatsarynna
- Juergen Gall