
Action Anticipation with Goal Consistency (2306.15045v1)

Published 26 Jun 2023 in cs.CV

Abstract: In this paper, we address the problem of short-term action anticipation, i.e., we want to predict an upcoming action one second before it happens. We propose to harness high-level intent information to anticipate actions that will take place in the future. To this end, we incorporate an additional goal prediction branch into our model and propose a consistency loss function that encourages the anticipated actions to conform to the high-level goal pursued in the video. In our experiments, we show the effectiveness of the proposed approach and demonstrate that our method achieves state-of-the-art results on two large-scale datasets: Assembly101 and COIN.
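The abstract does not give the form of the consistency loss, so the following is only an illustrative sketch of how a goal branch and a consistency term could be combined, not the authors' formulation. It assumes a hypothetical binary compatibility matrix `compat` (goal by action) and penalizes probability mass placed on action/goal pairs that the matrix marks as incompatible. All function names, the matrix, and the `weight` parameter are assumptions introduced for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, target):
    # Negative log-likelihood of the ground-truth class index.
    return float(-np.log(probs[target] + 1e-12))

def consistency_penalty(action_probs, goal_probs, compat):
    # compat[g, a] = 1 if action a may occur under goal g (hypothetical prior).
    # consistent_mass is the joint probability, under independent predictions,
    # that the predicted goal and predicted action are compatible.
    consistent_mass = goal_probs @ compat @ action_probs
    return float(-np.log(consistent_mass + 1e-12))

def total_loss(action_logits, goal_logits, action_target, goal_target,
               compat, weight=1.0):
    # Action loss + goal loss + weighted consistency term (assumed combination).
    p_a = softmax(action_logits)
    p_g = softmax(goal_logits)
    return (cross_entropy(p_a, action_target)
            + cross_entropy(p_g, goal_target)
            + weight * consistency_penalty(p_a, p_g, compat))

# Toy setup: 2 goals, 4 actions; actions 0-1 belong to goal 0, actions 2-3 to goal 1.
compat = np.array([[1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])
goal_logits = np.array([4.0, 0.0])                 # model favors goal 0
consistent_action = np.array([4.0, 0.0, 0.0, 0.0])   # favors action 0
inconsistent_action = np.array([0.0, 0.0, 4.0, 0.0]) # favors action 2

low = consistency_penalty(softmax(consistent_action), softmax(goal_logits), compat)
high = consistency_penalty(softmax(inconsistent_action), softmax(goal_logits), compat)
```

In this toy setup the penalty is smaller when the favored action belongs to the favored goal (`low`) than when it does not (`high`), which is the behavior a consistency term of this kind is meant to encourage.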

Authors (2)
  1. Olga Zatsarynna
  2. Juergen Gall
