Guided Attention for Next Active Object @ EGO4D STA Challenge (2305.16066v3)
Published 25 May 2023 in cs.CV
Abstract: In this technical report, we describe our Guided-Attention-based solution to the Short-Term Anticipation (STA) challenge of the EGO4D benchmark. The method combines object detections with spatiotemporal features extracted from video clips to enhance motion and contextual information, and then decodes object-centric and motion-centric representations to address STA in egocentric videos. For the challenge, we build our model on top of StillFast, applying Guided Attention to the fast network. Our model obtains improved performance on the validation set and achieves state-of-the-art (SOTA) results on the test set of the EGO4D Short-Term Object Interaction Anticipation Challenge.
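To make the described mechanism concrete, below is a minimal sketch, assuming a cross-attention formulation in PyTorch in which object-detection embeddings guide the spatiotemporal ("fast") video tokens. The module name `GuidedAttention`, the tensor shapes, and the choice of queries/keys are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of a guided-attention block: video tokens attend over object-detection
# embeddings so that detections guide the motion/context features. All names and
# dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class GuidedAttention(nn.Module):
    def __init__(self, vid_dim=256, obj_dim=256, num_heads=8):
        super().__init__()
        self.obj_proj = nn.Linear(obj_dim, vid_dim)  # align object features to video dim
        self.attn = nn.MultiheadAttention(vid_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(vid_dim)

    def forward(self, vid_tokens, obj_feats):
        # vid_tokens: (B, N, vid_dim)  flattened spatiotemporal tokens from the fast branch
        # obj_feats:  (B, M, obj_dim)  object-detection embeddings (e.g. RoI features)
        guide = self.obj_proj(obj_feats)
        # cross-attention: video tokens query the object embeddings
        ctx, _ = self.attn(query=vid_tokens, key=guide, value=guide)
        return self.norm(vid_tokens + ctx)  # residual fusion back into the fast branch

# usage with illustrative shapes
block = GuidedAttention()
vid = torch.randn(2, 14 * 14 * 8, 256)   # B x (H*W*T) x C fast-branch tokens
obj = torch.randn(2, 10, 256)            # B x num_boxes x C detection embeddings
fused = block(vid, obj)
```

The residual fusion keeps the original spatiotemporal features intact while injecting detection-guided context; whether objects serve as queries or keys is a design choice not specified by the abstract.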
- “Enhancing next active object-based egocentric action anticipation with guided attention,” 2023.
- “StillFast: An end-to-end approach for short-term object interaction anticipation,” in CVPR Workshops, 2023.
- “Anticipative Video Transformer,” in ICCV, 2021.
- “Forecasting human object interaction: Joint prediction of motor attention and actions in first person video,” in ECCV, 2020.
- “What would you expect? Anticipating egocentric actions with rolling-unrolling LSTMs and modality attention,” in ICCV, 2019.
- “MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition,” in CVPR, 2022.
- “Anticipating next active objects for egocentric videos,” 2023.
- “Next-active-object prediction from egocentric videos,” Journal of Visual Communication and Image Representation, vol. 49, pp. 401–411, 2017.
- “Detecting activities of daily living in first-person camera views,” in IEEE CVPR, 2012, pp. 2847–2854.
- “Ego4d: Around the World in 3,000 Hours of Egocentric Video,” in CVPR, 2022.
- “Forecasting action through contact representations from first person video,” IEEE TPAMI, pp. 1–1, 2021.
- “Faster R-CNN: Towards real-time object detection with region proposal networks,” in NeurIPS, 2015, vol. 28.
- “Attention is all you need,” in NeurIPS, 2017, vol. 30.
- “SlowFast networks for video recognition,” in ICCV, 2019, pp. 6202–6211.
- “Feature pyramid networks for object detection,” in CVPR, 2017.
- “Detectron2,” https://github.com/facebookresearch/detectron2, 2019.