Flow Dynamics Correction for Action Recognition (2310.10059v2)
Abstract: Various studies indicate that action recognition performance depends strongly on the types of motion being extracted and on how accurately human actions are represented. In this paper, we investigate different optical flow methods and the features extracted from them, which capture both short-term and long-term motion dynamics. We perform power normalization on the magnitude component of optical flow for flow dynamics correction, boosting subtle motions and dampening sudden ones. We show that existing action recognition models that rely on optical flow gain a performance boost from our corrected optical flow. To further improve performance, we integrate our corrected flow dynamics into popular models through a simple hallucination step, selecting only the best-performing optical flow features, and we show that 'translating' CNN feature maps into these optical flow features at different motion scales leads to new state-of-the-art performance on several benchmarks, including HMDB-51, YUP++, fine-grained action recognition on MPII Cooking Activities, and large-scale Charades.
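The flow-dynamics correction described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact formulation: the function name `correct_flow_dynamics`, the default exponent, and the per-frame rescaling of magnitudes to [0, 1] are assumptions; only the core idea, applying a power normalization to the magnitude of each flow vector while preserving its direction, comes from the abstract.

```python
import numpy as np

def correct_flow_dynamics(flow, gamma=0.5, eps=1e-8):
    """Power-normalize the magnitude of a dense optical flow field.

    flow  : (H, W, 2) array of horizontal/vertical displacements,
            e.g. from a TV-L1 or DeepFlow estimator.
    gamma : power-normalization exponent; with magnitudes scaled to
            [0, 1], gamma < 1 boosts subtle motions and compresses
            sudden ones, gamma > 1 does the opposite.
    """
    u, v = flow[..., 0], flow[..., 1]
    mag = np.sqrt(u ** 2 + v ** 2)

    # Scale magnitudes to [0, 1] so the power function acts as intended.
    max_mag = mag.max() + eps
    mag_norm = mag / max_mag

    # Apply the power normalization, then restore the original range.
    mag_corrected = (mag_norm ** gamma) * max_mag

    # Re-scale each flow vector to its corrected magnitude,
    # keeping the original direction unchanged.
    scale = mag_corrected / (mag + eps)
    return flow * scale[..., None]
```

The hallucination step can likewise be sketched as a small regression head that 'translates' the backbone's RGB feature maps into the corrected-flow features and is trained with a mean-squared-error objective; the class name, layer sizes, and loss choice below are assumptions made for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FlowHallucinator(nn.Module):
    """Hypothetical head that regresses corrected-flow features from RGB features."""

    def __init__(self, in_dim=2048, flow_feat_dim=1024):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(in_dim, flow_feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(flow_feat_dim, flow_feat_dim),
        )

    def forward(self, rgb_feat, target_flow_feat=None):
        pred = self.head(rgb_feat)
        if target_flow_feat is None:
            # Inference: no optical flow is computed; features are hallucinated.
            return pred, None
        loss = nn.functional.mse_loss(pred, target_flow_feat)
        return pred, loss
```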
Authors: Lei Wang, Piotr Koniusz