
High-order Tensor Pooling with Attention for Action Recognition (2110.05216v4)

Published 11 Oct 2021 in cs.CV and cs.LG

Abstract: We aim to capture high-order statistics of feature vectors formed by a neural network, and propose end-to-end second- and higher-order pooling to form a tensor descriptor. Tensor descriptors require a robust similarity measure because of the low number of aggregated vectors and the burstiness phenomenon, in which a given feature appears more or less frequently than statistically expected. The Heat Diffusion Process (HDP) on a graph Laplacian is closely related to the Eigenvalue Power Normalization (EPN) of the covariance/autocorrelation matrix, whose inverse forms a loopy graph Laplacian. We show that the HDP and the EPN play the same role, i.e., they boost or dampen the magnitude of the eigenspectrum, thus preventing burstiness. We equip higher-order tensors with EPN, which acts as a spectral detector of higher-order occurrences to prevent burstiness. We also prove that, for a tensor of order r built from d-dimensional feature descriptors, such a detector gives the likelihood that at least one higher-order occurrence is 'projected' into one of the binom(d,r) subspaces represented by the tensor, thus forming a tensor power normalization metric endowed with binom(d,r) such 'detectors'. As experimental contributions, we apply several second- and higher-order pooling variants to action recognition, provide previously unreported comparisons of these pooling variants, and show state-of-the-art results on HMDB-51, YUP++ and MPII Cooking Activities.
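To make the second-order case concrete, the following is a minimal sketch of autocorrelation pooling followed by a simple power-style EPN, in which each eigenvalue is raised to a power gamma between 0 and 1. It is an illustration under stated assumptions, not the authors' implementation: the function names, the parameter `gamma`, and the choice of the plain power function are hypothetical, and the paper may use other EPN variants.

```python
import numpy as np


def second_order_pool(features):
    """Autocorrelation matrix (d x d) from N feature vectors of shape (N, d)."""
    n = features.shape[0]
    return features.T @ features / n


def eigenvalue_power_normalization(matrix, gamma=0.5):
    """Raise the eigenvalues of a symmetric PSD matrix to the power gamma.

    Assumption: 0 < gamma <= 1. Smaller gamma flattens the eigenspectrum
    more strongly, which dampens dominant (bursty) directions.
    """
    sym = 0.5 * (matrix + matrix.T)           # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)    # eigendecomposition of a symmetric matrix
    eigvals = np.clip(eigvals, 0.0, None)     # remove tiny negative eigenvalues
    return (eigvecs * eigvals**gamma) @ eigvecs.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(20, 4))
    feats[:, 0] *= 10.0                       # one artificially bursty feature
    pooled = second_order_pool(feats)
    normalized = eigenvalue_power_normalization(pooled, gamma=0.5)
    print("eigenvalues before:", np.linalg.eigvalsh(pooled))
    print("eigenvalues after: ", np.linalg.eigvalsh(normalized))
```

With gamma = 1 the matrix is returned unchanged, and with gamma = 0.5 the result is the matrix square root, so the ratio between the largest and smallest eigenvalues shrinks. This is the dampening of the eigenspectrum that the abstract attributes to EPN.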
