Overview of Histogram of Oriented Principal Components for Cross-View Action Recognition
The paper "Histogram of Oriented Principal Components for Cross-View Action Recognition" by Hossein Rahmani et al. introduces a novel approach to the challenge of recognizing human actions across varying viewpoints. Traditional descriptors computed from depth images are inherently sensitive to viewpoint changes, which limits their applicability in uncontrolled environments where camera angles differ significantly.
Technical Contribution
The authors propose a robust descriptor called the Histogram of Oriented Principal Components (HOPC), which operates directly on 3D pointclouds rather than depth images. This circumvents the intrinsic viewpoint dependence of depth-based features and exploits the geometric richness of pointcloud data. The HOPC descriptor is built from the principal components of points within a local spatio-temporal support volume: the eigenvectors of the local scatter matrix, scaled by their eigenvalues, are projected onto the vertex directions of a regular dodecahedron and the quantized projections are concatenated in order of decreasing eigenvalue. The resulting descriptor captures shape and motion variations while remaining robust to noise and invariant to changes in scale and action speed.
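To make the construction concrete, here is a minimal sketch of a HOPC-style descriptor in numpy. It is an illustrative simplification, not the authors' implementation: the half-wave rectification of projections and the final L2 normalization are assumptions standing in for the paper's exact quantization scheme.

```python
import numpy as np

def dodecahedron_directions():
    """Unit vectors toward the 20 vertices of a regular dodecahedron."""
    phi = (1 + np.sqrt(5)) / 2
    verts = [(sx, sy, sz) for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
    for s1 in (-1, 1):
        for s2 in (-1, 1):
            verts.append((0, s1 / phi, s2 * phi))
            verts.append((s1 / phi, s2 * phi, 0))
            verts.append((s1 * phi, 0, s2 / phi))
    V = np.array(verts, float)
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def hopc(points):
    """HOPC-style descriptor sketch: project the eigenvalue-scaled
    eigenvectors of the local scatter matrix onto the 20 dodecahedron
    vertex directions, keep positive projections (a simplifying
    assumption), and concatenate by decreasing eigenvalue."""
    U = dodecahedron_directions()          # (20, 3) bin directions
    X = points - points.mean(axis=0)       # center the local support volume
    C = X.T @ X / len(points)              # 3x3 scatter (covariance) matrix
    evals, evecs = np.linalg.eigh(C)       # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]        # principal components, descending
    parts = []
    for lam, v in zip(evals[order], evecs[:, order].T):
        proj = U @ (lam * v)               # project the scaled eigenvector
        proj[proj < 0] = 0.0               # rectify into histogram bins
        parts.append(proj)
    d = np.concatenate(parts)              # 3 * 20 = 60-dimensional vector
    n = np.linalg.norm(d)
    return d / n if n > 0 else d
```

Concatenating in eigenvalue order gives a canonical component ordering regardless of how the camera is posed, which is what makes the representation usable across viewpoints.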
The paper also introduces Spatio-Temporal Keypoint (STK) detection, in which discriminative keypoints are identified from the ratios of the eigenvalues of their local covariance matrices. Local HOPC descriptors extracted at these keypoints are made view-invariant through orientation normalization using the eigenvectors of the spatial covariance matrix.
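The eigenvalue-ratio criterion can be sketched as follows. The intuition: when the eigenvalues are well separated, the local surface patch has an unambiguous principal-axis orientation, so orientation normalization at that point is stable. The threshold value here is illustrative, not the paper's.

```python
import numpy as np

def is_stk(points, ratio_thresh=1.5):
    """Hedged sketch of STK-style keypoint selection: keep a candidate
    when the eigenvalues of its local scatter matrix are well separated
    (lambda1/lambda2 and lambda2/lambda3 both above a threshold).
    ratio_thresh is a hypothetical value for illustration."""
    X = points - points.mean(axis=0)
    evals = np.linalg.eigvalsh(X.T @ X / len(points))[::-1]  # descending
    eps = 1e-12                                              # guard degenerate patches
    return (evals[0] / (evals[1] + eps) >= ratio_thresh and
            evals[1] / (evals[2] + eps) >= ratio_thresh)
```

A flat or isotropic blob of points fails the test (its eigenvalues are nearly equal), while an elongated, anisotropic patch passes, which is the kind of structure worth describing with a Local HOPC.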
Furthermore, the paper proposes a global descriptor, STK-D, which captures the normalized spatio-temporal distribution of STKs in a four-dimensional space. This descriptor complements the local HOPC by encoding the spatial and temporal relationships among STKs.
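A minimal sketch of an STK-D-style global descriptor, under the assumption that "normalized spatio-temporal distribution" can be approximated by a coarse joint histogram of keypoint coordinates (x, y, z, t) rescaled to the unit hypercube; the bin counts are illustrative, and the paper's exact partitioning may differ.

```python
import numpy as np

def stk_d(keypoints, bins=(4, 4, 4, 4)):
    """Sketch of a global STK-D-style descriptor: normalize detected
    keypoint coordinates (x, y, z, t) to [0, 1] per axis, histogram
    their joint distribution on a coarse 4-D grid, and L1-normalize."""
    K = np.asarray(keypoints, float)           # (n, 4) rows of (x, y, z, t)
    lo, hi = K.min(axis=0), K.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)     # avoid divide-by-zero on flat axes
    N = (K - lo) / span                        # rescale each axis to [0, 1]
    h, _ = np.histogramdd(N, bins=bins, range=[(0.0, 1.0)] * 4)
    h = h.ravel()                              # 4*4*4*4 = 256 bins
    total = h.sum()
    return h / total if total > 0 else h
```

Because the histogram is computed over normalized coordinates, it encodes where and when keypoints occur relative to one another rather than their absolute positions, which is what lets it complement the purely local HOPC descriptors.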
Methodology and Evaluation
Experimental evaluation against nine existing techniques on multiple datasets demonstrates the efficacy of the proposed descriptors. On the Northwestern-UCLA Multiview Action3D and UWA3D Multiview Activity II datasets, HOPC achieves superior recognition accuracy, notably higher than methods that rely on depth images or human skeleton data. The experiments show that combining Local HOPC and STK-D descriptors yields the best performance, indicating the benefit of integrating local and global spatio-temporal information.
Implications and Prospects
The proposed approach has significant implications for practical applications such as surveillance and human-computer interaction. It enables recognition from arbitrary viewpoints, making it suitable for deployment in dynamic and uncontrolled environments. Its independence from skeleton data further broadens its applicability, including scenes with partial human visibility or atypical postures, where skeleton estimation may be inaccurate.
Future research could explore the adaptation of HOPC for denser 3D pointcloud sequences or its integration with emerging techniques in neural networks for potentially improved performance. Moreover, its application to other domains requiring invariant feature representation, such as robotics and autonomous systems, presents an exciting avenue for advancement.
This paper contributes a meaningful step towards robust cross-view action recognition, advancing the capabilities of pointcloud-based analysis in computer vision.