Deep Learning on Lie Groups for Skeleton-based Action Recognition (1612.05877v2)

Published 18 Dec 2016 in cs.CV

Abstract: In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group with an additional dynamic time warping, and then shallowly learn favorable Lie group features. In this paper we incorporate the Lie group structure into a deep network architecture to learn more appropriate Lie group features for 3D action recognition. Within the network structure, we design rotation mapping layers to transform the input Lie group features into desirable ones, which are aligned better in the temporal domain. To reduce the high feature dimensionality, the architecture is equipped with rotation pooling layers for the elements on the Lie group. Furthermore, we propose a logarithm mapping layer to map the resulting manifold data into a tangent space that facilitates the application of regular output layers for the final classification. Evaluations of the proposed network for standard 3D human action recognition datasets clearly demonstrate its superiority over existing shallow Lie group feature learning methods as well as most conventional deep learning methods.

Summary

The paper introduces LieNet, which integrates Lie group representations into deep neural networks for improved 3D action recognition.
It employs rotation mapping and pooling layers to align features and reduce dimensionality, addressing temporal misalignment issues.
Experimental results on the G3D-Gaming, HDM05, and NTU RGB+D datasets show higher accuracy than existing shallow Lie group feature learning methods and most conventional deep learning methods.

Deep Learning on Lie Groups for Skeleton-Based Action Recognition

The paper "Deep Learning on Lie Groups for Skeleton-Based Action Recognition" incorporates Lie group structure into a deep network architecture for 3D human action recognition. It targets two challenges of skeleton-based action recognition: temporal misalignment caused by speed variations, and high feature dimensionality.

The authors present a deep neural network, referred to as LieNet, that directly operates on the Lie group representations of skeletal data. The key innovation lies in the proposal of rotation mapping (RotMap) layers that transform input Lie group features into well-aligned representations suitable for classification. These RotMap layers are complemented by rotation pooling (RotPooling) layers, which reduce feature dimensionality while maintaining the geometric properties of the data. Furthermore, the introduction of logarithm mapping (LogMap) layers facilitates the transition from non-Euclidean Lie group representations to a Euclidean tangent space where standard classification layers can be applied.
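The three layer types described above can be illustrated with a minimal NumPy sketch on 3x3 rotation matrices. This is not the authors' implementation. It assumes that RotMap left-multiplies its input by a learnable rotation matrix, that RotPooling keeps the rotation with the largest rotation angle, and that LogMap returns the axis-angle vector of the matrix logarithm. The function names (`rot_map`, `rot_pool`, `log_map`, and the helpers) are hypothetical and chosen only for illustration.

```python
import numpy as np

def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: build a 3x3 rotation matrix from an axis and an angle."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    x, y, z = axis
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def rotation_angle(R):
    """Rotation angle in [0, pi], recovered from the trace of R."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_theta))

def rot_map(W, R):
    """RotMap (assumed form): left-multiply by a rotation W, so the output is still a rotation."""
    return W @ R

def rot_pool(rotations):
    """RotPooling (assumed form): keep the rotation with the largest rotation angle."""
    return max(rotations, key=rotation_angle)

def log_map(R):
    """LogMap: matrix logarithm of a rotation, returned as an axis-angle 3-vector."""
    theta = rotation_angle(R)
    if theta < 1e-8:
        return np.zeros(3)
    # R - R^T = 2 sin(theta) K; this formula is only valid away from theta = pi.
    skew = (theta / (2.0 * np.sin(theta))) * (R - R.T)
    return np.array([skew[2, 1], skew[0, 2], skew[1, 0]])

# Toy forward pass: three "frames" of one joint rotation about the z-axis.
W = rotation_from_axis_angle([0, 0, 1], 0.3)  # stands in for a learned weight
frames = [rotation_from_axis_angle([0, 0, 1], a) for a in (0.1, 0.5, 0.2)]
mapped = [rot_map(W, R) for R in frames]      # output of the RotMap layer
pooled = rot_pool(mapped)                     # one rotation kept out of three
feature = log_map(pooled)                     # Euclidean vector for a standard classifier
```

Because every rotation in this toy example shares the z-axis, the angles simply add, which makes the output easy to check by hand. In a real network the weights would need to stay on the rotation group during training, which this sketch does not address.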

Evaluations on the G3D-Gaming, HDM05, and NTU RGB+D datasets show that LieNet outperforms existing shallow Lie group feature learning methods as well as most conventional deep learning methods. The accuracy gains on these benchmarks indicate that LieNet learns compact, discriminative representations for human action recognition from 3D skeletal data.

Key Findings and Results:

The LieNet architecture consistently outperforms the baseline Lie group methods, achieving accuracy improvements across all considered datasets.
By addressing temporal misalignment implicitly within the network structure, the proposed approach circumvents the drawbacks of computationally expensive methods such as dynamic time warping (DTW).
The use of multiple RotMap and RotPooling layers is shown to enhance performance, though there is an optimal depth beyond which additional layers lead to diminishing returns.
A significant contribution is the demonstration that structured non-Euclidean data can be mapped, via the logarithm mapping layer, into a Euclidean tangent space, enabling compatibility with standard neural network layers.
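To make the cost of dynamic time warping mentioned above concrete, here is a minimal sketch of the classic DTW recurrence. It uses scalar sequences and an absolute-difference cost as simplifying assumptions, whereas the sequences discussed in the paper are Lie group valued. The function name `dtw_distance` is hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW between two 1D sequences using an absolute-difference cost."""
    n, m = len(a), len(b)
    # cost[i, j] = best alignment cost of a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: repeat b[j-1], repeat a[i-1], or advance both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# The same motion performed at two speeds aligns with zero cost.
slow = [0, 1, 1, 2]
fast = [0, 1, 2]
same_motion_cost = dtw_distance(fast, slow)
```

The nested loops fill an (n + 1) by (m + 1) table for every pair of sequences being compared, which is the quadratic cost that a network handling alignment internally would avoid paying as a separate preprocessing step.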

Implications and Future Directions:

The introduction of LieNet underscores the potential of deep learning frameworks that operate directly on structured geometric data. In advancing this domain, the paper suggests several avenues for future work: the exploration of deeper networks that integrate both spatial and temporal dynamics from raw skeletal data, potential enhancements in rotational mappings and Riemannian computing layers, and extending the approach to other tasks involving non-Euclidean data spaces.

The integration of Lie group theory with neural networks could pave the way for broader applications in computer vision and pattern recognition, where data can be represented as elements of a manifold. The results also encourage the exploration of end-to-end architectures that can directly learn efficient representations from raw data inputs, thereby simplifying preprocessing requirements and enhancing model generality.

In conclusion, the paper makes a substantial contribution to the development of Lie group-based deep learning models for skeleton-based action recognition, offering insights that could inspire further research in manifold learning and its applications in artificial intelligence.
