LogMap Layers: Mapping Lie Groups in Deep Learning
- Logarithm Mapping (LogMap) layers are neural modules that convert SO(3) rotation matrices into their Lie algebra (so(3)) via the principal matrix logarithm.
- They linearize manifold-valued features by mapping rotation matrices to Euclidean space, enabling standard deep learning layers for applications like 3D action recognition.
- Robust numerical methods, including safety branches and clamping, ensure stability during both forward and backward passes of the log-mapping process.
A Logarithm Mapping (LogMap) layer is a neural network module designed to map matrix Lie group-valued data, such as tuples of rotation matrices from SO(3), onto their associated Lie algebra, such as so(3), by applying the principal matrix logarithm in closed form. This operation linearizes manifold-valued features into a Euclidean vector space, thereby facilitating the application of conventional deep learning layers for subsequent processing and classification, especially in domains such as skeleton-based action recognition (Huang et al., 2016).
1. Mathematical Basis
Let each input skeleton frame be represented by the tuple
$$(R_1, R_2, \dots, R_m) \in SO(3) \times \cdots \times SO(3),$$
where $m$ encodes the number of ordered joint-pair rotations, and $SO(3)$ denotes the 3D rotation group. The associated Lie algebra is
$$\mathfrak{so}(3) = \{ A \in \mathbb{R}^{3 \times 3} : A^\top = -A \},$$
with each element a real skew-symmetric matrix.
For any $R \in SO(3)$, the principal matrix logarithm yields the unique $A \in \mathfrak{so}(3)$ with rotation angle $\theta \in [0, \pi)$ such that $\exp(A) = R$. Operationally, using the axis–angle representation:
- Compute $\theta = \arccos\!\big(\tfrac{\operatorname{tr}(R) - 1}{2}\big)$.
- If $\theta \approx 0$, set $\log(R) = 0$.
- Otherwise, compute
$$\log(R) = \frac{\theta}{2 \sin\theta}\,(R - R^\top),$$
where $(R - R^\top)$ is skew-symmetric and the scalar $\frac{\theta}{2\sin\theta}$ ensures the magnitude of $\log(R)$ matches the rotation angle $\theta$. Spectral definitions using an eigendecomposition of $R$ are numerically fragile and not used in practice.
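As a concrete sketch of the axis–angle procedure above (NumPy; the function name and the tolerance `eps` are illustrative choices, not from the original work):

```python
import numpy as np

def so3_log(R, eps=1e-6):
    """Principal matrix logarithm of a rotation matrix R in SO(3).

    Returns the skew-symmetric A in so(3) with exp(A) = R, using the
    axis-angle formula; falls back to the first-order skew part when
    the rotation angle is near zero.
    """
    # Rotation angle from the trace; the clamp guards against round-off.
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    skew = (R - R.T) / 2.0            # skew-symmetric part of R
    if np.sin(theta) < eps:           # near-identity safety branch
        return skew                   # theta / sin(theta) -> 1
    return (theta / np.sin(theta)) * skew
```

For a rotation by angle $\theta$ about an axis, the resulting matrix has the axis–angle vector (scaled by $\theta$) in its off-diagonal entries.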
2. Layer Structure and Network Integration
Given mini-batch tensors $X \in \mathbb{R}^{B \times m \times 3 \times 3}$, where each block $X_{b,i} \in SO(3)$:
- The LogMap layer independently applies the matrix logarithm to each block, yielding $A_{b,i} = \log(X_{b,i})$, with $A_{b,i} \in \mathfrak{so}(3)$.
- Each skew-symmetric $A_{b,i}$ has three degrees of freedom; in subsequent layers, these may be compacted to a $B \times 3m$ array for standard fully connected layers, or processed further as $3 \times 3$ matrices in specialized matrix–FC layers.
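A vectorized batch implementation might look like the following (NumPy sketch; `logmap_forward` and `vectorize` are illustrative names, and the small-angle tolerance is an assumed default):

```python
import numpy as np

def logmap_forward(X, eps=1e-6):
    """Batched LogMap: X has shape (B, m, 3, 3) of rotation matrices.

    Returns (A, theta): A holds the so(3) logarithms, theta the cached
    rotation angles (needed again by the backward pass).
    """
    tr = np.trace(X, axis1=-2, axis2=-1)
    theta = np.arccos(np.clip((tr - 1.0) / 2.0, -1.0, 1.0))
    skew = (X - np.swapaxes(X, -1, -2)) / 2.0
    s = np.sin(theta)
    # Scale factor theta / sin(theta); the small-angle branch takes value 1.
    scale = np.where(s < eps, 1.0, theta / np.maximum(s, eps))
    return scale[..., None, None] * skew, theta

def vectorize(A):
    """Compact each skew-symmetric 3x3 block to its 3 free entries,
    giving a (B, 3m) feature matrix for fully connected layers."""
    v = np.stack([A[..., 2, 1], A[..., 0, 2], A[..., 1, 0]], axis=-1)
    return v.reshape(v.shape[0], -1)
```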
Within the overall architecture, the canonical block arrangement is
$$\text{RotMap} \rightarrow \text{RotPooling} \rightarrow \cdots \rightarrow \text{LogMap} \rightarrow \text{FC} \rightarrow \text{Softmax},$$
where LogMap serves as the final "manifold" layer. Its Euclidean outputs are compatible with standard deep network components.
3. Computational Implementation and Gradients
Forward Pass
For each $R \in SO(3)$ in the batch:
- Compute $\theta = \arccos\!\big(\operatorname{clamp}\big(\tfrac{\operatorname{tr}(R) - 1}{2}, -1, 1\big)\big)$.
- If $\sin\theta < \varepsilon$, set $A = \tfrac{1}{2}(R - R^\top)$; otherwise, $A = \frac{\theta}{2\sin\theta}(R - R^\top)$.
- Values of $\theta$ and $\sin\theta$ are cached for the backward pass.
Backward Pass
Given upstream gradient $G = \partial L / \partial A$:
- Write $A = c\,(R - R^\top)$ with $c = \frac{\theta}{2\sin\theta}$ and $\theta = \arccos\!\big(\tfrac{\operatorname{tr}(R)-1}{2}\big)$.
- By the chain rule with respect to the Frobenius inner product,
$$\frac{\partial \theta}{\partial R} = -\frac{1}{2\sin\theta}\, I, \qquad \frac{dc}{d\theta} = \frac{\sin\theta - \theta\cos\theta}{2\sin^2\theta}.$$
- The resulting gradient with respect to $R$ is:
$$\frac{\partial L}{\partial R} = c\,(G - G^\top) \;-\; \frac{dc}{d\theta}\,\frac{\langle G,\; R - R^\top\rangle_F}{2\sin\theta}\, I.$$
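A sketch of the closed-form axis–angle gradient, verified against finite differences (NumPy; the function name and tolerance are illustrative):

```python
import numpy as np

def so3_log_vjp(R, G, eps=1e-6):
    """Gradient dL/dR given upstream gradient G = dL/dA,
    where A = log(R) = c * (R - R^T) and c = theta / (2 sin theta).
    """
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    s = np.sin(theta)
    if s < eps:                        # near-identity: A ~ (R - R^T)/2
        return (G - G.T) / 2.0
    c = theta / (2.0 * s)
    dc = (s - theta * np.cos(theta)) / (2.0 * s * s)   # dc/dtheta
    inner = np.sum(G * (R - R.T))      # Frobenius inner product <G, R - R^T>
    # First term: skew part of the chain rule; second: trace (angle) term.
    return c * (G - G.T) - (dc * inner / (2.0 * s)) * np.eye(3)
```

A quick central-difference check of $L(R) = \langle G, \log R\rangle_F$, treating $R$ as a free $3 \times 3$ variable, confirms the formula.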
An alternative, more general formulation for the gradient leverages the Fréchet derivative of the matrix logarithm, expressed as a matrix integral, though in practice the analytic axis–angle version is preferred.
4. Transition to Euclidean Layers
Mapping the manifold-valued features from $SO(3)$ to $\mathfrak{so}(3)$ linearizes the data by situating it in a vector space of skew-symmetric matrices, thus removing the orthogonality and determinant constraints inherent to $SO(3)$. This operation enables subsequent layers—ReLU, fully connected, softmax—to treat the features as vectors, bypassing the geometric restrictions of the original manifold. Standard deep learning optimizations and classification techniques can then be applied directly, significantly enhancing flexibility and speed in model training (Huang et al., 2016).
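For illustration, a hypothetical classification head (not the paper's exact architecture) can consume the log-mapped features directly:

```python
import numpy as np

def classify_log_features(A, W, b):
    """Flatten so(3) features of shape (B, m, 3, 3) to (B, 3m) and
    apply a fully connected layer followed by softmax.

    W has shape (3m, K) and b shape (K,) for K action classes.
    """
    # Extract the 3 free entries of each skew-symmetric block.
    v = np.stack([A[..., 2, 1], A[..., 0, 2], A[..., 1, 0]], axis=-1)
    x = v.reshape(v.shape[0], -1)           # (B, 3m) Euclidean features
    z = x @ W + b                           # fully connected layer
    z = z - z.max(axis=1, keepdims=True)    # numerically stable softmax
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)
```

Because the inputs are now ordinary vectors, any standard layer stack (dropout, batch normalization, deeper MLPs) could replace this head.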
5. Numerical Stability and Implementation
Special attention is given to numerical stability:
- For small $\theta$, $\sin\theta \approx \theta$ and $\frac{\theta}{2\sin\theta} \approx \tfrac{1}{2}$. A safety branch is implemented: if $\sin\theta < \varepsilon$ for a small tolerance $\varepsilon$, then $A = \tfrac{1}{2}(R - R^\top)$, paralleling the first term in the Taylor expansion of the logarithm.
- The argument of $\arccos$ is explicitly clamped to $[-1, 1]$ to avoid NaNs from floating-point errors.
- Such measures are essential in any robust log/exp code path.
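A minimal demonstration of why the clamp matters (NumPy; the perturbed value stands in for accumulated round-off in upstream matrix products):

```python
import numpy as np

# Round-off can push (tr(R) - 1) / 2 just outside [-1, 1]; arccos then
# returns NaN instead of an angle near 0 or pi.
u = 1.0 + 1e-12                      # stand-in for a round-off corrupted value
with np.errstate(invalid="ignore"):
    assert np.isnan(np.arccos(u))    # out-of-domain input yields NaN
theta = np.arccos(np.clip(u, -1.0, 1.0))
assert theta == 0.0                  # clamped input recovers theta = 0
```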
6. Significance and Context
The integration of LogMap layers addresses the mismatch between non-Euclidean manifold structures arising from action recognition representations and the flat geometry assumed by standard neural network layers. By projecting manifold features into the tangent space, the network can exploit standard deep learning functionality while preserving structural information from original Lie group data. This capability was demonstrated to outperform previous shallow Lie group feature learning and conventional deep learning methods in 3D human action recognition (Huang et al., 2016).