
LieNet Architecture

Updated 3 February 2026
  • LieNet architecture is a deep learning framework that integrates Lie group theory and shifted convolutions to capture rotational dynamics in skeleton-based action recognition and image enhancement.
  • Specialized layers like RotMap, RotPooling, and LogMap ensure geometric consistency, robust feature extraction, and implicit temporal alignment on the SO(3) manifold.
  • UltraFast-LieNET extends these principles with dynamic shifted convolutions and multi-scale shifted residual blocks for real-time, low-light image enhancement on resource-constrained devices.

The LieNet architecture refers to a class of deep neural network designs that integrate the mathematical structure of Lie groups or leverage shifted convolutional techniques for efficient representation learning. Prominent representatives include the original LieNet for skeleton-based action recognition, which directly exploits SO(3) manifolds, and UltraFast-LieNET for lightweight, real-time low-light image enhancement using multi-scale shifted convolutions (Huang et al., 2016; Chen et al., 2 Dec 2025).

1. LieNet for Skeleton-based Action Recognition

LieNet, as presented in "Deep Learning on Lie Groups for Skeleton-based Action Recognition" (Huang et al., 2016), is a deep network architecture specifically constructed to process data that naturally resides on the Lie group $SO(3)$. Its principal application is skeleton-based action recognition, wherein the raw input, a temporal sequence of 3D joint positions, is transformed into a set of rotation matrices expressing relative orientations between bone pairs. Each element $R_{m,n}$ or $R_{n,m}$ in $SO(3)$ encodes the rotation from one bone to another, and the full state at each frame is $R(t) = (R_{1,2}(t), R_{2,1}(t), \ldots) \in SO(3)^{\hat M}$, with $\hat M = 2 \cdot C_M^2$ for $M$ bones.
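
The conversion from bone vectors to relative rotations can be sketched as follows (a minimal illustration, not the authors' code; the Rodrigues-formula helper and the example bone vectors are assumptions):

```python
import numpy as np

def rotation_between(u, v):
    """Rotation matrix taking unit vector u onto unit vector v (Rodrigues' formula)."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    axis = np.cross(u, v)
    s = np.linalg.norm(axis)          # sin(theta)
    c = np.dot(u, v)                  # cos(theta)
    if s < 1e-12:                     # parallel bones: identity (180-degree case omitted)
        return np.eye(3)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]]) / s   # unit-axis skew matrix
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)

# Two bone vectors derived from 3D joint positions (illustrative values)
bone_m = np.array([0.0, 1.0, 0.0])
bone_n = np.array([1.0, 0.0, 0.0])
R_mn = rotation_between(bone_m, bone_n)   # rotation from bone m to bone n
R_nm = rotation_between(bone_n, bone_m)   # the reverse pairing
```

Both orderings are kept, which is why the input has $2 \cdot C_M^2$ group elements rather than $C_M^2$.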

2. Specialized LieNet Layers and Data Flow

LieNet is structured as a stack of $B$ identical blocks, each comprising three specialized layers:

  • Rotation Mapping (RotMap) Layer: Performs a left-action by a learned rotation $W^k_i \in SO(3)$ for each group element, $R^k_i = W^k_i R^{k-1}_i$. This ensures all intermediate representations remain on $SO(3)$, facilitating geometric alignment in the temporal domain.
  • Spatial Rotation Pooling (RotPooling): Reduces redundant features by pooling across each unordered bone pair. For $\{R_{m,n}, R_{n,m}\}$, the representative is chosen as the rotation with maximal angle $\theta(R) = \arccos\left(\frac{\operatorname{trace}(R) - 1}{2}\right)$.
  • Temporal Rotation Pooling: Pools across $p$ consecutive frames for each feature index, again based on maximizing $\theta(R)$. After pooling, the sequence length becomes $\lceil N/p \rceil$.
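
The max-angle pooling rule shared by both pooling layers can be sketched as follows (illustrative helper names, not the paper's implementation):

```python
import numpy as np

def rot_angle(R):
    """theta(R) = arccos((trace(R) - 1) / 2), clipped for numerical safety."""
    return np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))

def rot_max_pool(rotations):
    """Keep the rotation with the largest angle (used spatially and temporally)."""
    return max(rotations, key=rot_angle)

def rot_z(t):
    """Rotation by angle t about the z-axis, for testing."""
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])

# The larger rotation survives the pooling step
pooled = rot_max_pool([rot_z(0.3), rot_z(1.2)])
```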

Following these blocks, a Logarithm Mapping (LogMap) Layer sends each rotation matrix to its skew-symmetric Lie algebra $\mathfrak{so}(3)$ representation, using the closed form

\log(R) = \begin{cases} 0 & \text{if } \theta(R) = 0 \\ \dfrac{\theta(R)}{2\sin\theta(R)} (R - R^T) & \text{otherwise} \end{cases}

where the output $A_i$ is then vectorized and concatenated for subsequent classification.
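
A minimal sketch of this closed-form log map, assuming a small-angle cutoff `eps` for the $\theta(R) = 0$ branch:

```python
import numpy as np

def log_map(R, eps=1e-8):
    """Closed-form matrix logarithm on SO(3): returns a skew-symmetric so(3) element."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < eps:
        return np.zeros((3, 3))
    return theta / (2.0 * np.sin(theta)) * (R - R.T)

# Rotation by 0.7 rad about z; its log has A[1,0] = 0.7
t = 0.7
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0, 0.0, 1.0]])
A = log_map(R)   # skew-symmetric; vectorize e.g. as (A[2,1], A[0,2], A[1,0])
```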

3. Input Encoding, Alignment, and Dimensional Control

Input to LieNet starts from raw 3D joint positions, which are converted into relative bone vectors. For each unordered bone pair, axis-angle operations compute $R_{m,n}$ and $R_{n,m}$. Sequences are resampled to a constant length $N$ (e.g., 64 for NTU RGB+D) by uniform sampling; no explicit dynamic time warping is used, as the RotMap layers themselves achieve implicit temporal alignment.

Dimensionality is controlled primarily by pooling:

  • Initial tensor: $[N \times \hat M \times 3 \times 3]$
  • After spatial pooling: $[N \times C_M^2 \times 3 \times 3]$
  • After temporal pooling (stride $p$): $[\lceil N/p \rceil \times C_M^2 \times 3 \times 3]$

For example, with $M = 24$ bones: $C_{24}^2 = 276$ unordered pairs, so channels reduce $552 \to 276$; with $N = 64$ and $p = 4$, the final pooled sequence has 16 frames.
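
This bookkeeping can be checked in a few standard-library lines:

```python
from math import comb, ceil

M, N, p = 24, 64, 4              # bones, resampled frames, temporal pooling stride
pairs = comb(M, 2)               # C_M^2 unordered bone pairs
M_hat = 2 * pairs                # ordered pairs feeding the network
frames_after_pool = ceil(N / p)  # sequence length after temporal pooling

print(M_hat, pairs, frames_after_pool)   # 552 276 16
```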

4. Optimization and Training Procedures

Training employs mini-batch SGD on manifolds (batch size 30), with a fixed learning rate $\lambda = 0.01$ and standard softmax cross-entropy loss over output classes. For the RotMap ($SO(3)$-constrained) weights, the update sequence is:

  • Compute the Euclidean gradient: $\nabla_E = (\partial L / \partial R^k_i)(R^{k-1}_i)^T$
  • Calculate the normal component: $B = \nabla_E W_k^T W_k$
  • Obtain the Riemannian gradient: $\tilde\nabla = \nabla_E - B$
  • Update on the manifold using a retraction: $W_k \leftarrow \operatorname{Retr}(W_k - \lambda \tilde\nabla)$
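
The four steps above can be sketched as follows. The SVD-based orthogonal projection is one common retraction choice and is an assumption here, since the exact retraction is not specified in this summary:

```python
import numpy as np

def retract_to_SO3(M):
    """Project a 3x3 matrix to the nearest rotation (SVD-based retraction)."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:      # enforce det = +1
        U[:, -1] *= -1
        R = U @ Vt
    return R

def riemannian_sgd_step(W, grad_E, lr=0.01):
    """One manifold SGD update for an SO(3)-constrained weight W."""
    B = grad_E @ W.T @ W          # normal component
    grad_R = grad_E - B           # Riemannian gradient
    return retract_to_SO3(W - lr * grad_R)

W = np.eye(3)
grad_E = np.random.default_rng(0).normal(size=(3, 3))
W_new = riemannian_sgd_step(W, grad_E)   # stays on SO(3)
```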

RotPooling and LogMap layers permit typical Euclidean backpropagation. No weight decay or dropout was used.

5. Output Head, Complexity, and Benchmarks

After the LogMap, the vectorized manifold features undergo a fully-connected (FC) transformation, an optional ReLU nonlinearity, and a final FC plus softmax for classification:

  • Output dimension: $D = (\text{\# pooled frames}) \cdot C_M^2 \cdot 3$
  • Head: $h = W_{fc} v + b$; $\hat y = \operatorname{softmax}(W_{out} h + b_{out})$
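
A sketch of this head with illustrative sizes (the hidden width `H = 128` and the random weights are assumptions; `D` and the 60 NTU classes follow the text):

```python
import numpy as np

def classification_head(v, W_fc, b, W_out, b_out):
    """FC -> ReLU (the optional nonlinearity) -> FC -> softmax over action classes."""
    h = np.maximum(W_fc @ v + b, 0.0)
    logits = W_out @ h + b_out
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

# D = 16 pooled frames * 276 pairs * 3 = 13248; 60 classes for NTU RGB+D
D, H, K = 16 * 276 * 3, 128, 60
rng = np.random.default_rng(0)
v = rng.normal(size=D)                  # vectorized LogMap features
probs = classification_head(v,
                            rng.normal(size=(H, D)) * 0.01, np.zeros(H),
                            rng.normal(size=(K, H)) * 0.01, np.zeros(K))
```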

Benchmark results:

  • G3D-Gaming (20 classes): 89.10% (LieNet-3Blocks)
  • HDM05 (130 classes): 75.78%±2.26% (LieNet-2Blocks)
  • NTU RGB+D (60 classes): Cross-subject 61.37%, Cross-view 66.95%

Typical 3-block configurations require 1.1–1.4 GB of memory; CPU training times vary widely by dataset, with NTU RGB+D requiring ∼514 min per training epoch and 86 min for testing.

6. UltraFast-LieNET for Embedded Low-Light Image Enhancement

UltraFast-LieNET (Chen et al., 2 Dec 2025) is a distinct neural architecture targeting real-time low-light image enhancement for resource-constrained automotive systems. Its structure is dictated by:

  • Dynamic Shifted Convolution (DSConv) Kernel: A 12-parameter channel-wise operation emulating effective 3×3 convolutions with dilation $d$, using only $4C$ learnable parameters for $C$ channels. Operations involve 1×1 group convolutions, zero-padding, spatial shifting, channel-wise summation, gating, and a multiplicative fusion.
  • Multi-Scale Shifted Residual Block (MSRB): $k$ DSConv instances run in parallel with varying dilation, their outputs summed and added to the block input. MSRB modules are incorporated in both encoder and decoder paths, with downsampling (M_down) and upsampling (M_up, with skip connections) variants.
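
The shifted-convolution idea underlying DSConv can be illustrated schematically. This is not the paper's exact 12-parameter kernel; the 5-offset weighting below is a simplification showing how zero-padded shifts plus per-channel scalar weights expand the receptive field with a dilation offset $d$:

```python
import numpy as np

def shift2d(x, dy, dx):
    """Shift an HxW map by (dy, dx) with zero padding at the borders."""
    out = np.zeros_like(x)
    H, W = x.shape
    ys, yd = (slice(dy, H), slice(0, H - dy)) if dy >= 0 else (slice(0, H + dy), slice(-dy, H))
    xs, xd = (slice(dx, W), slice(0, W - dx)) if dx >= 0 else (slice(0, W + dx), slice(-dx, W))
    out[yd, xd] = x[ys, xs]
    return out

def shifted_conv(x, weights, d=1):
    """Weighted sum of center + 4 axis-aligned shifts at dilation d (illustrative)."""
    offsets = [(0, 0), (d, 0), (-d, 0), (0, d), (0, -d)]
    return sum(w * shift2d(x, dy, dx) for w, (dy, dx) in zip(weights, offsets))

x = np.arange(25, dtype=float).reshape(5, 5)
y = shifted_conv(x, weights=[1.0, 0.25, 0.25, 0.25, 0.25], d=1)
```

Because each offset costs only one scalar per channel, increasing $d$ widens the receptive field without adding parameters, which is what makes the parameter counts of 36 and 180 feasible.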

The encoder-decoder topology comprises three downsampling MSRBs, one bottleneck, and three upsampling MSRBs (totaling seven). Global and local residual connections are used throughout. The architecture supports variants: "mini" (36 params) and "max" (180 params), the latter with parameter sharing.

7. Loss Functions, Embedded Performance, and Evaluation

Training employs a composite loss:

L_{\text{total}} = \alpha_{\text{rec}} L_{\text{rec}} + \alpha_{\text{ms-ssim}} L_{\text{ms-ssim}} + \alpha_{\text{grad}} L_{\text{grad}}

with $\alpha_{\text{rec}} = 0.975$, $\alpha_{\text{ms-ssim}} = 0.025$, and $\alpha_{\text{grad}} = 1$. $L_{\text{rec}}$ uses a smooth L1 loss, $L_{\text{ms-ssim}}$ penalizes deviation from MS-SSIM perceptual similarity, and $L_{\text{grad}}$ enforces structural smoothness via Sobel-gradient differences across three decoder outputs.
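
The weighting scheme can be sketched as follows; the smooth-L1 reconstruction term is shown concretely, while the MS-SSIM and gradient terms are passed in as stand-in scalar values rather than full implementations:

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 (Huber-style) reconstruction loss, averaged over pixels."""
    diff = np.abs(pred - target)
    return np.mean(np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta))

def total_loss(pred, target, l_msssim, l_grad,
               a_rec=0.975, a_msssim=0.025, a_grad=1.0):
    """Composite loss with the paper's reported weights as defaults."""
    return a_rec * smooth_l1(pred, target) + a_msssim * l_msssim + a_grad * l_grad

pred = np.full((4, 4), 0.5)
target = np.full((4, 4), 0.7)
L = total_loss(pred, target, l_msssim=0.02, l_grad=0.01)
```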

Benchmark results on LoLI-Street:

  • UltraFast-LieNET$_{\max}$: PSNR = 26.51 dB, SSIM ≈ 0.92, LPIPS ≈ 0.13, exceeding prior state of the art by approximately 4.6 dB PSNR using only 180 parameters.
  • Inference on NVIDIA Jetson AGX Orin: 2.69 ms per 600×400 image (max), 1.72 ms for mini.
  • Model size: 0.036–0.18 KB, 2.81–14.04 MFLOPs.

UltraFast-LieNET demonstrates that, via aggressive parameter sharing and efficient receptive-field expansion (DSConv, MSRB), state-of-the-art enhancement is feasible with extreme computational constraints (Chen et al., 2 Dec 2025).


References

  • (Huang et al., 2016) Huang Z, Wang J, Wang L, et al. Deep Learning on Lie Groups for Skeleton-based Action Recognition
  • (Chen et al., 2 Dec 2025) Chen Y, et al. A Lightweight Real-Time Low-Light Enhancement Network for Embedded Automotive Vision Systems