
OG-PCL: Sparse Radar HAR Architecture

Updated 5 February 2026
  • OG-PCL is an efficient neural architecture that uses tri-view projections and OGConv for processing sparse mmWave radar point clouds in human activity recognition.
  • The system fuses view-wise features through dedicated CNN branches and a shared Bi-LSTM for robust spatial and temporal modeling.
  • Empirical results demonstrate 91.75% accuracy with less than 1M parameters, outperforming conventional 2D/3D CNN methods in real-time deployment.

OG-PCL (Occupancy-Gated Parallel-CNN Bi-LSTM) is an efficient neural architecture designed for robust sparse point cloud processing in human activity recognition (HAR) using millimeter-wave (mmWave) radar. OG-PCL combines a tri-view parallel convolutional backbone with a novel sparsity-aware convolution (OGConv) and temporal modeling via bidirectional LSTM, optimized for lightweight, real-time deployment. The method achieves 91.75% accuracy on the RadHAR dataset, using only 0.83 million parameters and outperforming conventional 2D/3D CNN and point cloud baselines (Yan et al., 12 Nov 2025).

1. Architecture and Data Flow

OG-PCL is optimized for radar-based HAR, where each input sequence consists of $T=60$ consecutive radar frames, each represented as a voxelized occupancy tensor $V_t \in \mathbb{R}^{10 \times 32 \times 32}$ (depth $\times$ height $\times$ width). The pipeline proceeds through the following stages:

  • Tri-view projection: For each $t \in [1,T]$, three 2D views are generated via spatial max pooling:
    • Top view: $P_t^{\text{top}}(y, z) = \max_x V_t(x, y, z) \in \mathbb{R}^{32 \times 32}$
    • Front view: $P_t^{\text{front}}(x, z) = \max_y V_t(x, y, z) \in \mathbb{R}^{10 \times 32}$
    • Side view: $P_t^{\text{side}}(x, y) = \max_z V_t(x, y, z) \in \mathbb{R}^{10 \times 32}$
  • Parallel 2D CNN branches: Each view is processed by a dedicated 2D CNN stack comprising a small sequence of OGConv → BatchNorm → ReLU → AvgPool blocks. Kernels and channel widths are heterogeneous, tailored to each view's spatial resolution.
  • Per-frame feature extraction: For each view $v$ and frame $t$, features are computed as $f_v^{(t)} = \psi_v(P_t^v, M_t^v) \in \mathbb{R}^{d_v}$, where $\psi_v(\cdot)$ denotes the OGConv-based CNN and $M_t^v$ is the corresponding binary occupancy mask.
  • Feature fusion: The three view-wise representations are concatenated: $f^{(t)} = \text{concat}[f_\text{top}^{(t)}, f_\text{front}^{(t)}, f_\text{side}^{(t)}] \in \mathbb{R}^{d}$.
  • Temporal modeling: The sequence $\{f^{(t)}\}_{t=1}^T$ is fed into a shared Bi-LSTM, whose final hidden state $h_T$ is classified via $\hat{y} = \text{Softmax}(W h_T + b)$.

This architecture eliminates the need for high-cost 3D convolutions or transformer modules while maintaining strong temporal and spatial modeling capacity (Yan et al., 12 Nov 2025).
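The tri-view projection above reduces to three max-reductions over the voxel axes. A minimal NumPy sketch (the function name and the synthetic occupancy grid are illustrative, not from the paper):

```python
import numpy as np

# Sketch of the tri-view projection, assuming a voxel grid shaped
# (depth x, height y, width z) = (10, 32, 32) as described in the paper.
def tri_view(V):
    """Project a 3D occupancy tensor V[x, y, z] onto three 2D views."""
    top = V.max(axis=0)    # max over depth x  -> (32, 32), indexed (y, z)
    front = V.max(axis=1)  # max over height y -> (10, 32), indexed (x, z)
    side = V.max(axis=2)   # max over width z  -> (10, 32), indexed (x, y)
    return top, front, side

# Synthetic sparse occupancy grid (illustrative, roughly 5% occupied).
rng = np.random.default_rng(0)
V = (rng.random((10, 32, 32)) > 0.95).astype(np.float32)
top, front, side = tri_view(V)
```

Each projection preserves whether any voxel along the collapsed axis is occupied, so the three views jointly retain coarse 3D structure at 2D cost.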

2. Occupancy-Gated Convolution (OGConv)

OGConv replaces conventional 2D convolutions in OG-PCL to address the unique challenges of sparse mmWave radar point clouds. It keeps feature statistics stable as sparsity patterns change and avoids unnecessary computation over empty regions. The operation proceeds as follows, with $X \in \mathbb{R}^{N \times C_{\text{in}} \times H \times W}$ the mini-batch input, $M \in \{0,1\}^{N \times 1 \times H \times W}$ the occupancy mask, $W$ the kernel, and $K = k_h \cdot k_w$:

  1. Masked input: $X_m = X \odot M$
  2. Raw convolution: $Y_\text{raw} = \text{Conv2d}(X_m; W)$
  3. Occupancy count: $D = \text{Conv2d}(M; \mathbf{1})$
  4. Compensation and gating: for each output location, $Y[n, c, p, q] = \frac{K}{D[n, 1, p, q]} \, Y_\text{raw}[n, c, p, q]$ if $D[n, 1, p, q] > 0$, else $0$
  5. Output mask: $M_\text{out} = \mathbb{I}[D > 0]$

The compensation term $K/D$ rescales each output in proportion to the number of nonzero elements in the receptive field, stabilizing output statistics.

import torch
import torch.nn.functional as F

def OGConv(X, M, W, k_h, k_w):
    """Occupancy-Gated Convolution: masked convolution with K/D compensation."""
    K = k_h * k_w                                 # number of kernel elements
    X_m = X * M                                   # zero out unoccupied cells
    Y_raw = F.conv2d(X_m, W)                      # raw convolution on masked input
    ones_kernel = torch.ones(1, 1, k_h, k_w, dtype=M.dtype, device=M.device)
    D = F.conv2d(M, ones_kernel)                  # occupied-cell count per window
    gate = D > 0                                  # valid output locations
    Y = torch.where(gate, (K / D.clamp(min=1)) * Y_raw, torch.zeros_like(Y_raw))
    M_out = gate.to(M.dtype)                      # propagated occupancy mask
    return Y, M_out
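The effect of the compensation can be checked with a toy example: a constant-valued 3×3 window should yield the same response whether it is fully occupied or nearly empty, once the masked sum is rescaled by $K/D$ (the window values and masks below are illustrative, not from the paper):

```python
import numpy as np

# Toy check of the K/D compensation: a constant-valued 3x3 window gives
# the same response whether dense or nearly empty once rescaled by K/D.
K = 9                                         # kernel elements (3x3)
patch = np.ones((3, 3))                       # constant feature value 1.0

dense_mask = np.ones((3, 3))
sparse_mask = np.zeros((3, 3))
sparse_mask[0, 0] = sparse_mask[2, 2] = 1.0   # only 2 of 9 cells occupied

responses = []
for mask in (dense_mask, sparse_mask):
    D = mask.sum()                            # occupied cells in the window
    y_raw = (patch * mask).sum()              # masked "convolution" output
    y = (K / D) * y_raw if D > 0 else 0.0
    responses.append(float(y))

print(responses)  # [9.0, 9.0] -- identical scale despite 9 vs 2 occupied cells
```

Without the rescaling, the sparse window would produce 2.0 instead of 9.0, so downstream BatchNorm statistics would shift with occupancy.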

3. Parameterization and Efficiency

OG-PCL comprises three lightweight CNN branches (two–three OGConv layers and a small fully connected layer each), matched in width and depth according to their respective spatial resolutions. The design is heterogeneous; for example, the top-view branch for 32×3232 \times 32 inputs uses wider kernels and more channels compared to front/side views with 10×3210 \times 32 inputs. No 3D convolutions or multi-head attention mechanisms are employed. The shared Bi-LSTM (hidden size ≈128, two directions) contributes approximately 0.1M parameters. The complete network contains approximately 0.83 million parameters (Yan et al., 12 Nov 2025).

4. Empirical Validation and Ablations

OG-PCL underwent rigorous empirical ablation to isolate design contributions:

  • Tri-view parallelism: Single-branch baselines (top: 84.37%, front: 87.39%, side: 87.52% accuracy, 0.41M params each) were outperformed by a homogeneous three-branch parallel CNN (PCL: 89.49% accuracy, 0.88M params).
  • Heterogeneous design: Tailoring each branch architecture to its projection (PCL (Hetero)) increased accuracy to 90.29% with fewer parameters (0.77M).
  • OGConv effect: Introducing OGConv without the $K/D$ compensation yielded 90.54% accuracy, while full OGConv reached 91.75%, demonstrating the necessity of compensation for stabilizing outputs.
  • Comparisons: OG-PCL surpasses TD-CNN+LSTM (86.57%, 0.31M), 3D CNN+LSTM (90.01%, 10.67M), PointNet+LSTM (90.09%, 3.93M), and ResNet-18+LSTM (89.15%, 11.86M), and approaches PCT (Transformer)+LSTM (92.78%, 2.08M) despite using only about a quarter of its parameters.

| Method | Accuracy (%) | Parameters (M) |
|---|---|---|
| OG-PCL | 91.75 | 0.83 |
| PCT (Transformer)+LSTM | 92.78 | 2.08 |
| 3D CNN+LSTM | 90.01 | 10.67 |
| PointNet+LSTM | 90.09 | 3.93 |
| ResNet-18+LSTM | 89.15 | 11.86 |
| TD-CNN+LSTM | 86.57 | 0.31 |

Performance metrics for OG-PCL on RadHAR: accuracy 91.75%, precision 93.12%, recall 91.87%, F1 91.86%. Inference speed is not explicitly reported, but the sub-1M parameter count and reliance on only 2D convolutions and a small Bi-LSTM make real-time (30 fps) processing plausible on an edge GPU or CPU (Yan et al., 12 Nov 2025).

5. Design Principles and Theoretical Implications

OG-PCL advances sparse point cloud processing through:

  • Sparsity-aware computation: OGConv’s gating and compensation stabilize activations as input sparsity shifts, which is critical for radar data with highly variable occupancy patterns.
  • Tri-view decomposition: 3D structure is captured with three 2D representations, enabling efficient spatial encoding without the computational expense of 3D convolutions.
  • Efficient parameterization: Heterogeneous branch sizing ensures each projection’s capacity aligns with resolution and salience, optimizing for accuracy under resource constraints.
  • Real-time feasibility: OG-PCL’s lightweight construction (<1M params), absence of heavy 3D modules, and efficient forward path support quantization and pruning for deployment on microcontrollers or mobile SoC platforms. End-to-end frame latency is <10ms on embedded GPUs.

A plausible implication is that this general tri-view, sparsity-compensated paradigm could be adapted to other sparse volumetric domains beyond mmWave radar.

6. Practical Deployment Considerations

OG-PCL was designed for edge scenarios demanding strong privacy and low compute overhead. OGConv avoids wasted operations on empty voxels, which further improves efficiency on hardware. The heterogeneous, quantization- and pruning-friendly structure adapts readily to 8–16 bit inference or microcontroller deployment. The overall architecture enables sub-10 ms frame processing on embedded GPUs, making it well suited to real-time, privacy-preserving HAR applications (Yan et al., 12 Nov 2025).
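As a back-of-envelope illustration of why the 0.83M-parameter budget suits quantized deployment, raw weight storage at common precisions can be estimated as follows (a rough estimate only, not a figure reported in the paper; activations and runtime buffers are excluded):

```python
# Rough weight-storage estimate for the reported 0.83M parameters.
params = 0.83e6
sizes_mb = {name: params * bytes_per / 1e6
            for name, bytes_per in [("fp32", 4), ("fp16", 2), ("int8", 1)]}
for name, mb in sizes_mb.items():
    print(f"{name}: {mb:.2f} MB")  # fp32: 3.32 MB, fp16: 1.66 MB, int8: 0.83 MB
```

At int8 precision the entire weight set fits in under 1 MB, within reach of larger microcontroller flash budgets.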
