OG-PCL: Sparse Radar HAR Architecture
- OG-PCL is an efficient neural architecture that uses tri-view projections and OGConv for processing sparse mmWave radar point clouds in human activity recognition.
- The system fuses view-wise features through dedicated CNN branches and a shared Bi-LSTM for robust spatial and temporal modeling.
- Empirical results demonstrate 91.75% accuracy with less than 1M parameters, outperforming conventional 2D/3D CNN methods in real-time deployment.
OG-PCL (Occupancy-Gated Parallel-CNN Bi-LSTM) is an efficient neural architecture designed for robust sparse point cloud processing in human activity recognition (HAR) using millimeter-wave (mmWave) radar. OG-PCL combines a tri-view parallel convolutional backbone with a novel sparsity-aware convolution (OGConv) and temporal modeling via bidirectional LSTM, optimized for lightweight, real-time deployment. The method achieves 91.75% accuracy on the RadHAR dataset, using only 0.83 million parameters and outperforming conventional 2D/3D CNN and point cloud baselines (Yan et al., 12 Nov 2025).
1. Architecture and Data Flow
OG-PCL is optimized for radar-based HAR, where each input sequence consists of $T$ consecutive radar frames, each represented as a voxelized occupancy tensor $V_t \in \{0,1\}^{D \times H \times W}$ (depth × height × width). The pipeline proceeds through the following stages:
- Tri-view projection: For each frame $V_t$, three 2D views are generated via spatial max pooling along one axis:
- Top view: $P_t^{\mathrm{top}}(d, w) = \max_h V_t(d, h, w)$
- Front view: $P_t^{\mathrm{front}}(h, w) = \max_d V_t(d, h, w)$
- Side view: $P_t^{\mathrm{side}}(d, h) = \max_w V_t(d, h, w)$
- Parallel 2D CNN branches: Each view is processed by a dedicated 2D CNN stack comprising a small sequence of OGConv → BatchNorm → ReLU → AvgPool blocks. Kernels and channel widths are heterogeneous and tailored to each view’s spatial resolution.
- Per-frame feature extraction: For each view $v \in \{\mathrm{top}, \mathrm{front}, \mathrm{side}\}$ and frame $t$, features are computed as $f_t^v = \Phi_v(P_t^v, M_t^v)$, where $\Phi_v$ denotes the OGConv-based CNN branch and $M_t^v$ is the corresponding binary occupancy mask.
- Feature fusion: The three view-wise representations are concatenated: $f_t = [f_t^{\mathrm{top}}; f_t^{\mathrm{front}}; f_t^{\mathrm{side}}]$.
- Temporal modeling: The sequence $\{f_t\}_{t=1}^{T}$ is fed into a shared Bi-LSTM, whose final hidden state $h_T$ is passed to a softmax classifier: $\hat{y} = \mathrm{softmax}(W_c h_T + b_c)$.
This architecture eliminates the need for high-cost 3D convolutions or transformer modules while maintaining strong temporal and spatial modeling capacity (Yan et al., 12 Nov 2025).
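The tri-view projection stage can be sketched in a few lines of NumPy (an illustrative sketch: the axis-to-view assignment and grid dimensions here are assumptions for demonstration, not the paper's exact convention):

```python
import numpy as np

# Hypothetical voxelized occupancy tensor of shape (depth, height, width)
D, H, W = 8, 16, 8
V = np.zeros((D, H, W))
V[2, 5, 3] = 1.0  # a single occupied voxel for illustration

# Tri-view projection via max pooling along one axis each
top = V.max(axis=1)    # collapse height -> (D, W)
front = V.max(axis=0)  # collapse depth  -> (H, W)
side = V.max(axis=2)   # collapse width  -> (D, H)

print(top.shape, front.shape, side.shape)  # (8, 8) (16, 8) (8, 16)
```

Each 2D view retains a different pair of spatial axes, so together the three projections preserve the coarse 3D structure while keeping all downstream convolutions two-dimensional.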
2. Occupancy-Gated Convolution (OGConv)
OGConv replaces conventional 2D convolutions in OG-PCL to address the unique challenges of sparse mmWave radar point clouds. It keeps feature statistics stable as sparsity patterns change and avoids unnecessary computation over empty regions. The operation proceeds as follows, with $X$ the mini-batch input, $M$ the binary occupancy mask, $W$ a $k_h \times k_w$ kernel, and $K = k_h k_w$:
- Masked input: $X_m = X \odot M$
- Raw convolution: $Y_{\mathrm{raw}} = X_m * W$
- Occupancy count: $D = M * \mathbf{1}_{k_h \times k_w}$
- Compensation and gating: For each output location, $Y = \frac{K}{D}\, Y_{\mathrm{raw}}$ if $D > 0$, else $0$
- Output mask: $M_{\mathrm{out}} = \mathbb{1}[D > 0]$
The compensation term corrects feature scale in proportion to the number of nonzero elements in the receptive field, stabilizing output statistics.
```python
def OGConv(X, M, W, k_h, k_w):
    # Zero out unoccupied positions before convolving
    X_m = X * M
    Y_raw = conv2d(X_m, W)
    # Count occupied cells in each receptive field
    ones_kernel = ones([1, 1, k_h, k_w])
    D = conv2d(M, ones_kernel)
    # Gate: respond only where the window contains occupied cells,
    # and rescale by K / D to stabilize feature statistics
    K = k_h * k_w
    Y = zeros_like(Y_raw)
    mask = (D > 0)
    Y[mask] = Y_raw[mask] * (K / D[mask])
    M_out = mask.astype(float)
    return Y, M_out
```
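The pseudocode above can be made concrete with a plain NumPy version. This is a direct-loop sketch with valid padding on a single-channel 2D input, not the paper's implementation; the name `og_conv_numpy` is hypothetical:

```python
import numpy as np

def og_conv_numpy(X, M, W):
    """Occupancy-gated 2D convolution (single channel, valid padding).

    X: (H, W) input, M: (H, W) binary occupancy mask, W: (k_h, k_w) kernel.
    Returns the compensated output Y and the propagated mask M_out.
    """
    k_h, k_w = W.shape
    K = k_h * k_w
    H_out = X.shape[0] - k_h + 1
    W_out = X.shape[1] - k_w + 1
    Y = np.zeros((H_out, W_out))
    M_out = np.zeros((H_out, W_out))
    Xm = X * M  # zero out unoccupied cells
    for i in range(H_out):
        for j in range(W_out):
            d = M[i:i + k_h, j:j + k_w].sum()  # occupied cells in window
            if d > 0:
                # compensate by K / d so feature scale tracks occupancy
                patch = Xm[i:i + k_h, j:j + k_w]
                Y[i, j] = (patch * W).sum() * (K / d)
                M_out[i, j] = 1.0
    return Y, M_out
```

With a fully occupied mask, $K/D = 1$ everywhere and the operation reduces to an ordinary convolution; as occupancy drops, the rescaling keeps the output magnitude comparable.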
3. Parameterization and Efficiency
OG-PCL comprises three lightweight CNN branches (two to three OGConv layers and a small fully connected layer each), matched in width and depth to their respective spatial resolutions. The design is heterogeneous; for example, the top-view branch uses wider kernels and more channels than the front/side branches, reflecting the projections' differing resolutions. No 3D convolutions or multi-head attention mechanisms are employed. The shared Bi-LSTM (hidden size ≈128, two directions) contributes approximately 0.1M parameters. The complete network contains approximately 0.83 million parameters (Yan et al., 12 Nov 2025).
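Heterogeneous branch sizing can be illustrated with a simple parameter counter (the channel widths and kernel sizes below are hypothetical placeholders, not the paper's configuration):

```python
def conv_params(c_in, c_out, k_h, k_w):
    # Weights plus biases of one 2D convolution layer
    return c_in * c_out * k_h * k_w + c_out

# Hypothetical per-branch configs: (channel progression, kernel size)
branches = {
    "top":   ([1, 16, 32], (5, 5)),  # wider kernels, more channels
    "front": ([1, 8, 16], (3, 3)),
    "side":  ([1, 8, 16], (3, 3)),
}

for name, (chs, (kh, kw)) in branches.items():
    total = sum(conv_params(ci, co, kh, kw) for ci, co in zip(chs, chs[1:]))
    print(name, total)
```

The point of the exercise: convolutional parameter count grows with $c_{in} \cdot c_{out} \cdot k_h \cdot k_w$, so allocating width per branch rather than uniformly is how the network stays under 1M parameters.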
4. Empirical Validation and Ablations
OG-PCL underwent rigorous empirical ablation to isolate design contributions:
- Tri-view parallelism: Single-branch baselines (top: 84.37%, front: 87.39%, side: 87.52% accuracy, 0.41M params each) were outperformed by homogeneous 3-branch parallel CNN (PCL: 89.49% accuracy, 0.88M).
- Heterogeneous design: Tailoring branch architecture to each projection (PCL (Hetero)) increased accuracy marginally to 90.29% (0.77M params).
- OGConv effect: Introducing OGConv gating without the compensation term yielded 90.54% accuracy; full OGConv reached 91.75%, demonstrating the necessity of compensation for stabilizing outputs.
- Comparisons: OG-PCL surpasses TD-CNN+LSTM (86.57%, 0.31M), 3D CNN+LSTM (90.01%, 10.67M), PointNet+LSTM (90.09%, 3.93M), ResNet-18+LSTM (89.15%, 11.86M), and approaches the performance of PCT (Transformer)+LSTM (92.78%, 2.08M) despite using roughly 40% of the parameters.
| Method | Accuracy (%) | Parameters (M) |
|---|---|---|
| OG-PCL | 91.75 | 0.83 |
| PCT (Transformer)+LSTM | 92.78 | 2.08 |
| 3D CNN+LSTM | 90.01 | 10.67 |
| PointNet+LSTM | 90.09 | 3.93 |
| ResNet-18+LSTM | 89.15 | 11.86 |
| TD-CNN+LSTM | 86.57 | 0.31 |
Performance metrics for OG-PCL on RadHAR: accuracy 91.75%, precision 93.12%, recall 91.87%, F1 91.86%. Inference speed is not explicitly reported, but the sub-1M parameter count and exclusive use of 2D convolutions plus a small Bi-LSTM suggest real-time (≈30 fps) processing is feasible on edge GPUs or CPUs (Yan et al., 12 Nov 2025).
5. Design Principles and Theoretical Implications
OG-PCL advances sparse point cloud processing through:
- Sparsity-aware computation: OGConv’s gating and compensation stabilize activations as input sparsity shifts, which is critical for radar data with highly variable occupancy patterns.
- Tri-view decomposition: 3D structure is captured with three 2D representations, enabling efficient spatial encoding without the computational expense of 3D convolutions.
- Efficient parameterization: Heterogeneous branch sizing ensures each projection’s capacity aligns with resolution and salience, optimizing for accuracy under resource constraints.
- Real-time feasibility: OG-PCL’s lightweight construction (<1M params), absence of heavy 3D modules, and efficient forward path support quantization and pruning for deployment on microcontrollers or mobile SoC platforms. End-to-end frame latency is <10ms on embedded GPUs.
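The stabilizing effect of the compensation term can be checked with a small numeric experiment (an illustrative sketch, not from the paper: for an all-ones input under an averaging kernel, the mean response should stay near 1 regardless of occupancy when $K/D$ rescaling is applied):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
W = np.ones((k, k)) / (k * k)  # averaging kernel for illustration
K = k * k

def mean_response(occupancy):
    # Average single-window response over random masks at a given occupancy rate
    outs_raw, outs_comp = [], []
    for _ in range(2000):
        X = np.ones((k, k))
        M = (rng.random((k, k)) < occupancy).astype(float)
        d = M.sum()
        raw = (X * M * W).sum()          # plain masked convolution
        outs_raw.append(raw)
        outs_comp.append(raw * (K / d) if d > 0 else 0.0)  # OGConv-style
    return np.mean(outs_raw), np.mean(outs_comp)

for s in (0.25, 0.5, 1.0):
    raw, comp = mean_response(s)
    print(f"occupancy={s}: raw={raw:.2f}, compensated={comp:.2f}")
```

Without compensation, the mean response scales linearly with occupancy (so downstream BatchNorm statistics drift as sparsity shifts); with the $K/D$ rescaling it stays near 1 wherever any cell in the window is occupied.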
A plausible implication is that this general tri-view, sparsity-compensated paradigm could be adapted to other sparse volumetric domains beyond mmWave radar.
6. Practical Deployment Considerations
OG-PCL was designed for edge scenarios demanding strong privacy and low compute overhead. OGConv avoids wasted operations on empty voxels, further justifying efficient hardware deployment. Heterogeneous, quantization- and pruning-friendly structure ensures adaptability to 8–16 bit inference or microcontroller regimes. The overall system architecture enables sub-10ms frame processing on embedded GPUs, making it favorably suited for real-time privacy-preserving HAR applications (Yan et al., 12 Nov 2025).