Lightweight Pyramid Network (LPNet)
- LPNet is a lightweight neural architecture that uses Gaussian–Laplacian pyramid decomposition for scale-adaptive image restoration.
- It decomposes images into scale-specific subbands, allowing independent processing with recursive residual blocks to drastically reduce parameters.
- The network extends to depth completion with an inverse pyramid strategy and selective filtering, achieving state-of-the-art accuracy with low computational cost.
A Lightweight Pyramid Network (LPNet) is a neural architecture that leverages multiscale Laplacian pyramid image decomposition to simplify network design, reduce parameter counts, and achieve state-of-the-art performance in image deraining and depth completion tasks. The core principle is to decompose the input into scale-specific subbands using Gaussian–Laplacian pyramids and assign a lightweight sub-network to each scale, enabling highly parameter-efficient, scale-adaptive processing (Fu et al., 2018). For depth completion, an inverse Laplacian pyramid strategy is used, reconstructing output in a coarse-to-fine regime with scale-attentive modules and selective filtering (Wang et al., 11 Feb 2025).
1. Gaussian–Laplacian Pyramid Decomposition and Problem Simplification
LPNet utilizes the classical multiscale framework of Gaussian–Laplacian pyramids to split the input image into distinct band-pass and lowpass components:
- The Gaussian pyramid recursively downsamples the image at each level using a fixed 5-tap Gaussian filter $k$, computing $G_{i+1}(X) = \mathrm{down}\big(k * G_i(X)\big)$ with $G_1(X) = X$.
- The Laplacian pyramid isolates image details via $L_i(X) = G_i(X) - \mathrm{up}\big(G_{i+1}(X)\big)$ for $i = 1, \dots, N-1$, with $L_N(X) = G_N(X)$.
In LPNet for deraining, each Laplacian level predominantly contains rain streaks and spatial details at the corresponding scale, decoupling the learning task and allowing each sub-network to focus on denoising at a single frequency band (Fu et al., 2018). Inverse Laplacian pyramid decomposition is analogously used in LP-Net for depth completion, predicting the low-frequency structure at the coarsest scale and successively restoring detail at finer scales (Wang et al., 11 Feb 2025).
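For concreteness, the decomposition and its inverse can be sketched in a few lines of NumPy. This is a minimal sketch of the classical Burt–Adelson construction on grayscale arrays; the kernel $[1,4,6,4,1]/16$ and five-level depth are conventional assumptions, not settings confirmed by the LPNet papers.

```python
# Minimal Gaussian-Laplacian pyramid sketch (grayscale 2-D arrays).
# Kernel and level count are conventional Burt-Adelson choices, assumed here.
import numpy as np
from scipy.ndimage import convolve1d

K = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0  # 5-tap Gaussian

def blur(img):
    """Separable 5-tap Gaussian smoothing with reflective borders."""
    out = convolve1d(img, K, axis=0, mode="reflect")
    return convolve1d(out, K, axis=1, mode="reflect")

def down(img):
    """Smooth, then discard every other row/column."""
    return blur(img)[::2, ::2]

def up(img, shape):
    """Zero-insertion upsampling to `shape`; gain 4 restores signal energy."""
    out = np.zeros(shape, dtype=img.dtype)
    out[::2, ::2] = img
    return 4.0 * blur(out)

def laplacian_pyramid(img, levels=5):
    """Return [L_1, ..., L_{N-1}, G_N]: band-pass details plus lowpass residual."""
    gauss = [img]
    for _ in range(levels - 1):
        gauss.append(down(gauss[-1]))
    lap = [g - up(g_next, g.shape) for g, g_next in zip(gauss, gauss[1:])]
    return lap + [gauss[-1]]

def reconstruct(pyramid):
    """Invert the decomposition exactly: upsample and add detail, coarse to fine."""
    out = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        out = band + up(out, band.shape)
    return out
```

Because each band is defined as the difference between a Gaussian level and the upsampled next level, `reconstruct` inverts `laplacian_pyramid` exactly, which is what lets LPNet process bands independently without losing information.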
2. LPNet Architecture for Image Deraining
The image deraining LPNet (Fu et al., 2018) consists of independent sub-networks, one per Laplacian level, sharing a recursive residual structure with the following design:
- Each sub-network receives the Laplacian band $L_i(X)$ of the rainy input and outputs an estimate $\hat{L}_i(Y)$ of the corresponding clean Laplacian band, using a series of recursive residual blocks.
- The channel count per level diminishes with increasing coarseness, so the finest, detail-rich band receives the most feature maps and the nearly structure-free coarsest level only a handful.
- Each block includes convolutional layers, bias terms, and leaky ReLU activations, with skip connections facilitating fast convergence.
- The recursion applies one shared residual block $T$ times, $\mathbf{H}^{(t)} = \mathcal{F}\big(\mathbf{H}^{(t-1)}\big) + \mathbf{H}^{(t-1)}$ for $t = 1, \dots, T$, where $\mathcal{F}$ is the block's convolution and activation mapping; sharing the block deepens the network without adding parameters.
- The final per-level estimates $\hat{L}_i(Y)$ are recomposed by Gaussian pyramid reconstruction, $\hat{G}_i(Y) = \hat{L}_i(Y) + \mathrm{up}\big(\hat{G}_{i+1}(Y)\big)$ with $\hat{G}_N(Y) = \hat{L}_N(Y)$, producing the derained image $\hat{Y} = \hat{G}_1(Y)$.
Parameter efficiency is central: the complete LPNet has 7.5K parameters (summed across all levels), with as few as 36 parameters at the coarsest level.
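A PyTorch sketch of one per-level sub-network follows. The channel width `ch`, recursion depth `T`, and leaky ReLU slope are illustrative assumptions rather than the paper's exact settings; the point is that a single shared block applied $T$ times deepens the network without growing its parameter count.

```python
# Hypothetical per-level LPNet sub-network: head conv, one *shared* residual
# block applied T times, and a tail conv predicting a residual on the band.
import torch
import torch.nn as nn

class RecursiveResidualSubnet(nn.Module):
    def __init__(self, in_ch=3, ch=16, T=5):
        super().__init__()
        self.head = nn.Conv2d(in_ch, ch, 3, padding=1)
        # One block, reused T times -> parameter count independent of depth.
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.LeakyReLU(0.2),  # illustrative slope
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.act = nn.LeakyReLU(0.2)
        self.tail = nn.Conv2d(ch, in_ch, 3, padding=1)
        self.T = T

    def forward(self, lap_band):
        h = self.act(self.head(lap_band))
        for _ in range(self.T):
            # Recursive residual update: H^t = F(H^{t-1}) + H^{t-1}
            h = self.act(self.block(h) + h)
        # Global skip: estimate the clean band as input plus a learned residual.
        return lap_band + self.tail(h)
```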
3. Loss Functions and Training Protocols
Each Gaussian pyramid output $\hat{G}_i(Y)$ is supervised by a loss combining $\ell_1$ reconstruction across all scales and SSIM on the two finest levels:

$$\mathcal{L} = \sum_{i=1}^{N} \big\|\hat{G}_i(Y) - G_i(Y)\big\|_1 + \lambda \sum_{i=1}^{2} \Big(1 - \mathrm{SSIM}\big(\hat{G}_i(Y),\, G_i(Y)\big)\Big)$$
No additional regularization or batch normalization is utilized.
Training is performed using patch sampling (1M pairs of patches), Adam optimization, a fixed learning rate, and three epochs, using publicly available synthetic and real rain datasets.
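A minimal sketch of this supervision scheme follows, assuming an $\ell_1$ reconstruction term and borrowing SSIM from the third-party `pytorch_msssim` package; the weight `lam` is an illustrative hyperparameter, and levels are ordered finest first.

```python
# Multi-scale loss sketch: l1 on every Gaussian level, SSIM on the two finest.
import torch.nn.functional as F
from pytorch_msssim import ssim  # pip install pytorch-msssim

def lpnet_loss(pred_levels, gt_levels, lam=0.1):
    """pred_levels / gt_levels: lists of (N, C, H, W) tensors, finest first,
    with values assumed in [0, 1]."""
    loss = sum(F.l1_loss(p, g) for p, g in zip(pred_levels, gt_levels))
    for p, g in zip(pred_levels[:2], gt_levels[:2]):  # two finest levels
        loss = loss + lam * (1.0 - ssim(p, g, data_range=1.0))
    return loss
```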
4. Inverse Laplacian Pyramid in Progressive Depth Completion
LP-Net in depth completion (Wang et al., 11 Feb 2025) adapts the Laplacian pyramid framework for dense depth reconstruction from sparse input. The process is reversed as follows:
- At the coarsest scale ($1/16$ resolution), a regression head predicts the low-frequency structure.
- For each finer scale, the lower-resolution prediction is upsampled, fused with the pooled sparse signal via a learned per-scale confidence map, and refined by a Selective Depth Filtering (SDF) module.
- The SDF module uses two deformable convolution heads (one for smoothing, one for sharpening) followed by an attention mechanism that blends their outputs, $D = A \odot D_{\mathrm{smooth}} + (1 - A) \odot D_{\mathrm{sharp}}$, where $A$ is a learned attention map, $D_{\mathrm{smooth}}$ is the smoothed estimate, and $D_{\mathrm{sharp}}$ the sharpened one (a minimal sketch follows this list).
- The Multi-Path Feature Pyramid (MFP) module aggregates multi-scale global context by channel-wise split, downsampling, upsampling, and fusion.
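Below is a minimal sketch of the SDF fusion rule, with plain convolutions standing in for the paper's deformable heads; module and variable names are illustrative, not taken from the LP-Net code.

```python
# SDF fusion sketch: blend smoothed and sharpened depth with a learned mask.
import torch.nn as nn

class SelectiveDepthFilter(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # Plain convs stand in for the paper's deformable smoothing/sharpening heads.
        self.smooth = nn.Conv2d(ch, 1, 3, padding=1)
        self.sharpen = nn.Conv2d(ch, 1, 3, padding=1)
        self.attn = nn.Sequential(nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, feat, depth):
        a = self.attn(feat)                   # learned attention map A in [0, 1]
        d_smooth = depth + self.smooth(feat)  # smoothing head's estimate
        d_sharp = depth + self.sharpen(feat)  # sharpening head's estimate
        return a * d_smooth + (1.0 - a) * d_sharp  # D = A*D_smooth + (1-A)*D_sharp
```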
Progressive upsampling and injection of band-pass details occur at each decoder stage, analogously to Laplacian pyramid reconstruction but with learned residuals. The pipeline is trained with multi-scale supervision using $\ell_1$ and $\ell_2$ losses, with no need for explicit smoothness or edge-aware regularizers.
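The decoder loop can be summarized as below; `regress_coarse`, `fuse_sparse`, and the per-scale `sdf` modules are assumed components standing in for LP-Net's actual implementation, with bilinear upsampling as a stand-in operator.

```python
# Coarse-to-fine reconstruction sketch for depth completion.
import torch.nn.functional as F

def progressive_depth(features, sparse_pyramid, regress_coarse, fuse_sparse, sdf):
    """features / sparse_pyramid: coarsest-to-finest lists of (N, C, H, W) tensors."""
    depth = regress_coarse(features[0])  # low-frequency structure at 1/16 resolution
    for i in range(1, len(features)):
        depth = F.interpolate(depth, scale_factor=2,
                              mode="bilinear", align_corners=False)
        # Confidence-weighted fusion with the pooled sparse depth at this scale.
        depth = fuse_sparse(depth, sparse_pyramid[i], features[i])
        # Selective Depth Filtering refines the fused estimate (band-pass detail).
        depth = sdf[i](features[i], depth)
    return depth
```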
5. Performance Analysis and Efficiency
Image Deraining
On standard benchmarks (Rain100H, Rain100L, Rain12), LPNet achieves comparable or superior PSNR/SSIM performance to prior heavy-weight CNNs, while using two orders of magnitude fewer parameters:
| Method | Params | Rain100H (PSNR/SSIM) | Rain100L (PSNR/SSIM) | Rain12 (PSNR/SSIM) |
|---|---|---|---|---|
| GMM | - | 15.05/0.43 | 28.65/0.86 | 32.02/0.91 |
| SRCNN | 20,099 | 22.84/0.70 | 29.39/0.91 | 31.90/0.92 |
| DDN | 57,369 | 21.92/0.76 | 32.16/0.93 | 31.76/0.94 |
| JORDER | 369,792 | 26.54/0.83 | 36.63/0.97 | 33.92/0.95 |
| LPNet | 7,548 | 23.73/0.81 | 34.26/0.95 | 35.35/0.95 |
User studies confirm that LPNet generalizes well to real rain conditions, and runtime analysis reveals a 5–50× speed advantage:
- 1024×1024 input: 0.20 s (LPNet) vs. 0.82 s (JORDER) on a GTX 1080 GPU.
Depth Completion
On KITTI, NYUv2, and TOFDC, LP-Net achieves top or near-top accuracy (e.g., RMSE = 684.71 mm, MAE = 186.63 mm on KITTI), ranking first on the official leaderboard at time of reporting, and operates with lower latency/memory than recent SOTA models:
| Method | Params | Time (ms) | Memory (GB) | Benchmark |
|---|---|---|---|---|
| CFormer | 83.5 M | 78.3 | 1.96 | KITTI |
| LRRU | 20.8 M | 75.0 | 2.11 | KITTI |
| TPVD | 31.2 M | 74.3 | 3.05 | KITTI |
| BP-Net | 89.9 M | 83.6 | 6.19 | KITTI |
| LP-Net | 29.6 M | 63.9 | 1.76 | KITTI |
No iterative post-processing or pixel-wise propagation is required.
6. Extensions and Additional Applications
LPNet’s Laplacian pyramid design enables adaptation to other vision problems:
- Denoising and artifact removal: By substituting noisy/JPEG-corrupted inputs for rainy ones at each scale, LPNet removes additive Gaussian noise and JPEG artifacts using an identical architecture and parameter budget (Fu et al., 2018).
- Pipeline pre-processing: Deployed as a pre-filter (e.g., before Faster R-CNN for object detection), LPNet restores detection confidence on rainy images with negligible computational overhead (e.g., +0.3 s on a 1024×1024 input).
- Joint derain + dehaze: Training the coarsest sub-network on combined rain and haze inputs enables simultaneous haze and rain removal.
- Progressive depth completion: The LPNet framework in (Wang et al., 11 Feb 2025) efficiently reconstructs dense depth from sparse points and color, with SDF and MFP modules generalizing the pyramid principle to structured prediction.
A plausible implication is that further exploration of Laplacian pyramid–guided networks could yield lightweight, scalable solutions in both low- and high-level vision tasks, especially where deployment efficiency is critical.
7. Significance, Limitations, and Outlook
LPNet demonstrates that domain-specific priors such as multiscale pyramid decompositions can substantially reduce the depth and complexity of neural networks in several image restoration domains. Its architectures attain SOTA or near-SOTA results with one to two orders of magnitude fewer parameters and lower computational cost, validating the advantages of scale-separated learning via Laplacian pyramids (Fu et al., 2018, Wang et al., 11 Feb 2025).
The reported comparisons show LPNet trailing the far larger JORDER in PSNR on Rain100H and Rain100L while leading on Rain12, and it is not explicitly benchmarked on extremely complex or non-stationary degradations. The absence of batch normalization and weight decay suggests robust convergence, but a plausible implication is that further regularization might be warranted for tasks with highly variable non-Gaussian noise.
The transferability of pyramid-guided lightweight networks to robustness-oriented vision tasks and their integration into larger pre-processing or end-to-end pipelines remain active research directions, as suggested by preliminary experiments in object detection and depth completion. The principled use of classical image-processing techniques as inductive biases in neural architectures continues to provide a valuable design paradigm for efficiency-critical vision applications.