Lightweight Pyramid Network (LPNet)

Updated 5 January 2026
  • LPNet is a lightweight neural architecture that uses Gaussian–Laplacian pyramid decomposition for scale-adaptive image restoration.
  • It decomposes images into scale-specific subbands, allowing independent processing with recursive residual blocks to drastically reduce parameters.
  • The network extends to depth completion with an inverse pyramid strategy and selective filtering, achieving state-of-the-art accuracy with low computational cost.

A Lightweight Pyramid Network (LPNet) is a neural architecture that leverages multiscale Laplacian pyramid image decomposition to simplify network design, reduce parameter counts, and achieve state-of-the-art performance in image deraining and depth completion tasks. The core principle is to decompose the input into scale-specific subbands using Gaussian–Laplacian pyramids and assign a lightweight sub-network to each scale, enabling highly parameter-efficient, scale-adaptive processing (Fu et al., 2018). For depth completion, an inverse Laplacian pyramid strategy is used, reconstructing the output in a coarse-to-fine regime with scale-attentive modules and selective filtering (Wang et al., 11 Feb 2025).

1. Gaussian–Laplacian Pyramid Decomposition and Problem Simplification

LPNet utilizes the classical multiscale framework of Gaussian–Laplacian pyramids to split the input image $X \in \mathbb{R}^{H \times W \times 3}$ into distinct band-pass and low-pass components:

  • The Gaussian pyramid $\{G_1(X), \ldots, G_N(X)\}$ recursively downsamples the image at each level $n$ using a fixed 5-tap Gaussian filter $k = [0.0625, 0.25, 0.375, 0.25, 0.0625]$, computing $G_{n+1}(X) = \mathrm{downsample}(G_n(X) * k)$.
  • The Laplacian pyramid $\{L_1(X), \ldots, L_N(X)\}$ isolates image details via $L_n(X) = G_n(X) - \mathrm{upsample}(G_{n+1}(X))$ for $n < N$, with $L_N(X) = G_N(X)$. A minimal decomposition sketch follows the list.
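
The decomposition is only a few lines in practice. Below is a minimal PyTorch sketch (the framework, reflect padding, and bilinear upsampling are assumptions not fixed by the paper; only the 5-tap kernel and the pyramid formulas above are taken from it):

```python
import torch
import torch.nn.functional as F

# Fixed 5-tap kernel from the paper, made 2-D by outer product (sums to 1).
_K1D = torch.tensor([0.0625, 0.25, 0.375, 0.25, 0.0625])
_K2D = torch.outer(_K1D, _K1D).view(1, 1, 5, 5)

def _blur(x):
    """Per-channel convolution with the fixed Gaussian kernel."""
    c = x.shape[1]
    k = _K2D.to(dtype=x.dtype, device=x.device).repeat(c, 1, 1, 1)
    return F.conv2d(F.pad(x, (2, 2, 2, 2), mode="reflect"), k, groups=c)

def laplacian_pyramid(x, num_levels=5):
    """Return [L_1(x), ..., L_N(x)]: finest band-pass first, low-pass last."""
    levels, g = [], x
    for _ in range(num_levels - 1):
        g_next = _blur(g)[:, :, ::2, ::2]                # G_{n+1}: blur then decimate
        up = F.interpolate(g_next, size=g.shape[-2:],
                           mode="bilinear", align_corners=False)
        levels.append(g - up)                            # L_n = G_n - upsample(G_{n+1})
        g = g_next
    levels.append(g)                                     # L_N = G_N (low-pass residual)
    return levels
```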

In LPNet for deraining, each Laplacian level $L_n(X)$ predominantly contains rain streaks and spatial details at the corresponding scale, decoupling the learning task and allowing each sub-network to focus on denoising at a single frequency band (Fu et al., 2018). Inverse Laplacian pyramid decomposition is analogously used in LP-Net for depth completion, predicting the low-frequency structure at the coarsest scale and successively restoring detail at finer scales (Wang et al., 11 Feb 2025).

2. LPNet Architecture for Image Deraining

The image deraining LPNet (Fu et al., 2018) consists of $N = 5$ independent sub-networks, one per Laplacian level, sharing a recursive residual structure with the following design:

  • Each sub-network receives $L_n(X)$ and outputs an estimate of the clean Laplacian $L_n(Y)$, using a series of $T = 5$ recursive residual blocks.
  • The channel count per level diminishes with increasing coarseness: $C_1 = 16$, $C_2 = 8$, $C_3 = 4$, $C_4 = 2$, $C_5 = 1$.
  • Each block includes convolutional layers, bias terms, and leaky ReLU activations ($\mathrm{LReLU}(z) = \max(z, 0.2z)$), with skip connections facilitating fast convergence.
  • The recursion:

$$
\begin{aligned}
H_{n,0} &= \mathrm{LReLU}(W_n^0 * L_n(X) + b_n^0) \\
F_{n,t}^1 &= \mathrm{LReLU}(W_n^1 * H_{n,t-1} + b_n^1) \\
F_{n,t}^2 &= \mathrm{LReLU}(W_n^2 * F_{n,t}^1 + b_n^2) \\
F_{n,t}^3 &= W_n^3 * F_{n,t}^2 + b_n^3 \\
H_{n,t} &= \mathrm{LReLU}(F_{n,t}^3 + H_{n,0})
\end{aligned}
$$

  • The final output is recomposed as $L_n(Y) = W_n^4 * H_{n,T} + b_n^4 + L_n(X)$, and Gaussian pyramid reconstruction produces the derained image $\hat{Y} = G_1(Y)$; a minimal sketch of one sub-network follows the list.
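
The recursion above maps directly onto a compact module. The following is a minimal PyTorch sketch of one sub-network (3×3 kernels and the exact layer layout are assumptions; the recursion, channel budget, and LReLU slope follow the description above):

```python
import torch.nn as nn

class RecursiveResidualSubNet(nn.Module):
    """One LPNet sub-network: a head conv, T recursions of a shared
    three-conv residual block, and a tail conv with a global skip."""
    def __init__(self, c=16, t=5, in_ch=3):
        super().__init__()
        self.t = t
        self.act = nn.LeakyReLU(0.2)
        self.head = nn.Conv2d(in_ch, c, 3, padding=1)   # W_n^0
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)      # W_n^1 (reused for every t)
        self.conv2 = nn.Conv2d(c, c, 3, padding=1)      # W_n^2
        self.conv3 = nn.Conv2d(c, c, 3, padding=1)      # W_n^3
        self.tail = nn.Conv2d(c, in_ch, 3, padding=1)   # W_n^4

    def forward(self, lx):                              # lx = L_n(X)
        h0 = self.act(self.head(lx))                    # H_{n,0}
        h = h0
        for _ in range(self.t):
            f = self.act(self.conv1(h))                 # F_{n,t}^1 from H_{n,t-1}
            f = self.act(self.conv2(f))                 # F_{n,t}^2
            f = self.conv3(f)                           # F_{n,t}^3 (no activation)
            h = self.act(f + h0)                        # H_{n,t}: skip from H_{n,0}
        return self.tail(h) + lx                        # estimate of L_n(Y)
```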

Parameter efficiency is central: the complete LPNet has ~7.5K parameters (summed across all levels), with as few as 36 parameters at the coarsest level.

3. Loss Functions and Training Protocols

Each Gaussian pyramid output is supervised by a loss combining $\ell_1$ reconstruction across all scales and SSIM on the two finest levels:

$$
\mathcal{L} = \frac{1}{M} \sum_{i=1}^{M} \left[ \sum_{n=1}^{N} \left\| G_n(Y^i) - G_n(Y_{\mathrm{GT}}^i) \right\|_1 + \sum_{n=1}^{2} \big( 1 - \mathrm{SSIM}\big(G_n(Y^i), G_n(Y_{\mathrm{GT}}^i)\big) \big) \right]
$$
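
For concreteness, a hedged sketch of this objective (the `ssim_fn` argument is a placeholder for any differentiable SSIM implementation and is not part of the paper's code):

```python
import torch

def lpnet_loss(gauss_pred, gauss_gt, ssim_fn):
    """Multi-scale l1 over all Gaussian levels plus (1 - SSIM) on the two finest.
    `gauss_pred` / `gauss_gt`: lists of tensors, finest level first.
    `ssim_fn(a, b)` is assumed to return a mean SSIM score in [0, 1]."""
    # l1 term across all N pyramid levels.
    loss = sum(torch.mean(torch.abs(p - g)) for p, g in zip(gauss_pred, gauss_gt))
    # SSIM term on the two finest levels only.
    for p, g in zip(gauss_pred[:2], gauss_gt[:2]):
        loss = loss + (1.0 - ssim_fn(p, g))
    return loss
```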

No additional regularization or batch normalization is utilized.

Training is performed with patch sampling (1M pairs of $80 \times 80$ patches), Adam optimization ($\beta_1 = 0.9$, $\beta_2 = 0.999$), a fixed $10^{-3}$ learning rate, and three epochs, on publicly available synthetic and real rain datasets.
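
These settings translate directly into an optimizer configuration; a minimal sketch assuming PyTorch and reusing the hypothetical `RecursiveResidualSubNet` from Section 2:

```python
import torch

# One sub-network per pyramid level, with the per-level channel budget from the paper.
subnets = torch.nn.ModuleList(
    RecursiveResidualSubNet(c=c) for c in (16, 8, 4, 2, 1))

# Reported protocol: Adam with beta1 = 0.9, beta2 = 0.999, fixed lr = 1e-3.
optimizer = torch.optim.Adam(subnets.parameters(), lr=1e-3, betas=(0.9, 0.999))
```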

4. Inverse Laplacian Pyramid in Progressive Depth Completion

LP-Net in depth completion (Wang et al., 11 Feb 2025) adapts the Laplacian pyramid framework for dense depth reconstruction from sparse input. The process is reversed as follows:

  • At the coarsest scale ($1/16$ resolution), a regression head predicts the low-frequency structure.
  • For each finer scale, the lower-resolution prediction is upsampled, fused with the pooled sparse signal via a learned per-scale confidence map, and refined by a Selective Depth Filtering (SDF) module.
  • The SDF module uses two deformable convolution heads (one for smoothing, one for sharpening) followed by an attention mechanism:

$$
\text{Refined depth} = a \odot \hat{D}_m + (1 - a) \odot \hat{D}_a
$$

where $a$ is a learned attention map, $\hat{D}_m$ is the smoothed estimate, and $\hat{D}_a$ the sharpened one; a simplified fusion sketch follows the list.

  • The Multi-Path Feature Pyramid (MFP) module aggregates multi-scale global context by channel-wise split, downsampling, upsampling, and fusion.
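
A simplified sketch of this selective fusion (plain convolutions stand in for the paper's deformable heads, and `feat_ch` is an assumed feature width; only the attention-weighted blending rule is taken from the text):

```python
import torch.nn as nn

class SelectiveFusion(nn.Module):
    """Attention-weighted blend of a smoothed and a sharpened depth estimate."""
    def __init__(self, feat_ch=32):
        super().__init__()
        self.smooth_head = nn.Conv2d(feat_ch, 1, 3, padding=1)   # -> D_m (smoothing path)
        self.sharpen_head = nn.Conv2d(feat_ch, 1, 3, padding=1)  # -> D_a (sharpening path)
        self.attn = nn.Sequential(nn.Conv2d(feat_ch, 1, 3, padding=1),
                                  nn.Sigmoid())                  # -> a in (0, 1)

    def forward(self, feat):
        d_m = self.smooth_head(feat)
        d_a = self.sharpen_head(feat)
        a = self.attn(feat)
        return a * d_m + (1.0 - a) * d_a   # refined depth = a*D_m + (1-a)*D_a
```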

Progressive upsampling and injection of band-pass details occur at each decoder stage, analogous to Laplacian pyramid reconstruction but with learned residuals. The pipeline is trained with multi-scale supervision using $\ell_1$ and $\ell_2$ losses, with no need for explicit smoothness or edge-aware regularizers.
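Schematically, the coarse-to-fine composition reduces to a short loop (a sketch; `upsample` and the per-scale residuals stand in for the paper's learned modules and confidence-weighted fusion):

```python
def compose_depth(coarse, residuals, upsample):
    """Start from the low-frequency prediction at the coarsest scale and add a
    learned band-pass residual after each upsampling step (coarse -> fine)."""
    depth = coarse
    for res in residuals:           # residuals ordered from coarse to fine
        depth = upsample(depth) + res
    return depth
```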

5. Performance Analysis and Efficiency

Image Deraining

On standard benchmarks (Rain100H, Rain100L, Rain12), LPNet achieves PSNR/SSIM performance comparable or superior to prior heavyweight CNNs while using one to two orders of magnitude fewer parameters:

| Method | Params  | Rain100H (PSNR/SSIM) | Rain100L (PSNR/SSIM) | Rain12 (PSNR/SSIM) |
|--------|---------|----------------------|----------------------|--------------------|
| GMM    | –       | 15.05 / 0.43         | 28.65 / 0.86         | 32.02 / 0.91       |
| SRCNN  | 20,099  | 22.84 / 0.70         | 29.39 / 0.91         | 31.90 / 0.92       |
| DDN    | 57,369  | 21.92 / 0.76         | 32.16 / 0.93         | 31.76 / 0.94       |
| JORDER | 369,792 | 26.54 / 0.83         | 36.63 / 0.97         | 33.92 / 0.95       |
| LPNet  | 7,548   | 23.73 / 0.81         | 34.26 / 0.95         | 35.35 / 0.95       |

User studies confirm LPNet generalizes well to real rain conditions, and runtime analysis reveals a 5–50× speed advantage:

  • $1024 \times 1024$ input: 0.20 s (LPNet) vs. 0.82 s (JORDER) on a GTX 1080 GPU.

Depth Completion

On KITTI, NYUv2, and TOFDC, LP-Net achieves top or near-top accuracy (e.g., RMSE = 684.71 mm, MAE = 186.63 mm on KITTI), ranking first on the official leaderboard at the time of reporting, and operates with lower latency and memory than recent SOTA models:

| Method  | Params | Time (ms) | Memory (GB) | Benchmark |
|---------|--------|-----------|-------------|-----------|
| CFormer | 83.5 M | 78.3      | 1.96        | KITTI     |
| LRRU    | 20.8 M | 75.0      | 2.11        | KITTI     |
| TPVD    | 31.2 M | 74.3      | 3.05        | KITTI     |
| BP-Net  | 89.9 M | 83.6      | 6.19        | KITTI     |
| LP-Net  | 29.6 M | 63.9      | 1.76        | KITTI     |

No iterative post-processing or pixel-wise propagation is required.

6. Extensions and Additional Applications

LPNet’s Laplacian pyramid design enables adaptation to other vision problems:

  • Denoising and artifact removal: By substituting noisy/JPEG-corrupted inputs for rainy ones at each scale, LPNet removes additive Gaussian noise and JPEG artifacts using an identical architecture and parameter budget (Fu et al., 2018).
  • Pipeline pre-processing: Deployed as a pre-filter (e.g., before Faster R-CNN for object detection), LPNet restores detection confidence on rainy images with negligible computational overhead (e.g., +0.3 s on a $1024 \times 1024$ input).
  • Joint derain + dehaze: Training the coarsest sub-network on combined rain and haze inputs enables simultaneous haze and rain removal.
  • Progressive depth completion: The LP-Net framework of (Wang et al., 11 Feb 2025) efficiently reconstructs dense depth from sparse points and color, with SDF and MFP modules generalizing the pyramid principle to structured prediction.

A plausible implication is that further exploration of Laplacian pyramid–guided networks could yield lightweight, scalable solutions in both low- and high-level vision tasks, especially where deployment efficiency is critical.

7. Significance, Limitations, and Outlook

LPNet demonstrates that domain-specific priors such as multiscale pyramid decompositions can substantially reduce the depth and complexity of neural networks in several image restoration domains. Its architectures attain SOTA or near-SOTA results with one to two orders of magnitude fewer parameters and lower computational cost, validating the advantages of scale-separated learning via Laplacian pyramids (Fu et al., 2018, Wang et al., 11 Feb 2025).

LPNet is competitive across standard benchmarks, though the heaviest baselines retain an accuracy edge on some of them (e.g., JORDER on Rain100H), and it is not explicitly benchmarked on extremely complex or non-stationary degradations. Training converges stably without batch normalization or weight decay, but a plausible implication is that further regularization might be warranted for tasks with highly variable, non-Gaussian noise.

The transferability of pyramid-guided lightweight networks to robustness-oriented vision tasks and their integration into larger pre-processing or end-to-end pipelines remain active research directions, as suggested by preliminary experiments in object detection and depth completion. The principled use of classical image-processing techniques as inductive biases in neural architectures continues to provide a valuable design paradigm for efficiency-critical vision applications.
