AECR-Net: Compact Dehazing Deep Network
- AECR-Net is a compact, end-to-end autoencoder deep network for dehazing that integrates attention, contrastive regularization, adaptive mixup, and dynamic feature enhancement.
- It leverages a feature-attention backbone with U-Net skip connections and dynamic feature enhancement, achieving state-of-the-art PSNR and SSIM on both synthetic and real-world benchmarks.
- The model’s efficient design, with only 2.6–3M parameters, enables robust dehazing for adverse visibility conditions and potential applications in other image restoration tasks.
AECR-Net is a compact, end-to-end autoencoder-style deep network designed for single image dehazing and enhancement under challenging visibility conditions such as haze and smoke. The model integrates a feature-attention backbone, contrastive regularization, adaptive mixup, and a dynamic feature enhancement mechanism. AECR-Net has demonstrated state-of-the-art quantitative and qualitative results on both synthetic and real-world haze and smoke benchmarks, with practical utility further validated for gauge image interpretation in adverse visibility applications (Ramírez-Agudelo et al., 15 Jan 2026, Wu et al., 2021).
1. Architectural Components
At its core, AECR-Net employs an autoencoder architecture augmented with various feature fusion and regularization modules:
- Encoder Stem: Processes an RGB input through a convolution (64 channels), ReLU, and InstanceNorm, followed by downsampling (stride-2, 128 channels).
- Bottleneck: Comprises stacked AECR-Blocks, each integrating FFA-Net's Feature Attention (FA) block—composed of Channel-Attention (CA) and Pixel-Attention (PA) sub-modules—with a local residual: $F_{out} = F_{in} + \mathrm{PA}(\mathrm{CA}(F_{in}))$. Interspersed among these blocks, a Dynamic Feature Enhancement (DFE) operation adaptively re-scales channel statistics.
- Decoder: Mirrors the encoder with upsampling (using nearest-neighbor or transposed convolutions), ReLU, InstanceNorm, and final convolution projecting to $3$ output channels with sigmoid output activation.
- Skip Connections: Employs U-Net-style skip connections to link encoder and decoder layers at corresponding spatial resolutions, enhancing feature fusion.
- Adaptive Mixup Module: During training only, generates synthetic feature-space interpolations (mix-of-pairs) between clean and hazy encodings to augment representation diversity for contrastive learning.
The model contains approximately $2.6$–$3$ million parameters, making it compact relative to other state-of-the-art dehazing networks (Wu et al., 2021).
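The FA block's gating logic can be illustrated with a minimal NumPy sketch. This is not the official implementation (which uses learned convolutions in PyTorch); the function names and the reduction ratio are illustrative, and the 1x1 convolutions are modeled as plain matrix multiplies:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    # feat: (C, H, W). Global average pool -> two 1x1 "convs" -> sigmoid gate per channel.
    pooled = feat.mean(axis=(1, 2))                    # (C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ pooled, 0.0))  # (C,)
    return feat * gate[:, None, None]

def pixel_attention(feat, w1, w2):
    # 1x1 convs across channels produce a per-pixel gate of shape (1, H, W).
    C, H, W = feat.shape
    x = feat.reshape(C, -1)                            # (C, H*W)
    gate = sigmoid(w2 @ np.maximum(w1 @ x, 0.0))       # (1, H*W)
    return feat * gate.reshape(1, H, W)

def fa_block(feat, params):
    # Local residual: F_out = F_in + PA(CA(F_in)).
    ca = channel_attention(feat, params["ca_w1"], params["ca_w2"])
    pa = pixel_attention(ca, params["pa_w1"], params["pa_w2"])
    return feat + pa

C, H, W = 8, 4, 4
rng = np.random.default_rng(0)
params = {
    "ca_w1": rng.standard_normal((C // 2, C)) * 0.1,
    "ca_w2": rng.standard_normal((C, C // 2)) * 0.1,
    "pa_w1": rng.standard_normal((C // 2, C)) * 0.1,
    "pa_w2": rng.standard_normal((1, C // 2)) * 0.1,
}
feat = rng.standard_normal((C, H, W))
out = fa_block(feat, params)
print(out.shape)  # (8, 4, 4)
```

The residual connection means the block can fall back to an identity mapping when attention is uninformative, which is what makes stacking many such blocks stable.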
2. Mathematical Formulation and Losses
AECR-Net’s objective combines image reconstruction fidelity with a pixel-level contrastive constraint:
- Reconstruction Loss: Per-pixel $L_1$ distance between the dehazed output $\phi(I)$ and ground truth $J^{*}$:
$$\mathcal{L}_{rec} = \lVert \phi(I) - J^{*} \rVert_1$$
- Contrastive Loss (InfoNCE): Given representations $G(\cdot)$ extracted via a pre-trained feature extractor (e.g., VGG-19), AECR-Net enforces:
$$\mathcal{L}_{ctr} = -\log \frac{\exp\!\left(\mathrm{sim}(G(\phi(I)), G(J^{*}))/\tau\right)}{\exp\!\left(\mathrm{sim}(G(\phi(I)), G(J^{*}))/\tau\right) + \exp\!\left(\mathrm{sim}(G(\phi(I)), G(I))/\tau\right)}$$
where $\mathrm{sim}(\cdot,\cdot)$ denotes cosine similarity and $\tau$ is a temperature hyperparameter.
- Total Objective: Weighted combination,
$$\mathcal{L} = \lambda \, \mathcal{L}_{rec} + \beta \, \mathcal{L}_{ctr}$$
with typical weighting $\lambda = 1.0$ and $\beta = 0.1$ (Wu et al., 2021).
Contrastive regularization constrains network outputs to lie closer to ground truth features while diverging from hazy inputs, narrowing the feasible restoration manifold (Wu et al., 2021).
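The interplay of the two loss terms can be sketched in NumPy. This is a simplified illustration on feature vectors rather than VGG-19 activations; the variable names and the example data are hypothetical:

```python
import numpy as np

def l1_loss(pred, target):
    # Reconstruction fidelity: mean absolute error per element.
    return np.abs(pred - target).mean()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(anchor, positive, negative, tau=0.1):
    # Pull the restored image's features toward the clean target (positive)
    # and push them away from the hazy input (negative).
    pos = np.exp(cosine(anchor, positive) / tau)
    neg = np.exp(cosine(anchor, negative) / tau)
    return -np.log(pos / (pos + neg))

rng = np.random.default_rng(1)
clean = rng.standard_normal(128)                    # ground-truth features J*
hazy = clean + rng.standard_normal(128)             # degraded input features I
restored = clean + 0.1 * rng.standard_normal(128)   # network output features

lam, beta = 1.0, 0.1
total = lam * l1_loss(restored, clean) + beta * info_nce(restored, clean, hazy)
print(total > 0)  # True
```

Because the restored features lie close to the clean ones, the contrastive term is small; an output that stayed near the hazy input would incur a much larger penalty, which is exactly the "narrowed restoration manifold" effect described above.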
3. Feature Engineering: Attention, Mixup, DFE
- Feature Attention (FA): Inherits both channel and pixel attention from FFA-Net to emphasize discriminative statistics at multiple representation levels.
- Adaptive Mixup: Parameterizes skip-level feature fusion via two learned interpolations (one per skip connection):
$$\mathrm{Mix}(f_{enc}, f_{dec}) = \sigma(\theta)\, f_{enc} + \left(1 - \sigma(\theta)\right) f_{dec}$$
where $\sigma$ is the sigmoid and $\theta$ is a learnable scalar, blending encoder and decoder activations to preserve spatial detail and yield sharper reconstructions.
- Dynamic Feature Enhancement (DFE): Utilizes two stacked modulated deformable convolutions:
$$y(p) = \sum_{k=1}^{K} w_k \, x\!\left(p + p_k + \Delta p_k\right) \cdot \Delta m_k$$
where $\Delta p_k$ are learned offsets, $\Delta m_k$ are modulation masks, and $w_k$ are kernel weights, adaptively expanding the receptive field.
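The adaptive mixup interpolation above reduces to a sigmoid-gated convex combination of encoder and decoder features. A minimal NumPy sketch, assuming one learnable scalar per skip connection (the shapes below are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_mixup(f_enc, f_dec, theta):
    # Learnable scalar theta gates the blend of encoder (skip) and decoder
    # features; sigma(theta) in (0, 1) keeps the result a convex combination.
    g = sigmoid(theta)
    return g * f_enc + (1.0 - g) * f_dec

rng = np.random.default_rng(2)
f_enc = rng.standard_normal((64, 32, 32))  # encoder skip features
f_dec = rng.standard_normal((64, 32, 32))  # upsampled decoder features
theta = 0.0                                # sigma(0) = 0.5: equal blend
mixed = adaptive_mixup(f_enc, f_dec, theta)
print(np.allclose(mixed, 0.5 * (f_enc + f_dec)))  # True
```

During training, $\theta$ is optimized jointly with the network weights, so the model learns how much low-level encoder detail to reinject at each decoder stage instead of using a fixed skip-connection sum.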
Ablation experiments demonstrate each module's contribution: DFE alone yields roughly +1.7 dB PSNR, and the full contrastive regularizer delivers a further ~+0.7 dB (Wu et al., 2021).
4. Training Protocol
- Optimizer: Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$)
- Learning rate: $10^{-4}$ to $2 \times 10^{-4}$
- Crop size: 416; epochs: 100
- Loss weights: $\lambda = 1.0$, $\beta = 0.1$
- Feature extraction for contrastive loss: VGG-19, layers indexed {1, 3, 5, 9, 13} with progressively increasing weights
No data augmentations are applied beyond density-level variation.
5. Quantitative Performance

| Dataset | Method | PSNR (dB) | SSIM | Parameters |
|---|---|---|---|---|
| Haze (Gauge) | AECR-Net | ~44 | 0.98 | 3M |
| Haze (Gauge) | FFA-Net | ~30 | 0.96 | – |
| Haze (Gauge) | BCCR | ~12 | 0.65 | – |
| Smoke (Gauge) | AECR-Net | ~37 | 0.96 | 3M |
| Smoke (Gauge) | FFA-Net | ~26 | 0.94 | – |
| Smoke (Gauge) | BCCR | ~9 | 0.55 | – |
| RESIDE/SOTS | AECR-Net | 37.17 | 0.990 | 2.6M |
| RESIDE/SOTS | FFA-Net | 36.39 | 0.989 | 4.68M |
| RESIDE/SOTS | (Others) | ~30 | ~0.97 | >3M |

On both synthetic and real-world datasets, AECR-Net matches or outperforms prior methods in PSNR and SSIM, despite its low model size. On the custom smoke and haze gauge datasets, AECR-Net improves PSNR by roughly +13 dB over the strongest baselines. Ablations confirm the module contributions: removing the contrastive term $\mathcal{L}_{ctr}$ reduces PSNR by roughly 0.7–2 dB, while removing DFE incurs a ~0.01 drop in SSIM and a ~1.7 dB PSNR loss.
6. Applications
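For the layer-weighted contrastive regularizer used during training, Wu et al. (2021) compute, at several VGG-19 layers, a ratio of the $L_1$ distance to the clean target over the distance to the hazy input, weighted progressively per layer. A minimal NumPy sketch with stand-in features (the arrays and shapes are hypothetical placeholders for VGG-19 activations):

```python
import numpy as np

def layer_contrastive(feats_out, feats_gt, feats_hazy, weights):
    # Weighted sum over feature levels of d(out, clean) / d(out, hazy);
    # minimizing it pulls outputs toward clean features and away from hazy ones.
    loss = 0.0
    for fo, fg, fh, w in zip(feats_out, feats_gt, feats_hazy, weights):
        d_pos = np.abs(fo - fg).mean()
        d_neg = np.abs(fo - fh).mean() + 1e-8  # avoid division by zero
        loss += w * d_pos / d_neg
    return loss

rng = np.random.default_rng(3)
# Stand-ins for activations at VGG-19 layers {1, 3, 5, 9, 13}.
feats_gt = [rng.standard_normal((8, 6, 6)) for _ in range(5)]
feats_hazy = [f + rng.standard_normal(f.shape) for f in feats_gt]
feats_out = [f + 0.1 * rng.standard_normal(f.shape) for f in feats_gt]
weights = [1 / 32, 1 / 16, 1 / 8, 1 / 4, 1.0]  # progressively increasing

good = layer_contrastive(feats_out, feats_gt, feats_hazy, weights)
bad = layer_contrastive(feats_hazy, feats_gt, feats_hazy, weights)
print(good < bad)  # restoring toward clean features lowers the loss
```

The increasing weights emphasize deeper, more semantic layers, which is consistent with the reported ablation finding that dropping this term costs measurable PSNR.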
In infrastructure and emergency response settings, AECR-Net’s enhanced outputs enable more robust post-processing pipelines for automatic and autonomous gauge interpretation, critical in haze- and smoke-obscured environments (Ramírez-Agudelo et al., 15 Jan 2026).
7. Availability and Extensions
The official implementation of AECR-Net is available at https://github.com/GlassyWu/AECR-Net. Contrastive regularization has shown universality: adding CR to alternative single-image dehazing architectures yields consistent PSNR and SSIM gains without increased inference cost, supporting contrastive regularization as a general design motif.
A plausible implication is that AECR-Net’s architecture and learning protocol may generalize to deblurring, low-light enhancement, and other single-image restoration tasks where information degradation results from complex, content-dependent imaging perturbations.
Key References:
- "Enhancing the quality of gauge images captured in smoke and haze scenes through deep learning" (Ramírez-Agudelo et al., 15 Jan 2026)
- "Contrastive Learning for Compact Single Image Dehazing" (Wu et al., 2021)