AECR-Net: Compact Dehazing Deep Network

Updated 22 January 2026
  • AECR-Net is a compact, end-to-end autoencoder deep network for dehazing that integrates attention, contrastive regularization, adaptive mixup, and dynamic feature enhancement.
  • It leverages a feature-attention backbone with U-Net skip connections and dynamic feature enhancement, achieving state-of-the-art PSNR and SSIM on both synthetic and real-world benchmarks.
  • The model’s efficient design, with only 2.6–3M parameters, enables robust dehazing for adverse visibility conditions and potential applications in other image restoration tasks.

AECR-Net is a compact, end-to-end autoencoder-style deep network designed for single image dehazing and enhancement under challenging visibility conditions such as haze and smoke. The model integrates a feature-attention backbone, contrastive regularization, adaptive mixup, and a dynamic feature enhancement mechanism. AECR-Net has demonstrated state-of-the-art quantitative and qualitative results on both synthetic and real-world haze and smoke benchmarks, with practical utility further validated for gauge image interpretation in adverse visibility applications (Ramírez-Agudelo et al., 15 Jan 2026, Wu et al., 2021).

1. Architectural Components

At its core, AECR-Net employs an autoencoder architecture augmented with various feature fusion and regularization modules:

  • Encoder Stem: Processes an RGB input $I_{in} \in \mathbb{R}^{3 \times H \times W}$ through a $7 \times 7$ convolution (64 channels), ReLU, and InstanceNorm, followed by downsampling (stride-2, 128 channels).
  • Bottleneck: Comprises $N$ stacked AECR-Blocks, each integrating FFA-Net's Feature Attention (FA) block, composed of Channel-Attention (CA) and Pixel-Attention (PA) sub-modules, with a local residual: $X_{out} = X_{in} + FA(X_{in})$. Every $M$ blocks (typically $M = 2$), a Dynamic Feature Enhancement (DFE) operation is applied to adaptively re-scale channel statistics.
  • Decoder: Mirrors the encoder with upsampling (nearest-neighbor or transposed convolutions), ReLU, InstanceNorm, and a final $3 \times 3$ convolution projecting to 3 output channels with sigmoid activation.
  • Skip Connections: Employs U-Net-style skip connections to link encoder and decoder layers at corresponding spatial resolutions, enhancing feature fusion.
  • Adaptive Mixup Module: During training only, generates synthetic feature-space interpolations (mix-of-pairs) between clean and hazy encodings to augment representation diversity for contrastive learning.

The model contains approximately $2.6$–$3$ million parameters, making it compact relative to other state-of-the-art dehazing networks (Wu et al., 2021).
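The components above can be sketched as a compact PyTorch model. This is a minimal illustration, not the official implementation: the class names, channel widths, block count, and omission of adaptive mixup and DFE are simplifying assumptions.

```python
import torch
import torch.nn as nn

class FABlock(nn.Module):
    """Simplified FFA-Net-style feature attention: channel + pixel attention, local residual."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(), nn.Conv2d(ch, ch, 3, padding=1))
        # channel attention: global pooling -> bottleneck 1x1 convs -> per-channel gate
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch // 8, 1), nn.ReLU(),
            nn.Conv2d(ch // 8, ch, 1), nn.Sigmoid())
        # pixel attention: per-location gate
        self.pa = nn.Sequential(
            nn.Conv2d(ch, ch // 8, 1), nn.ReLU(), nn.Conv2d(ch // 8, 1, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.conv(x)
        y = y * self.ca(y)
        y = y * self.pa(y)
        return x + y  # local residual: X_out = X_in + FA(X_in)

class AECRNetSketch(nn.Module):
    """Encoder stem -> downsample -> FA-block bottleneck -> decoder, with one U-Net skip."""
    def __init__(self, blocks=6):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 64, 7, padding=3), nn.ReLU(), nn.InstanceNorm2d(64))
        self.down = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        self.bottleneck = nn.Sequential(*[FABlock(128) for _ in range(blocks)])
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(), nn.InstanceNorm2d(64))
        self.out = nn.Sequential(nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        s = self.stem(x)
        b = self.bottleneck(self.down(s))
        return self.out(self.up(b) + s)  # U-Net-style skip connection

x = torch.rand(1, 3, 64, 64)
y = AECRNetSketch()(x)
print(y.shape)  # torch.Size([1, 3, 64, 64])
```

The sigmoid output keeps the dehazed image in $[0, 1]$, matching the final activation described above.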

2. Mathematical Formulation and Losses

AECR-Net’s objective combines image reconstruction fidelity with a pixel-level contrastive constraint:

  • Reconstruction Loss: Per-pixel $\ell_1$ distance between dehazed output $\hat{Y}$ and ground truth $Y$:

$$L_{rec}(\hat{Y}, Y) = \|Y - \hat{Y}\|_1$$

  • Contrastive Loss (InfoNCE): Given representations $z^+, z^-, \hat{z}$ extracted via a pre-trained feature extractor (e.g., VGG-19), AECR-Net enforces:

$$L_{ctr}(\hat{z}_i) = -\log \frac{\exp(\text{sim}(\hat{z}_i, z^+_i)/\tau)}{\sum_{j=1}^K \exp(\text{sim}(\hat{z}_i, z^-_j)/\tau)}$$

where $\text{sim}(a,b)$ denotes cosine similarity and $\tau$ is a temperature hyperparameter.

  • Total Objective: Weighted combination,

$$L_{total} = L_{rec} + \lambda L_{ctr}$$

with typical weighting $\lambda=1.0$ or, in some variants, $\beta=0.1$ (Wu et al., 2021).
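The objective above can be sketched in PyTorch. This is an illustrative assumption about shapes and helper names: features are assumed already extracted as vectors (anchor/positive `(B, D)`, negatives `(B, K, D)`), and the functions `contrastive_loss` and `total_loss` are not from the official codebase.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, pos, negs, tau=0.07):
    """InfoNCE on feature vectors: pull the anchor toward the positive,
    push it from the negatives. anchor, pos: (B, D); negs: (B, K, D)."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(pos, dim=-1)
    n = F.normalize(negs, dim=-1)
    sim_pos = (a * p).sum(-1, keepdim=True) / tau          # (B, 1) cosine similarity
    sim_neg = torch.einsum("bd,bkd->bk", a, n) / tau       # (B, K)
    logits = torch.cat([sim_pos, sim_neg], dim=1)
    # the positive sits at index 0; cross-entropy yields -log softmax at that index
    return F.cross_entropy(logits, torch.zeros(a.size(0), dtype=torch.long))

def total_loss(pred, target, z_hat, z_pos, z_negs, lam=1.0):
    """L_total = L_rec + lambda * L_ctr."""
    return F.l1_loss(pred, target) + lam * contrastive_loss(z_hat, z_pos, z_negs)

loss = total_loss(torch.rand(2, 3, 8, 8), torch.rand(2, 3, 8, 8),
                  torch.rand(2, 64), torch.rand(2, 64), torch.rand(2, 5, 64))
```

In the paper's setting the anchor features come from the dehazed output, positives from the clean image, and negatives from hazy inputs, all via VGG-19 layers.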

Contrastive regularization constrains network outputs to lie closer to ground truth features while diverging from hazy inputs, narrowing the feasible restoration manifold (Wu et al., 2021).

3. Feature Engineering: Attention, Mixup, DFE

  • Feature Attention (FA): Inherits both channel and pixel attention from FFA-Net to emphasize discriminative statistics at multiple representation levels.
  • Adaptive Mixup: Parameterizes skip-level feature fusion via two learned interpolations:

$$f_{\uparrow 2} = \sigma(\theta_1) f_{\downarrow 1} + (1-\sigma(\theta_1)) f_{\uparrow 1}$$

blending encoder and decoder activations to preserve spatial detail and yield sharper reconstructions.
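The blending rule above can be sketched as a tiny learned module; the class name `AdaptiveMixup` and its zero initialization (an even blend at the start of training) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveMixup(nn.Module):
    """Learned blend of a downsampling-path feature f_down and an
    upsampling-path feature f_up: sigmoid(theta)*f_down + (1-sigmoid(theta))*f_up."""
    def __init__(self):
        super().__init__()
        # sigmoid(0) = 0.5, so the module starts as an even blend
        self.theta = nn.Parameter(torch.zeros(1))

    def forward(self, f_down, f_up):
        w = torch.sigmoid(self.theta)
        return w * f_down + (1 - w) * f_up

mix = AdaptiveMixup()
out = mix(torch.ones(1, 4, 8, 8), torch.zeros(1, 4, 8, 8))
```

Because $\theta_1$ is learned, backpropagation sets the blend ratio per skip level rather than fixing it as a hyperparameter.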

  • Dynamic Feature Enhancement (DFE): Utilizes two stacked modulated deformable convolutions:

$$y(p) = \sum_{k=1}^{K} w_k \left[ x(p + p_k + \Delta p_k) \right] m_k$$

where $\Delta p_k$ are learned offsets, $m_k$ are modulation masks, and $w_k$ are kernel weights, adaptively expanding the receptive field.
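The sampling formula can be made concrete with a single-location sketch in plain NumPy, where bilinear interpolation handles the fractional offsets $\Delta p_k$; the function names are illustrative, and a practical implementation would use a library deformable-convolution op instead.

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinearly sample 2D array x at fractional location (py, px), zero-padded."""
    H, W = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < H and 0 <= xx < W:
                val += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy, xx]
    return val

def deform_conv_point(x, p, weights, offsets, masks):
    """y(p) = sum_k w_k * x(p + p_k + Δp_k) * m_k at one output location p,
    over a 3x3 grid p_k with learned offsets Δp_k and modulation masks m_k."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return sum(w * bilinear(x, p[0] + gy + oy, p[1] + gx + ox) * m
               for (gy, gx), w, (oy, ox), m in zip(grid, weights, offsets, masks))

x = np.arange(25, dtype=float).reshape(5, 5)
# with zero offsets and unit masks this reduces to a plain 3x3 box filter at p=(2,2)
out = deform_conv_point(x, (2, 2), np.ones(9) / 9, [(0.0, 0.0)] * 9, np.ones(9))
print(out)  # 12.0
```

Non-zero offsets let each of the $K=9$ taps sample off-grid positions, which is what lets DFE enlarge the receptive field adaptively.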

Ablation experiments demonstrate each module's contribution: DFE alone yields ~+1.7 dB PSNR, adaptive mixup provides ~+0.2 dB, and the full contrastive regularizer delivers ~+0.7 dB over a positives-only variant (Wu et al., 2021).

4. Training Protocols and Evaluation

Datasets:

  • For gauge image enhancement, synthetic data is generated in Unreal Engine 5.1.1 with realistic global illumination, custom 3D gauge meshes, and both Exponential Height Fog and GPU-simulated smoke particles. Each scene yields images across 10 haze and 10 smoke density levels, with one clear reference.
  • The RESIDE benchmark is used for generic single-image dehazing, with the SOTS (indoor) test set for evaluation, and Dense-Haze and NH-HAZE for real-world validation.

Hyperparameters:

  • Optimizer: Adam ($\beta_1=0.9$, $\beta_2=0.999$, $\epsilon=10^{-8}$)
  • Initial LR: $10^{-4}$ or $2 \times 10^{-4}$, with stepped or cosine annealing schedules
  • Batch size: 4 (gauge) or 16 (RESIDE)
  • Epochs: 100
  • Loss weights: $\lambda=1.0$, $\beta=0.1$
  • Feature extraction for the contrastive loss: VGG-19, layers indexed {1, 3, 5, 9, 13} with progressively increasing weights

No data augmentations are applied beyond density-level variation.

5. Quantitative Performance

| Dataset       | Method   | PSNR (dB) | SSIM  | Parameters |
|---------------|----------|-----------|-------|------------|
| Haze (Gauge)  | AECR-Net | ~44       | 0.98  | 3M         |
|               | FFA-Net  | ~30       | 0.96  | -          |
|               | BCCR     | ~12       | 0.65  | -          |
| Smoke (Gauge) | AECR-Net | ~37       | 0.96  | 3M         |
|               | FFA-Net  | ~26       | 0.94  | -          |
|               | BCCR     | ~9        | 0.55  | -          |
| RESIDE/SOTS   | AECR-Net | 37.17     | 0.990 | 2.6M       |
|               | FFA-Net  | 36.39     | 0.989 | 4.68M      |
|               | (Others) | ~30       | ~0.97 | >3M        |

On both synthetic and real-world datasets, AECR-Net matches or outperforms prior methods in PSNR and SSIM despite its low model size. On the custom smoke and haze gauge datasets, AECR-Net improves PSNR by roughly +13 dB over FFA-Net (Ramírez-Agudelo et al., 15 Jan 2026, Wu et al., 2021).

6. Component-Level Analysis and Practical Impact

  • Contrastive Regularization: Adding $L_{ctr}$ increases average PSNR by about +2 dB (gauge enhancement) and +0.7 dB (RESIDE SOTS).
  • Dynamic Feature Enhancement: Disabling this module leads to a ~0.01 drop in SSIM and a ~1.7 dB PSNR loss.

  • Adaptive Mixup: Reduces overfitting to mid-level densities, improving generalization to unseen visibility conditions.
  • FA-Block Depth: Increasing the number of FA blocks up to $N=8$ improves performance, with diminishing returns for $N>8$.
  • Parameter Efficiency: AECR-Net’s compactness (2.6–3M parameters) provides substantial computational and memory advantages relative to previous deep dehazing models (Wu et al., 2021).
  • In infrastructure and emergency response settings, AECR-Net’s enhanced outputs enable more robust post-processing pipelines for automatic and autonomous gauge interpretation, critical in haze- and smoke-obscured environments (Ramírez-Agudelo et al., 15 Jan 2026).
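The PSNR figures quoted throughout follow the standard definition $\text{PSNR} = 10 \log_{10}(\text{MAX}^2 / \text{MSE})$; a minimal reference implementation:

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a restored image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

clean = np.zeros((8, 8))
noisy = clean + 0.1   # uniform error of 0.1 -> MSE = 0.01
print(psnr(clean, noisy))  # 20.0
```

For 8-bit images, `data_range=255` would be used instead of the unit range assumed here.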

7. Availability and Extensions

The official implementation of AECR-Net is available at https://github.com/GlassyWu/AECR-Net. Contrastive regularization has shown universality: adding CR to alternative single-image dehazing architectures yields consistent PSNR and SSIM gains without increased inference costs, supporting AECR-Net's utility as a general model design motif.

A plausible implication is that AECR-Net's architecture and learning protocol may generalize to deblurring, low-light enhancement, and other single-image restoration tasks where information degradation results from complex, content-dependent imaging perturbations.

