ERIENet: Efficient RAW Image Enhancement
- The paper demonstrates that ERIENet achieves real-time low-light RAW image enhancement using a fully-parallel, multi-scale feature extraction strategy.
- The model integrates a channel-aware residual dense block and green channel guidance to optimize feature reuse and adaptive normalization.
- Quantitative tests show ERIENet achieves competitive PSNR and SSIM relative to SOTA methods while requiring an order of magnitude fewer FLOPs and parameters.
The Efficient RAW Image Enhancement Network (ERIENet) is an architecture specifically designed for enhancing RAW images captured in low-light environments. ERIENet introduces a parallel multi-scale feature extraction scheme and leverages the inherent properties of RAW Bayer data—particularly the green channel dominance—to achieve superior quality and computational efficiency compared to previous methods. It achieves real-time performance on high-resolution images with significantly reduced FLOPs and parameter count relative to state-of-the-art (SOTA) approaches while delivering high-fidelity enhancement results (Wang et al., 17 Dec 2025).
1. Parallel Multi-Scale Architecture
ERIENet eschews traditional sequential multi-scale encoders in favor of a fully-parallel, multi-scale feature extraction and fusion architecture. The input RAW Bayer image is reshaped ("packed") into an RGGB 4-channel representation at half the spatial resolution. Three parallel branches process the packed input at progressively downsampled scales. Each branch begins with a depth-wise-separable convolution to extract initial features, reducing spatial footprint and computational cost.
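The packing step can be sketched in NumPy; the RGGB phase (R at the top-left of each 2×2 cell) is an assumption for illustration:

```python
import numpy as np

def pack_rggb(bayer):
    """Pack an H x W RGGB Bayer mosaic into a 4-channel half-resolution array.

    Channel order: R, G1, G2, B (assuming the R sample sits at (0, 0))."""
    h, w = bayer.shape
    assert h % 2 == 0 and w % 2 == 0
    return np.stack([
        bayer[0::2, 0::2],  # R
        bayer[0::2, 1::2],  # G1
        bayer[1::2, 0::2],  # G2
        bayer[1::2, 1::2],  # B
    ], axis=0)

raw = np.arange(16, dtype=np.float32).reshape(4, 4)
packed = pack_rggb(raw)
print(packed.shape)  # (4, 2, 2)
```

Halving the spatial resolution up front is what lets the subsequent branches stay cheap.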
Feature extraction in each branch is conducted independently using stacked Channel-Aware Residual Dense Blocks (CRDBs, see Section 2). At the lowest resolution, features are modulated by a learnable channel-wise global mask.
Fusion proceeds progressively from coarse to fine: features from the coarsest branch are upsampled and merged with those of the next-finer branch, and so on up to full resolution. This approach significantly reduces the depth and latency of any single branch, concentrates heavy computation at the coarsest resolution, and enables effective multi-scale context aggregation.
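A minimal sketch of the parallel three-branch layout, with 2× average-pool downsampling, nearest-neighbour upsampling, additive fusion, and a trivial `branch` callable all standing in for ERIENet's learned operators (the ×1/×2/×4 scale factors and the fusion rule are illustrative assumptions):

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling over a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbour 2x upsampling over a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def parallel_multiscale(x, branch):
    """Run `branch` independently at three scales, then fuse coarse-to-fine
    by upsampling and adding (a placeholder for the learned fusion)."""
    s1 = branch(x)                            # full resolution
    s2 = branch(avg_pool2(x))                 # half resolution
    s4 = branch(avg_pool2(avg_pool2(x)))      # quarter resolution
    fused2 = s2 + upsample2(s4)               # coarse -> mid
    return s1 + upsample2(fused2)             # mid -> fine

feats = parallel_multiscale(np.ones((4, 8, 8)), branch=lambda t: t * 2.0)
print(feats.shape)  # (4, 8, 8)
```

Because the branches run independently, the heaviest stacks can live at the coarsest scale without lengthening the full-resolution path.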
2. Channel-Aware Residual Dense Block (CRDB)
The CRDB module extends the classic Residual Dense Block (RDB) by appending an Efficient Channel Attention (ECA) mechanism. For CRDB-n (n convolutions):
- The input passes through n sequential Conv–ReLU units with dense connections.
- The input and all intermediate representations are concatenated along the channel dimension.
- A 1×1 convolution fuses the concatenated features back to the base channel width.
- Channel recalibration follows ECA: global average pooling yields a channel descriptor, a lightweight 1-D convolution captures local cross-channel interaction, and a sigmoid produces per-channel weights.
- The fused features are rescaled by these weights and added to the block input via a residual connection.
This design provides improved feature reuse, lightweight channel attention, and reduced FLOPs relative to vanilla RDBs.
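The ECA stage can be sketched as follows; the learned 1-D kernel is replaced by uniform weights and the dense Conv–ReLU stack is omitted, so this shows only the channel-recalibration step:

```python
import numpy as np

def eca(features, k=3):
    """Efficient Channel Attention on (C, H, W) features: global average
    pooling, a 1-D convolution of kernel size k across channels, and a
    sigmoid gate that rescales each channel."""
    c = features.shape[0]
    desc = features.mean(axis=(1, 2))            # (C,) channel descriptor
    pad = k // 2
    padded = np.pad(desc, pad, mode="edge")
    kernel = np.full(k, 1.0 / k)                 # stand-in for learned weights
    conv = np.array([padded[i:i + k] @ kernel for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-conv))           # sigmoid
    return features * gate[:, None, None]

out = eca(np.ones((8, 4, 4)))
print(out.shape)  # (8, 4, 4)
```

Unlike squeeze-and-excitation, ECA needs no fully-connected bottleneck, which is where the FLOP savings over heavier attention come from.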
3. Green Channel Guidance Branch
ERIENet explicitly capitalizes on the green channel's predominance in Bayer patterns. The Green Channel Guidance (GCG) branch operates only at the coarsest scale:
- The two green channels (G1, G2) are extracted from the packed RGGB input.
- Two convolutions predict spatially varying scale (γ) and shift (β) maps from the green channels.
- Each feature map is normalized with BatchNorm statistics and then modulated as γ ⊙ x̂ + β (Spatially-Adaptive Normalization, SAN).
- SAN replaces conventional BatchNorm in the CRDBs at the coarsest scale, so that green channel-derived illumination guides adaptive normalization of the feature maps.
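A sketch of the SAN modulation, with fixed illustrative functions standing in for the two learned convolutions that predict γ and β:

```python
import numpy as np

def san(x, green, eps=1e-5):
    """Spatially-adaptive normalization sketch: normalize (C, H, W) features
    with per-channel statistics (a BatchNorm stand-in), then modulate with
    spatial scale/shift maps derived from the (2, H, W) green channels.

    The gamma/beta maps here are simple fixed functions of `green`;
    in ERIENet they come from two learned convolutions."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    x_hat = (x - mu) / (sigma + eps)
    gamma = 1.0 + 0.1 * green.mean(axis=0)   # (H, W) scale map, illustrative
    beta = 0.1 * green.mean(axis=0)          # (H, W) shift map, illustrative
    return gamma[None] * x_hat + beta[None]

features = np.random.default_rng(0).normal(size=(8, 4, 4))
greens = np.ones((2, 4, 4))
print(san(features, greens).shape)  # (8, 4, 4)
```

The key point is that γ and β vary per pixel, so brighter green regions can rescale features differently from dark ones.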
Ablation studies confirm that including GCG with BatchNorm yields pronounced improvements in PSNR, SSIM, and LPIPS compared to variants omitting GCG or using LayerNorm.
4. Loss Functions and Training Paradigm
ERIENet is trained end-to-end with a composite objective function:
- L1 loss: mean absolute error between the enhanced output and the ground truth.
- Wavelet SSIM loss: SSIM computed over the subbands of a three-level 2-D Haar DWT of both output and ground truth.
- Wavelet MSE loss: mean squared error over the same wavelet subbands.
The total loss is a weighted sum of the three terms. Training is conducted for 500 epochs with the Adam optimizer on cropped patches with data augmentation, using the SID (Sony Bayer) and ELD datasets, with sRGB ground-truth images rendered via Rawpy.
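A single-level sketch of the wavelet-domain objective; the paper uses three Haar levels plus an SSIM term, and the `lam` weight here is an illustrative assumption, not the paper's value:

```python
import numpy as np

def haar_dwt2(x):
    """One level of the 2-D Haar DWT on an (H, W) array: returns LL, LH, HL, HH."""
    a, b = x[0::2, :], x[1::2, :]
    lo, hi = (a + b) / 2.0, (a - b) / 2.0   # vertical average / difference
    def split(s):
        l, r = s[:, 0::2], s[:, 1::2]
        return (l + r) / 2.0, (l - r) / 2.0  # horizontal average / difference
    ll, lh = split(lo)
    hl, hh = split(hi)
    return ll, lh, hl, hh

def total_loss(pred, gt, lam=0.1):
    """L1 in image space plus MSE over Haar subbands (wavelet-SSIM omitted)."""
    l1 = np.abs(pred - gt).mean()
    wmse = sum(np.mean((p - g) ** 2)
               for p, g in zip(haar_dwt2(pred), haar_dwt2(gt)))
    return l1 + lam * wmse

x = np.ones((4, 4))
print(total_loss(x, x))  # 0.0
```

Penalizing subbands separately lets the high-frequency (LH/HL/HH) terms target noise and texture errors that a plain pixel loss averages away.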
5. Computational Efficiency and Comparative Performance
ERIENet demonstrates significant improvements in computational metrics while providing SOTA-competitive enhancement quality. On an RTX 3090 (24 GB), it processes 4K-resolution RAW images at 146.2 FPS (6.84 ms per image), with 39.29 GFLOPs and 1.419M parameters.
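The reported latency and throughput are mutually consistent:

```python
# Converting the reported per-image latency to frames per second.
ms_per_image = 6.84
fps = 1000.0 / ms_per_image
print(round(fps, 1))  # 146.2
```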
Performance comparison (SID [9] test subset):
| Method | PSNR | SSIM | FLOPs | Params | FPS(4K) |
|---|---|---|---|---|---|
| SID [9] | 28.62 | 0.798 | 523.83G | 7.761M | N/A |
| RAWFormer | 29.22 | 0.790 | 781.54G | 3.401M | N/A |
| SMG | 30.17 | 0.834 | 1274.27G | 18.355M | N/A |
| ERIENet | 29.12 | 0.797 | 39.29G | 1.419M | 146.2 |
Ablation studies attribute ERIENet's performance to (a) its fully parallel multi-scale strategy, (b) the use of GCG, and (c) the CRDB module. The full three-branch design yields the largest performance gain relative to single- or two-branch variants, while GCG and CRDB offer substantial incremental improvements in all perceptual metrics.
6. Quantitative and Qualitative Evaluation
Quantitative metrics on SID (Sony Bayer subset):
| Method | Time (ms) | GFLOPs | Params (M) | PSNR (dB) | SSIM | LPIPS |
|---|---|---|---|---|---|---|
| RAWFormer | 20.69 | 781.54 | 3.401 | 29.22 | 0.790 | 0.258 |
| SMG | 7.72 | 1274.27 | 18.355 | 30.17 | 0.834 | 0.238 |
| DNF | 2874.44 | 11.140 | 0.797 | 30.62 | 0.797 | 0.343 |
| ERIENet | 6.84 | 39.29 | 1.419 | 29.12 | 0.797 | 0.259 |
Qualitative assessment indicates reduced artifacts in highlights, improved dark-region detail fidelity, and perceptually better outputs compared to competing methods.
7. Implications and Future Extensions
ERIENet validates the hypothesis that fully-parallel, multi-scale feature extraction—augmented with green channel guidance and efficient dense block modules—can yield both real-time inference and enhanced reconstruction accuracy for low-light RAW images at 4K resolution. Potential extensions include adapting the architecture for video (exploiting temporal structure for further denoising or enhancement), integrating enhancement with downstream recognition tasks, and exploring dynamic scale weighting or neural architecture search to maximize efficiency on edge hardware (Wang et al., 17 Dec 2025).