LucentVisionNet: Zero-Shot Low-Light Enhancement

Updated 7 July 2025
  • LucentVisionNet is a zero-shot learning framework that enhances low-light images by integrating multi-scale spatial attention with deep curve estimation and composite loss.
  • It utilizes depthwise separable convolutions and iterative residual updates to preserve semantic details while improving computational efficiency.
  • Empirical results show competitive performance across benchmarks, delivering real-time enhancement for applications like mobile photography and surveillance.

LucentVisionNet is a zero-shot learning framework for low-light image enhancement that advances the field by integrating multi-scale spatial attention with deep curve estimation and a composite loss formulation. It is designed to address the persistent difficulty of enhancing images captured under inadequate lighting without access to paired ground-truth data, emphasizing perceptual fidelity, semantic preservation, and computational efficiency (2506.18323).

1. Multi-Scale Spatial Attention Mechanism

LucentVisionNet employs a multi-scale spatial attention module that processes input images at full, half, and quarter resolutions in parallel. Each scale branch consists of depthwise separable convolutional neural network (DSCNN) layers, which extract both fine-grained local details and broad contextual features at modest computational cost. After feature extraction, the lower-resolution outputs are upsampled and hierarchically fused using additional DSCNN blocks.
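
As a concrete illustration, here is a minimal PyTorch sketch of this multi-scale branching under stated assumptions: the module name `MultiScaleFeatures`, the channel width, the sum-based fusion, and the single fusion block are illustrative choices, not the paper's exact architecture.

```python
# Minimal sketch of multi-scale feature extraction (assumed structure, not the
# paper's exact configuration). Each branch applies a depthwise separable
# convolution at full, half, or quarter resolution; lower-resolution outputs
# are upsampled back to full resolution and fused.
import torch
import torch.nn as nn
import torch.nn.functional as F

def dsc_block(ch):
    # Depthwise separable convolution: 3x3 depthwise followed by 1x1 pointwise.
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=1, groups=ch),  # depthwise
        nn.Conv2d(ch, ch, 1),                        # pointwise
        nn.ReLU(inplace=True),
    )

class MultiScaleFeatures(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 3, padding=1)
        self.full, self.half, self.quarter = dsc_block(ch), dsc_block(ch), dsc_block(ch)
        self.fuse = dsc_block(ch)

    def forward(self, x):
        f = self.stem(x)
        h, w = f.shape[-2:]
        # Process in parallel at full, half, and quarter resolution.
        f_full = self.full(f)
        f_half = self.half(F.interpolate(f, scale_factor=0.5))
        f_quarter = self.quarter(F.interpolate(f, scale_factor=0.25))
        # Upsample the lower-resolution branches and fuse hierarchically.
        f_half = F.interpolate(f_half, size=(h, w), mode="bilinear", align_corners=False)
        f_quarter = F.interpolate(f_quarter, size=(h, w), mode="bilinear", align_corners=False)
        return self.fuse(f_full + f_half + f_quarter)
```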

The spatial attention mechanism operates on the aggregated features by projecting the fused tensor through a 1×1 convolution to obtain an intermediate feature map F. A subsequent 1×1 convolution yields an unnormalized attention map M, which is normalized with a sigmoid function to produce the attention map A. The final attended output O results from:

F = X * W_\text{conv}, \quad M = F * W_\text{att}, \quad A = \sigma(M), \quad O = F \odot A,

where * denotes convolution and \odot element-wise multiplication. This design adaptively emphasizes spatial regions most critical for perceptual enhancement, especially in degraded low-light areas, while actively mitigating noise amplification.
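
These equations translate almost line-for-line into code. The following compact PyTorch rendering is a sketch: the channel counts and the choice of a single-channel attention map broadcast across feature channels are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    # Sketch of F = X * W_conv, M = F * W_att, A = sigmoid(M), O = F ⊙ A.
    # Channel counts and the single-channel attention map are assumptions.
    def __init__(self, in_ch=32, mid_ch=32):
        super().__init__()
        self.w_conv = nn.Conv2d(in_ch, mid_ch, kernel_size=1)  # projects fused features to F
        self.w_att = nn.Conv2d(mid_ch, 1, kernel_size=1)       # unnormalized attention map M

    def forward(self, x):
        f = self.w_conv(x)        # intermediate feature map F
        m = self.w_att(f)         # unnormalized attention M
        a = torch.sigmoid(m)      # normalized attention A
        return f * a              # attended output O = F ⊙ A (A broadcast over channels)
```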

2. Deep Curve Estimation and Image Adjustment

At the core of LucentVisionNet is a deep curve estimation network structured from stacks of DSCNN layers. Each layer, denoted DWConv_i, applies a 3×3 depthwise convolution followed by a 1×1 pointwise convolution and a non-linear activation, typically ReLU:

\text{DWConv}_i(X) = \text{ReLU}\big((X * K_{\text{depthwise}_i}) * K_{\text{pointwise}_i}\big)

By sequentially composing these layers and maintaining multi-scale feature propagation across the network, pixel-wise "enhancement curves" are predicted. These non-linear curves, finalized with a Tanh activation, are applied to the input image to raise brightness and correct local and global contrast, all without requiring paired reference examples.
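
A curve-estimation head along these lines might look as follows; the depth, channel widths, and number of predicted curve maps are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CurveEstimator(nn.Module):
    # Stacked depthwise separable layers ending in a Tanh head that predicts
    # pixel-wise enhancement-curve coefficients in [-1, 1]. Depth and widths
    # are illustrative assumptions.
    def __init__(self, ch=32, n_layers=4, n_curves=3):
        super().__init__()
        layers = [nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(n_layers):
            layers += [
                nn.Conv2d(ch, ch, 3, padding=1, groups=ch),  # 3x3 depthwise
                nn.Conv2d(ch, ch, 1),                        # 1x1 pointwise
                nn.ReLU(inplace=True),
            ]
        layers += [nn.Conv2d(ch, n_curves, 3, padding=1), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)  # per-pixel curve coefficients, one map per image channel
```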

3. Iterative Recurrent Enhancement via Residual Learning

LucentVisionNet refines its output iteratively with a recurrent enhancement strategy founded on residual learning. Instead of single-pass corrections, the network recursively enhances the image through a quadratic residual update:

X_t = X_{t-1} + D \cdot (X_{t-1}^2 - X_{t-1}),

where X_t is the image at the t-th iteration and D is a diagonal matrix of learnable enhancement coefficients. This recursive rule improves gradient propagation, mitigates vanishing gradients, and preserves structural integrity by learning residual corrections at each stage, so remaining imperfections are progressively reduced across iterations.
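
In code, the update is a short loop. The sketch below treats D as a per-pixel coefficient map applied element-wise, which is equivalent to multiplying by a diagonal matrix over vectorized pixels; the iteration count is an assumed hyperparameter.

```python
import torch

def iterative_enhance(x, d, n_iters=8):
    """Apply the quadratic residual update X_t = X_{t-1} + D * (X_{t-1}^2 - X_{t-1}).

    x: input image tensor in [0, 1], shape (B, C, H, W).
    d: learnable enhancement coefficients with the same shape as x, acting as
       the diagonal of D applied element-wise. n_iters is an assumed value.
    """
    for _ in range(n_iters):
        # For x in [0, 1], x^2 - x is non-positive, so negative coefficients
        # in d brighten the image while the residual form keeps updates small.
        x = x + d * (x * x - x)
    return x
```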

4. Composite Loss Function for Zero-Shot Learning

The composite loss function in LucentVisionNet enables training in the absence of aligned high-quality ground-truth images. It is structured as a weighted sum of six tailored components:

| Loss Component | Objective | Typical Formulation |
| --- | --- | --- |
| Total Variation (\mathcal{L}_{TV}) | Enhancement smoothness | \sum_{i,j} \lVert A_{i,j+1} - A_{i,j} \rVert^2 + \lVert A_{i+1,j} - A_{i,j} \rVert^2 |
| Spatial Consistency (\mathcal{L}_{spa}) | Local structure retention | Gradient matching between input and enhanced images |
| Color Constancy (\mathcal{L}_{color}) | Color channel balancing | \sqrt{(R-G)^2 + (R-B)^2 + (G-B)^2} |
| Exposure Control (\mathcal{L}_{exp}) | Regulated overall brightness | \lVert \text{AvgPool}(I_{enh}) - E \rVert^2 |
| Segmentation Guidance (\mathcal{L}_{seg}) | Semantic content preservation | Penalizes deviations from expected segmentation maps |
| No-Reference Image Quality (\mathcal{L}_{NR}) | Human-perceived quality maximization | 100 - E[S(\hat{I})] via MUSIQ-AVA model |

The full composite objective is:

\mathcal{L}_{composite} = 1600\,\mathcal{L}_{TV} + \mathcal{L}_{spa} + 5\,\mathcal{L}_{color} + 10\,\mathcal{L}_{exp} + 0.1\,\mathcal{L}_{seg} + 0.1\,\mathcal{L}_{NR}

Notably, the no-reference image quality loss leverages a pre-trained MUSIQ-AVA model to assign aesthetic scores to enhanced images, encouraging the model to produce outputs with high human-perceived quality in the absence of explicit full-reference signals.
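
The weighted combination can be sketched directly. The implementations below are simplified stand-ins: the TV term is written over the predicted curve maps, the exposure target E = 0.6 is an assumed value, and the spatial-consistency, segmentation, and no-reference terms are passed in precomputed since they depend on external models.

```python
import torch
import torch.nn.functional as F

def tv_loss(a):
    # Total variation over the curve maps A: penalize abrupt spatial changes.
    dh = (a[:, :, 1:, :] - a[:, :, :-1, :]).pow(2).mean()
    dw = (a[:, :, :, 1:] - a[:, :, :, :-1]).pow(2).mean()
    return dh + dw

def color_loss(img):
    # Color constancy: penalize imbalance between the mean R, G, B channels.
    r, g, b = img[:, 0].mean(), img[:, 1].mean(), img[:, 2].mean()
    return torch.sqrt((r - g) ** 2 + (r - b) ** 2 + (g - b) ** 2)

def exposure_loss(img, e=0.6, patch=16):
    # Exposure control: average-pooled luminance should sit near target E
    # (value of E and patch size are assumptions).
    lum = img.mean(dim=1, keepdim=True)
    return (F.avg_pool2d(lum, patch) - e).pow(2).mean()

def composite_loss(curves, enhanced, l_spa, l_seg, l_nr):
    # Weighted sum with the coefficients reported above; l_spa, l_seg, and
    # l_nr (spatial consistency, segmentation guidance, MUSIQ-AVA no-reference
    # score) are computed elsewhere and passed in as scalar tensors.
    return (1600 * tv_loss(curves) + l_spa + 5 * color_loss(enhanced)
            + 10 * exposure_loss(enhanced) + 0.1 * l_seg + 0.1 * l_nr)
```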

5. Empirical Results and Comparative Performance

LucentVisionNet is evaluated on both paired datasets (LOL, LOL-v2) and unpaired real-world benchmarks (DarkBDD, DICM, VV, NPE, MEF). The model consistently demonstrates superior or highly competitive performance compared to established methods, including Zero-DCE, Zero-DCE++, and Semantic-Guided Zero-Shot Learning. Key metrics include PSNR, SSIM, FSIM, and VSI for full-reference assessment, and LPIPS and DISTS for perceptual evaluation.
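
For reference, the standard full-reference metrics can be reproduced with common libraries; this snippet uses scikit-image for PSNR and SSIM (LPIPS and DISTS require their own pre-trained networks and are omitted here).

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def full_reference_scores(reference, enhanced):
    """Compute PSNR and SSIM between a ground-truth and an enhanced image.

    Both arrays are HxWx3 floats scaled to [0, 1].
    """
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=1.0)
    ssim = structural_similarity(reference, enhanced, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```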

The multi-scale and DSCNN-based architecture enables real-time or near-real-time performance; for high-resolution images, processing times of approximately 1–1.5 seconds are achieved on standard GPU hardware. This computational efficiency is attributed to the predominance of depthwise separable convolutions and the lean multi-scale fusion strategy, facilitating practical deployments where latency and resource constraints are significant factors.

6. Application Scenarios

The zero-shot characteristic and the balance between enhancement quality and computational footprint in LucentVisionNet provide versatility for real-world applications:

  • Mobile Photography: Automatic low-light enhancement on smartphones, requiring no user intervention or paired training samples.
  • Surveillance Systems: Clarification and detail enhancement for security footage in nocturnal or poorly illuminated scenes.
  • Autonomous Navigation: Rendering perceptual input more informative for downstream vision in self-driving vehicles under insufficient exposure.
  • Medical and Scientific Imaging: Enhancement where illumination control may be infeasible, preserving critical anatomical or specimen structure.

These application domains benefit from the model's semantic preservation, perceptual quality orientation, and real-time readiness, particularly in ambiguous or noisy environments where traditional supervised approaches are either impractical or data-starved.

7. Summary and Significance

LucentVisionNet advances low-light image enhancement by combining multi-scale spatial attention with deep, efficient curve estimation and a carefully balanced composite loss that allows high-quality learning without reference to paired ground truth. Its iterative residual update rule further refines enhancement consistency and semantic integrity. The framework’s empirical results indicate that it reliably improves visual quality, structural cohesion, and perceptual metrics while maintaining operational efficiency suited to deployment in varied imaging pipelines. Collectively, these innovations establish LucentVisionNet as a leading method for real-world, zero-shot low-light image enhancement (2506.18323).

References

1. arXiv:2506.18323