
LoopExpose: Unsupervised Exposure Correction

Updated 12 November 2025
  • LoopExpose is an unsupervised framework for exposure correction that employs a nested-loop strategy and luminance ranking loss to enhance images captured under varied lighting conditions.
  • It jointly optimizes a correction network and evolving pseudo-labels from multi-exposure fusion, bridging the gap with supervised methods.
  • Experimental results show LoopExpose outperforms traditional unsupervised approaches in both single-exposure correction and multi-exposure fusion tasks.

LoopExpose is an unsupervised framework for arbitrary-length exposure correction, which introduces a nested-loop optimization strategy to achieve high-fidelity enhancement of images captured under varied lighting conditions. By sidestepping the need for paired supervision, LoopExpose jointly optimizes a correction network and its training targets—pseudo-labels derived from multi-exposure fusion—via iterative feedback. A central innovation is the integration of a luminance ranking loss, enforcing physically plausible luminance ordering within input sequences. Experimental evidence across major benchmarks demonstrates that LoopExpose consistently outperforms existing unsupervised exposure correction methods and narrows the performance gap to supervised approaches.

1. Problem Formulation and Motivation

The exposure correction task is defined over a sequence of $N$ images $I = \{I_1, I_2, \ldots, I_N\}$ of a single scene, each captured at a distinct exposure value (EV). The objective is to produce a set of corrected outputs $E_i$ that accurately approximate the scene radiance as if captured under canonical, well-exposed conditions. Two primary variants are recognized:

  • Single-Exposure Correction (SEC): where $N = 1$, each input requires independent correction.
  • Multi-Exposure Fusion (MEF): where $N > 1$, inputs are fused to produce a single high-quality image.

Supervised approaches to exposure correction are hamstrung by the difficulty and subjectivity inherent in assembling large datasets of paired inputs and ground truth. Exposure errors, modeled as $I_i^{obs} = f(E^{real} \cdot \Delta t_i)$ with $\Delta t_i$ the exposure time and $f(\cdot)$ the camera response, do not admit straightforward self-supervised learning, as they are structured and nonzero-mean. Classical exposure fusion algorithms, notably the method of Mertens et al., demonstrate that plausible fused reference images can be generated directly from the input sequences, providing a foundation for pseudo-supervised approaches.
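
For reference, the Mertens et al. fusion used to build pseudo-labels has an off-the-shelf implementation in OpenCV. The following minimal sketch generates a fused reference image from an exposure stack with cv2.createMergeMertens; the file names and the final 8-bit conversion are illustrative assumptions, not details from the paper.

import cv2
import numpy as np

# Hypothetical multi-exposure stack of one scene (file names are placeholders).
paths = ["scene_ev_minus.jpg", "scene_ev_zero.jpg", "scene_ev_plus.jpg"]
stack = [cv2.imread(p) for p in paths]          # 8-bit BGR frames

# Mertens et al. fusion: weights pixels by contrast, saturation and
# well-exposedness, then blends across the stack.
fusion = cv2.createMergeMertens()
fused = fusion.process(stack)                   # float32 output, roughly in [0, 1]

# Clip and convert back to 8-bit; this fused image serves as the pseudo-label Y.
pseudo_label = np.clip(fused * 255.0, 0, 255).astype(np.uint8)
cv2.imwrite("pseudo_label.png", pseudo_label)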

The optimization target for LoopExpose is articulated as:

\operatorname{minimize}_\theta \; L_{total}(\theta; I, Y, E)

where $G_\theta$ is the correction model, $Y$ is the current pseudo-label (fused image), and $E = G_\theta(I)$ is the corrected output. Crucially, $Y$ evolves throughout training, producing a nested optimization scheme.

2. Nested-Loop Optimization Strategy

LoopExpose implements a two-level loop, alternating between the refinement of pseudo-labels via fusion and network training against these labels. The process operates as follows:

  • Lower Level (Pseudo-Label Generation): uses the fixed Mertens fusion operator $\mathcal{M}(\cdot)$ to create a reference image.
    • Warm-Up Phase ($t < T_0$): $Y^{(t+1)} = \mathcal{M}(I)$ (fusion of raw exposures).
    • Joint Optimization Phase ($t \geq T_0$): $Y^{(t+1)} = \mathcal{M}(I \cup E^{(t)})$ (fusion of inputs and latest corrected outputs).
  • Upper Level (Correction Model Update): for fixed pseudo-labels $Y^{(t)}$, minimize $L_{total}$ with respect to $\theta$ by updating the correction network.

This iterative scheme is formalized as:

  1. $E^{(t)} = G_{\theta^{(t)}}(I)$
  2. $Y^{(t+1)} = \mathcal{M}(I)$ if $t < T_0$, otherwise $Y^{(t+1)} = \mathcal{M}(I \cup E^{(t)})$
  3. $\theta^{(t+1)} \leftarrow \theta^{(t)} - \eta \nabla_\theta L_{total}(\theta^{(t)}; I, Y^{(t+1)})$

This design produces a self-reinforcing loop in which improved correction results drive the formation of more informative pseudo-labels, in turn enabling further network refinement.
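
As a minimal sketch of the lower-level step, assuming a mertens_fusion helper (for example, a wrapper around the OpenCV call shown in Section 1), the pseudo-label update could look like this:

def update_pseudo_labels(I, E, t, T0):
    # Lower-level step: during warm-up only the raw exposures are fused; in the
    # joint phase the latest corrected outputs are added to the fusion stack.
    # mertens_fusion is an assumed helper, e.g. wrapping cv2.createMergeMertens.
    stack = list(I) if t < T0 else list(I) + list(E)
    return mertens_fusion(stack)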

3. Luminance Ranking Loss

In exposure correction, luminance serves as a dominant cue. LoopExpose incorporates a luminance ranking loss to enforce that, within a multi-exposure sequence ordered by apparent brightness, the network’s global luminance predictions mirror this relational structure.

A luminance-aware network branch outputs a per-frame scalar $F_i^L$ for each input $I_i$. After sorting the sequence from darkest to brightest, the loss is expressed as:

L_{lumi} = w_{lumi} \sum_{i<j} \max(0,\, F_i^L + \text{margin} - F_j^L)

with $\text{margin} = 0$ and $w_{lumi} = 1$. This imposes a soft constraint that $F_i^L < F_j^L$ whenever $I_i$ is darker than $I_j$. The luminance ranking loss is aggregated with the reconstruction loss in the full training objective.
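
For concreteness, the pairwise hinge form above is straightforward to express in PyTorch. In the sketch below, the per-frame luminance scalars are assumed to arrive already ordered from darkest to brightest input; the function name and the explicit loop are illustrative choices, not taken from a released implementation.

import torch

def luminance_ranking_loss(f_lum: torch.Tensor, margin: float = 0.0,
                           w_lumi: float = 1.0) -> torch.Tensor:
    # f_lum: shape (N,), per-frame luminance scalars ordered darkest -> brightest.
    # Penalizes any pair (i, j), i < j, where f_lum[i] + margin exceeds f_lum[j].
    n = f_lum.shape[0]
    loss = f_lum.new_zeros(())
    for i in range(n):
        for j in range(i + 1, n):
            loss = loss + torch.clamp(f_lum[i] + margin - f_lum[j], min=0.0)
    return w_lumi * loss

# Example: predicted scalars for a 5-frame sequence; the third and fourth frames
# violate the expected ordering, so the loss is nonzero.
f_lum = torch.tensor([0.12, 0.30, 0.55, 0.48, 0.90], requires_grad=True)
print(luminance_ranking_loss(f_lum))            # tensor(0.0700, ...)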

4. Loss Function and Optimization Objective

The composite loss leveraged for model optimization is:

L_{total} = L_{sup} + L_{lumi}

where the pseudo-supervised loss $L_{sup}$ for each frame and its pseudo-label is:

L_{sup} = L_1(E, Y) + w_p L_p(E, Y) + w_{ssim} L_{ssim}(E, Y)

with standard settings $w_p = 0.1$, $w_{ssim} = 0.05$.

  • $L_1(E, Y)$: pixel-wise $L_1$ distance
  • $L_p(E, Y)$: perceptual loss via VGG-like feature embeddings
  • $L_{ssim}(E, Y) = 1 - \mathrm{SSIM}(E, Y)$: structural similarity loss
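
A rough PyTorch approximation of this composite loss can be assembled from standard components: an L1 term, a perceptual term over truncated torchvision VGG16 features, and a simplified single-scale SSIM computed with a uniform window. The layer cutoff, the omission of ImageNet normalization, and the SSIM simplification are assumptions made for brevity and may differ from the paper's implementation.

import torch
import torch.nn.functional as F
import torchvision

# Truncated VGG16 feature extractor for the perceptual term (layer cutoff is an assumption).
_vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def simple_ssim(x, y, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    # Single-scale SSIM with a uniform window (a simplification of the usual Gaussian window).
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window, stride=1, padding=pad)
    var_x = F.avg_pool2d(x * x, window, stride=1, padding=pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, stride=1, padding=pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, stride=1, padding=pad) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return ssim.mean()

def pseudo_supervised_loss(E, Y, w_p=0.1, w_ssim=0.05):
    # L_sup = L1 + w_p * perceptual + w_ssim * (1 - SSIM), for E, Y in [0, 1], shape (B, 3, H, W).
    l1 = F.l1_loss(E, Y)
    lp = F.l1_loss(_vgg(E), _vgg(Y))
    lssim = 1.0 - simple_ssim(E, Y)
    return l1 + w_p * lp + w_ssim * lssim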

The algorithmic workflow can be succinctly described as follows:

Y = mertens_fusion(I)                            # initial pseudo-label from the raw exposures
theta = initialize_parameters()

for t in range(T_max):
    E, F_lum = correction_network(I, theta)      # corrected frames and per-frame luminance scalars
    if t < T0:                                   # warm-up phase: fuse raw inputs only
        Y = mertens_fusion(I)
    else:                                        # joint phase: fuse inputs with the latest corrections
        Y = mertens_fusion(concat(I, E))
    L_sup = L1(E, Y) + 0.1 * L_p(E, Y) + 0.05 * L_ssim(E, Y)
    L_lumi = sum(max(0, F_lum[i] - F_lum[j])     # pairwise ranking hinge with margin = 0
                 for i in range(N) for j in range(i + 1, N))
    L_total = L_sup + L_lumi
    theta = theta - eta * grad(L_total, theta)   # gradient step on the correction network

5. Network Architecture and Implementation

The exposure correction network $G_\theta$ is constructed in a dual-branch configuration (a schematic PyTorch sketch follows the list below):

  • Luminance-Aware Encoder-Decoder:
    • U-Net-like, extracts multi-scale luminance features.
    • Applies global average pooling and fully connected layers to produce the luminance descriptor $F_i^L$ used for ranking.
    • Feature maps support subsequent fusion stages.
  • Adaptive 3D Look-Up Table (3D-LUT) Module:
    • Comprises $K$ pre-trained 3D LUTs whose outputs are adaptively blended via weights inferred from input stability.
  • Attention-Based Fusion:
    • Combines outputs from the luminance-aware and 3D-LUT branches using channel and spatial attention mechanisms.
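
Below is a schematic PyTorch sketch of this dual-branch layout under heavy simplifying assumptions: the encoder-decoder is reduced to a few convolutions, the adaptive 3D-LUT module is approximated by K learnable pointwise color transforms, and all layer sizes are illustrative. It is intended only to show how the two branches and the attention fusion fit together, not to reproduce the published architecture.

import torch
import torch.nn as nn

class LuminanceBranch(nn.Module):
    # Schematic stand-in for the U-Net-like luminance-aware encoder-decoder.
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
                                     nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.decoder = nn.Conv2d(ch, 3, 3, padding=1)
        self.fc = nn.Linear(ch, 1)          # luminance descriptor head after global pooling

    def forward(self, x):
        feat = self.encoder(x)
        f_lum = self.fc(feat.mean(dim=(2, 3))).squeeze(-1)    # per-frame scalar F^L
        return self.decoder(feat), f_lum

class LUTBranch(nn.Module):
    # Simplified stand-in for the adaptive 3D-LUT module: K pointwise color
    # transforms blended by weights predicted from global image statistics.
    def __init__(self, k=3):
        super().__init__()
        self.luts = nn.ModuleList([nn.Conv2d(3, 3, 1) for _ in range(k)])
        self.weights = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                     nn.Linear(3, k), nn.Softmax(dim=1))

    def forward(self, x):
        w = self.weights(x)                                   # (B, K) blending weights
        outs = torch.stack([lut(x) for lut in self.luts], 1)  # (B, K, 3, H, W)
        return (w[:, :, None, None, None] * outs).sum(1)

class LoopExposeNet(nn.Module):
    # Dual-branch correction network with channel and spatial attention fusion.
    def __init__(self, ch=6):
        super().__init__()
        self.lum, self.lut = LuminanceBranch(), LUTBranch()
        self.channel_att = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid())
        self.head = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        lum_out, f_lum = self.lum(x)
        fused = torch.cat([lum_out, self.lut(x)], dim=1)      # concatenate branch outputs (6 channels)
        fused = fused * self.channel_att(fused) * self.spatial_att(fused)
        return torch.sigmoid(self.head(fused)), f_lum         # corrected image E and scalar F^L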

Training is performed using PyTorch on an NVIDIA RTX 4090. The optimizer is Adam with $(\beta_1, \beta_2) = (0.9, 0.999)$. A 5-epoch warm-up (initial learning rate decaying from $10^{-4}$ to $10^{-5}$) precedes a joint optimization phase with a fixed learning rate of $5 \times 10^{-5}$ for 20–30 epochs. Batch size is 8 sequences per GPU; augmentation includes random cropping ($256 \times 256$) and horizontal/vertical flips.
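
A minimal sketch of this training schedule, assuming the hypothetical model from the architecture sketch above, might configure the optimizer and learning-rate plan as follows; the exact decay curve during warm-up is an interpretation of the description, not a published detail.

import torch

model = LoopExposeNet()    # hypothetical model from the architecture sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

WARMUP_EPOCHS, JOINT_EPOCHS = 5, 25    # 5-epoch warm-up, then 20-30 joint epochs

def lr_for_epoch(epoch: int) -> float:
    if epoch < WARMUP_EPOCHS:
        # Decay geometrically from 1e-4 toward 1e-5 over the warm-up epochs.
        return 1e-4 * (0.1 ** (epoch / max(WARMUP_EPOCHS - 1, 1)))
    return 5e-5                         # fixed learning rate for the joint phase

for epoch in range(WARMUP_EPOCHS + JOINT_EPOCHS):
    for group in optimizer.param_groups:
        group["lr"] = lr_for_epoch(epoch)
    # ... one epoch of the nested-loop training described in Sections 2 and 4 ...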

Key datasets:

  • SeqMSEC: from MSEC, 5 images/scene, EV $\in [-1.5, +1.5]$.
  • SeqRadio / PlainRadio: from Radiometry512, 7 images/scene, EV $\in [-3, +3]$.
  • Training samples one sequence per batch; inference supports arbitrary sequence lengths.

6. Experimental Evaluation

6.1. Quantitative Performance

On MSEC:

  • SEC ($N = 1$): LoopExpose achieves PSNR ≈ 20.59 dB, SSIM ≈ 0.833, surpassing all previous unsupervised methods (best 18.62 dB / 0.807) and approaching the supervised state of the art (21.82 dB / 0.850).
  • MEF ($N = 5$): fusion over the corrected $\{E_i\}$ yields PSNR ≈ 21.32 dB, SSIM ≈ 0.847, exceeding both Mertens fusion (19.41 dB / 0.825) and deep MEF baselines.

On Radiometry512:

  • SEC: PSNR ≈ 21.53 dB, SSIM ≈ 0.821 (UEC baseline: 18.89 dB / 0.779; supervised CoTF: 23.99 dB / 0.861).
  • MEF: PSNR ≈ 23.22 dB, SSIM ≈ 0.889 (Mertens: 21.66 dB / 0.842).

6.2. Qualitative Outcomes

LoopExpose consistently corrects under- and over-exposed regions without introducing artificial color casts. Textures and structures are preserved, with highlights maintaining detail rather than exhibiting saturation. Compared to alternatives like LACT and CoTF, outputs often display more neutral tone mapping, especially in highly saturated areas.

6.3. Ablation Studies

  • Incorporating $L_{lumi}$ improves SEC PSNR from 20.72 to 21.01 dB and MEF PSNR from 21.11 to 21.32 dB.
  • Two-stage nested optimization is superior:
    • Warm-up only: SEC 20.36 dB, MEF 20.83 dB.
    • Joint only: SEC 20.83 dB, MEF 21.21 dB.
    • Full two-stage: SEC 21.01 dB, MEF 21.32 dB.

7. Significance and Implications

LoopExpose substantiates that a nested combination of rule-based exposure fusion and data-driven correction, regulated by pseudo-supervision and luminance ranking, yields state-of-the-art unsupervised exposure correction. The methodology achieves significant quantitative improvements and robust qualitative enhancement without ground-truth pairs, demonstrating the viability of unsupervised learning in this traditionally supervised domain. The framework generalizes to varying sequence lengths, is readily optimized using standard hardware and open tools, and can serve as a baseline for future investigation of self-supervised image enhancement strategies.

A plausible implication is that this nested-loop pseudo-labeling paradigm may be transferable to other ill-posed image restoration tasks where ground-truth acquisition is a bottleneck, provided the existence of robust, rule-based reference signal generators.
