NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Published 12 Apr 2026 in cs.CV | (2604.10634v1)

Abstract: This paper presents an overview of the NTIRE 2026 Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images. Building upon the success of the first edition, this challenge attracted a wide range of impressive solutions, all developed and evaluated on our real-world Raindrop Clarity dataset~\cite{jin2024raindrop}. For this edition, we adjust the dataset with 14,139 images for training, 407 images for validation, and 593 images for testing. The primary goal of this challenge is to establish a strong and practical benchmark for the removal of raindrops under various illumination and focus conditions. In total, 168 teams have registered for the competition, and 17 teams submitted valid final solutions and fact sheets for the testing phase. The submitted methods achieved strong performance on the Raindrop Clarity dataset, demonstrating the growing progress in this challenging task.

Authors (98)

Summary

The paper reports on a challenge for removing adherent raindrop artifacts from dual-focused day and night images; submitted methods include transformer-based and Retinex-based pipelines.
Entries are ranked by a composite score combining PSNR, SSIM, and LPIPS to balance pixel fidelity with perceptual realism.
The study underscores the practical significance of scene-level context and efficiency for deploying restoration models in outdoor imaging applications.

NTIRE 2026 Day and Night Raindrop Removal for Dual-Focused Images: Challenge, Methods, and Outcomes

Motivation and Challenge Design

The NTIRE 2026 Second Challenge targets the removal of adherent raindrop artifacts from dual-focused images captured under both day and night conditions. The technical crux lies in mitigating the severe nonlinear degradations caused by adherent drops while staying robust to focus diversity and illumination variance, which matters for deployment in applications such as ADAS, outdoor vision systems, and surveillance. The Raindrop Clarity dataset used here offers 14,139 real-world training images spanning day and night scenes with both raindrop-focused and background-focused captures, complexities that synthetic or video-only benchmarks fail to cover.

A dual-focused setting reflects practical multi-camera image capture where both raindrop-focused (foreground) and background-focused images for a scene are available, expanding restoration beyond single-style or simulated settings.

Evaluation and Protocol

Participants submit restored test-set images, which are ranked by a composite score:

$\mathrm{Score} = \mathrm{PSNR (Y)} + 10 \times \mathrm{SSIM (Y)} - 5 \times \mathrm{LPIPS}$

where PSNR and SSIM are computed on the Y channel of YCbCr, and LPIPS uses AlexNet features on inputs normalized to $[-1, 1]$. For example, the top entry's 28.34 dB PSNR, 0.8265 SSIM, and 0.2732 LPIPS give $28.34 + 8.265 - 1.366 \approx 35.24$. The score fuses pixel fidelity and perceptual realism, favoring solutions that balance accurate reconstruction with natural appearance.
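The scoring arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the organizers' evaluation script: the function names are hypothetical, the BT.601 studio-range luma conversion is an assumption (the YCbCr variant is not stated here), and the weights are the ones that reproduce the leaderboard rows listed in the results table (PSNR with unit weight, SSIM times 10, LPIPS times 5). SSIM and LPIPS are taken as precomputed numbers.

```python
import numpy as np

def rgb_to_y(img):
    """Luma of an RGB image with values in [0, 1].

    Assumption: BT.601 studio-range conversion, giving Y in [16, 235].
    """
    img = np.asarray(img, dtype=np.float64)
    return 16.0 + 65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]

def psnr_y(pred, target):
    """PSNR in dB on the Y channel, using a peak value of 255."""
    mse = np.mean((rgb_to_y(pred) - rgb_to_y(target)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(255.0 ** 2 / mse))

def composite_score(psnr, ssim, lpips):
    """Composite ranking score: higher is better, LPIPS is penalized."""
    return psnr + 10.0 * ssim - 5.0 * lpips

# (PSNR, SSIM, LPIPS, reported score) copied from the results table.
leaderboard = {
    "AIIA-Lab": (28.34, 0.8265, 0.2732, 35.24),
    "raingod": (28.28, 0.8255, 0.2636, 35.22),
    "BUU_CV": (28.15, 0.8222, 0.2665, 35.04),
    "RetinexDualV2": (27.24, 0.8061, 0.2887, 33.86),
}

for team, (psnr, ssim, lpips, reported) in leaderboard.items():
    recomputed = composite_score(psnr, ssim, lpips)
    print(f"{team}: recomputed {recomputed:.2f}, reported {reported:.2f}")
```

Because SSIM is multiplied by 10 and LPIPS by 5, a change of 0.01 in SSIM moves the score as much as a 0.1 dB change in PSNR, and a change of 0.01 in LPIPS as much as 0.05 dB.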

State-of-the-Art Approaches

The 17 final submissions span modern transformer models, frequency-aware and Retinex-based pipelines, multi-scale fusion, recurrent aggregation, and diffusion-backbone methods, and they differ widely in final score, parameter count, and compute cost.

Leading Solutions

AIIA-Lab: Multi-stage training atop the MSDT transformer, selective checkpoint ensemble, scene-based pseudo-GT fusion, and refinement, yielding the highest PSNR and SSIM among the final submissions. No external data or heavy test-time augmentation is used.
Figure 1: The pipeline of the method proposed by Team AIIA-Lab.
raingod: Three-stage MSDT framework, median-filter pseudo-supervision, generalization boosted with UAV-Rain1k extra data and multi-scene fusion.
BUU_CV: Parallel STRRNet (rectangular patches for elongated drops) and Restormer (square patches for textures), output fusion, and median artifact suppression balance performance and computational load.
Figure 3: The pipeline of the method proposed by Team BUU_CV.
RetinexDualV2: Physically-grounded dual-branch Retinex decomposition; residual rain masks via UNet guide mask-based multi-head attention. Scene-level mask-based blending achieves competitive perceptual metrics.
Figure 2: The pipeline of the method proposed by Team RetinexDualV2.
ULR: Three-phase UNet—preprocessing for patch restoration, recurrent sequence aggregation at the scene level, and final postprocessing—enabling effective scene context exploitation.
Figure 4: The pipeline of the method proposed by Team ULR.
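Several of the entries above rely on scene-level pseudo-ground-truth obtained by fusing or median-filtering restored outputs. The teams' exact procedures are not described here, so the following is only a minimal sketch of the general idea, under two assumptions: the frames of one scene are pixel-aligned, and residual drop artifacts fall at different positions in different frames. The function name `scene_median_fusion` is hypothetical.

```python
import numpy as np

def scene_median_fusion(restored_frames):
    """Per-pixel median over restored frames of the same scene.

    Assumes all frames share the same shape and are pixel-aligned.
    A pixel corrupted in only a minority of frames is replaced by the
    value on which the majority of frames agree.
    """
    stack = np.stack([np.asarray(f, dtype=np.float32) for f in restored_frames], axis=0)
    return np.median(stack, axis=0)

# Toy scene: a flat gray background, with one leftover artifact per frame
# at a different location in each of the three frames.
background = np.full((4, 4, 3), 0.5, dtype=np.float32)
frames = []
for row in range(3):
    frame = background.copy()
    frame[row, row] = 1.0  # simulated residual drop
    frames.append(frame)

pseudo_gt = scene_median_fusion(frames)
print(np.abs(pseudo_gt - background).max())
```

The median is used here instead of the mean because a mean would leave a faint trace of every artifact in the fused image, while the median discards outliers as long as they affect fewer than half of the frames at a given pixel.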

Design Analyses and Architectures

Test-Time Domain Adaptation: GU-day Mate's AdaIR leverages frequency-aware, curriculum-based transductive fine-tuning using dynamic pseudo-ground-truth constructed per test scene.
Figure 5: The pipeline of the method proposed by Team GU-day Mate.
Diffusion Models: Diffusion-based teams use masked pretraining (MDAE) and score-based fine-tuning, employing scale-recurrent UNet-transformers as backbones, demonstrating the versatility of generative priors.
Domain-specific Expert Routing: Rain-SVNIT and others partition images based on luminance or scene/focus properties, routing to specialized NAFNet or transformer models, thus reducing cross-domain collapse.
Attention and Frequency Modules: Teams such as Cidaut AI augment vanilla UNet/NAFNet structures with dual spatial-frequency attention modules to enhance discrimination of drops versus background.
Figure 6: The pipeline of the method proposed by Team Cidaut AI.
Model Compression and Efficiency: Parameter counts range from roughly 2-3M (GU-day Mate at 2.14M, Cidaut AI at 2.95M) to nearly 1B (Just JiT). Model design reflects an explicit trade-off between performance and deployability, with lightweight approaches remaining competitive on PSNR/SSIM.
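The expert-routing idea in the list above can be sketched as follows. This is an illustrative toy, not Rain-SVNIT's implementation: the threshold value, the function names, and the identity "experts" are all hypothetical placeholders, and a real system might route on focus or scene properties as well as luminance.

```python
import numpy as np

def mean_luminance(img):
    """Mean BT.601 luma of an RGB image with values in [0, 1]."""
    img = np.asarray(img, dtype=np.float64)
    luma = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    return float(luma.mean())

def route_to_expert(img, experts, night_threshold=0.25):
    """Pick a domain by mean luminance and run the matching expert.

    `night_threshold` is a hypothetical value chosen for illustration;
    in practice it would be tuned on validation data.
    """
    domain = "night" if mean_luminance(img) < night_threshold else "day"
    return domain, experts[domain](img)

# Placeholder experts: identity functions standing in for trained
# day and night restoration networks.
experts = {
    "day": lambda image: image,
    "night": lambda image: image,
}

dark_image = np.full((8, 8, 3), 0.05)
bright_image = np.full((8, 8, 3), 0.70)

print(route_to_expert(dark_image, experts)[0])
print(route_to_expert(bright_image, experts)[0])
```

Keeping the router as a cheap statistic of the input means the routing step adds almost no cost on top of whichever expert is selected.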

Quantitative Performance Highlights

Rank	Team	Score	PSNR	SSIM	LPIPS	Params (M)	GFlops
1	AIIA-Lab	35.24	28.34	0.8265	0.2732	16.6	129.9
2	raingod	35.22	28.28	0.8255	0.2636	16.6	129.9
3	BUU_CV	35.04	28.15	0.8222	0.2665	26.89	42.33
4	RetinexDualV2	33.86	27.24	0.8061	0.2887	4.8	301.6
...	...	...	...	...	...	...	...

AIIA-Lab's pipeline leads with the highest composite score, PSNR, and SSIM, but raingod achieves the lowest LPIPS, reflecting a strong focus on perceptual improvement. The lightweight entries from GU-day Mate (2.14M parameters, 64.2 GFlops) and Cidaut AI (2.95M parameters, 13.26 GFlops) offer low-cost options for resource-constrained or real-time deployment.

Theoretical and Practical Implications

The top solutions confirm that scene-level context—via ensemble fusion, mask-based blending, or explicit recurrent aggregation—plays a critical role for robust, focus-agnostic deraining under illumination variation. Physically-grounded priors (e.g., Retinex decomposition, residual rain masking) directly injected into transformer attention mechanisms can boost generalization and perceptual quality.
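For readers unfamiliar with the Retinex prior mentioned above, it models an image as the element-wise product of reflectance and illumination, $I = R \odot L$. The sketch below uses the per-pixel maximum over color channels as a crude illumination estimate. That choice is a simple stand-in assumed for illustration, not the learned dual-branch decomposition used by RetinexDualV2, and the function name is hypothetical.

```python
import numpy as np

def retinex_decompose(img, eps=1e-4):
    """Split an RGB image in [0, 1] into reflectance and illumination.

    Assumption: illumination is approximated by the per-pixel maximum
    over the color channels; learned methods estimate it with a network.
    """
    img = np.asarray(img, dtype=np.float64)
    illumination = img.max(axis=-1, keepdims=True)  # shape (H, W, 1)
    reflectance = img / (illumination + eps)        # shape (H, W, 3)
    return reflectance, illumination

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(6, 6, 3))

reflectance, illumination = retinex_decompose(image)

# Multiplying the two components back together recovers the input.
recomposed = reflectance * (illumination + 1e-4)
print(np.abs(recomposed - image).max())
```

Separating the two components lets a restoration model treat illumination changes between day and night scenes apart from the scene content carried by the reflectance.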

The challenge further demonstrates that hybrid pipelines combining transformer models, pseudo-label self-supervision, and test-time scene-level adaptation surpass monolithic architectures. Stratifying dual-focused data and routing it to specialized models also reduces domain shift, which helps low-parameter models suited to constrained hardware.

Several teams' ablations on large external datasets suggest that architecture, loss, and pipeline engineering matter increasingly more than scaling alone, especially as high-capacity networks saturate on modest-size tasks.

Trends and Future Directions

Scene and Focus Awareness: Explicit scene/focus features drive both structural quality and perceptual restoration, motivating further research into dynamic, hierarchical conditioning and test-time adaptation.
Exploit Generative Priors: Diffusion and generative models, exploited beyond simple denoising to conditional inverse problems, can be tailored with mask-based, attention, and prompt-pool mechanisms for complex restoration.
Perceptual and Efficient Trade-offs: Dual-objective optimization (PSNR/perceptual vs. latency/memory) with adaptive routing and transformer distillation will underpin future edge deployment.
Benchmark Evolution: Highly diverse data streams (UAV, automotive, night adaptation) and complex degradations will necessitate continued dataset innovation, multi-modal supervision, and unsupervised/self-supervised learning.

Conclusion

The NTIRE 2026 challenge underscores the technical realities of real-world raindrop removal for dual-focused, day-and-night imaging regimes. Transformer-based, scene- and frequency-adaptive pipelines, Retinex-enhanced mask priors, and diffusion-centric architectures dominate, with robust scene/context and pseudo-supervision driving advancement. Results indicate that robust deraining solutions now lie in careful model design, strategic pipeline engineering, and balanced metric optimization, rather than in indiscriminate scaling or external data reliance. This benchmark will serve both as a touchstone and a catalyst for subsequent adverse weather restoration research.

Reference: "NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results" (2604.10634).
