NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results

Published 5 Apr 2026 in cs.CV | (2604.04135v1)

Abstract: This paper presents a comprehensive review of the NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge, detailing the proposed methods and results. The challenge seeks to identify reconstruction pipelines that remain robust under real-world adverse conditions, specifically extreme low-light and smoke-degraded environments, as captured by our RealX3D benchmark. A total of 279 participants registered for the competition, of whom 33 teams submitted valid results. We thoroughly evaluate the submitted approaches against state-of-the-art baselines, revealing significant progress in 3D reconstruction under adverse conditions. Our analysis highlights shared design principles among top-performing methods and provides insights into effective strategies for handling 3D scene degradation.


Authors (106)

Summary

The paper reviews robust 3D scene restoration pipelines that integrate multi-stage enhancement with physically informed modeling.
It details innovative methods such as multi-model fusion, geometry-guided initialization, and adaptive photometric normalization, achieving superior PSNR and SSIM metrics.
The results emphasize the importance of combining 2D enhancement with 3D reconstruction to overcome low-light and smoke-induced degradations.

NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge

Introduction

The NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge, detailed in "NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results" (2604.04135), constitutes a large-scale benchmarking initiative targeting robust 3D scene reconstruction from multi-view imagery captured under severe adverse conditions. The challenge is motivated by the growing need for reliable 3D vision under non-ideal real-world scenarios, such as extreme low-light and smoke environments, which fundamentally challenge neural 3D scene representations like NeRF and 3D Gaussian Splatting (3DGS). Existing reconstruction algorithms, typically evaluated on clean, controlled datasets, fail to generalize robustly to physical degradations prevalent in practical settings. RealX3D, a physically-degraded multi-view dataset, grounds the evaluation and development of resilient 3D restoration pipelines by providing both clean and degraded multi-view pairs spanning diverse adverse conditions.

The 2026 3DRR Challenge encompassed two tracks: 3D Low-light Enhancement and 3D Smoke Restoration. A total of 279 participants registered, and 33 teams submitted valid solutions. The evaluation centered on PSNR and SSIM for novel view synthesis (NVS) against ground-truth clean imagery, with reproducibility enforced by mandating code submission and validation.

Figure 1: Visualizations of clean-degraded image pairs per track, illustrating the severe degradations addressed in the RealX3D benchmark.

Benchmark, Task Design, and Protocol

RealX3D Dataset and Tracks

RealX3D delivers high-fidelity, real-world, paired multi-view scenes imaged in both pristine and degraded (low-light and smoke) conditions. Each challenge track utilizes seven distinct scenes, with ground-truth camera poses supplied. For every scene, participants are given approximately 30 degraded training views and approximately 5 NVS evaluation targets, enabling analysis of few-shot learning, cross-scene generalization, and handling of extreme degradations. Track 1 emphasizes low-light, while Track 2 focuses on smoke/haze contamination.

Challenge Protocol

Methods are ranked by average PSNR over all NVS targets, with SSIM used to break ties. All submissions are verified for reproducibility via code and blind test scenes. The leaderboard covers both quantitative and qualitative performance under real-world, non-synthetic image distributions, ensuring practical applicability of the developed models.
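The ranking rule can be sketched in a few lines. This is a minimal illustration, assuming images are float arrays scaled to [0, 1] and that PSNR is averaged per target image; the function names `psnr`, `mean_psnr`, and `rank_teams` are hypothetical, and the organizers' actual evaluation script may differ in details such as value range or per-scene averaging.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def mean_psnr(preds, targets):
    """Average PSNR over all novel-view targets (assumed per-image averaging)."""
    return float(np.mean([psnr(p, t) for p, t in zip(preds, targets)]))

def rank_teams(scores):
    """Sort (team, psnr, ssim) tuples by PSNR descending, SSIM as tie-breaker."""
    return sorted(scores, key=lambda s: (s[1], s[2]), reverse=True)
```

Sorting on the tuple `(psnr, ssim)` means SSIM only matters when two PSNR values are exactly equal, which matches the tie-breaking role described above.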

Track 1: 3D Low-light Enhancement

Approaches

Numerous submissions for low-light restoration/3D reconstruction introduced compositional systems that tightly couple image enhancement and physically-constrained 3D optimization. Common architectural elements across top-performing methods include multi-stage enhancement, fusion of diverse priors, adaptive photometric normalization, depth guidance, and controlled color calibration.

FuME-GS (DimV) achieved top PSNR/SSIM via a three-stage pipeline of multi-model 2D restoration, region-adaptive fusion, and fusion-guided 3DGS (Figure 2). Multiple enhancement modules (Retinexformer, Zero-DCE, ReDDiT, HVI-CIDNet) produce candidate restorations; adaptive fusion integrates the best local regions. Enhanced photometric consistency is attained by initializing 3DGS from depth maps derived from the low-light views, yielding significant robustness against geometric and color artifacts.
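To make the idea of region-adaptive fusion concrete, here is a minimal sketch that blends several candidate restorations with per-pixel softmax weights. The quality score used here (local luminance variance) and the names `box_mean` and `fuse_candidates` are assumptions chosen for illustration; the source does not specify the fusion criterion FuME-GS actually uses.

```python
import numpy as np

def box_mean(x, radius):
    """Mean over a (2r+1)x(2r+1) window via an integral image, edge-padded."""
    k = 2 * radius + 1
    padded = np.pad(x, radius, mode="edge")
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    total = (integral[k:, k:] - integral[:-k, k:]
             - integral[k:, :-k] + integral[:-k, :-k])
    return total / (k * k)

def fuse_candidates(candidates, radius=3, temperature=0.01):
    """Blend K candidate images (H, W, 3) with per-pixel softmax weights.

    Assumption: local luminance variance stands in for 'local quality'.
    """
    stack = np.stack(candidates).astype(np.float64)   # (K, H, W, 3)
    luma = stack.mean(axis=-1)                        # (K, H, W)
    scores = []
    for channel in luma:
        mean = box_mean(channel, radius)
        var = box_mean(channel * channel, radius) - mean ** 2
        scores.append(np.maximum(var, 0.0))
    logits = np.stack(scores) / temperature
    logits -= logits.max(axis=0, keepdims=True)       # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)     # sums to 1 per pixel
    return (weights[..., None] * stack).sum(axis=0)
```

A lower `temperature` pushes the blend toward picking a single candidate per region, while a higher value approaches a plain average.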

Figure 2: Overview of the fusion-guided multi-stage enhancement pipeline for 3DGS central to the winning FuME-GS approach.

CISP-GS (DLMath_Vision) implements a multi-branch 3DGS system with three independently supervised tracks: analytical dual-supervision, ISP-calibrated appearance anchoring, and frequency-based YCbCr fusion. Final output blends the three via ensemble learning, yielding robust results against illumination noise, chromatic drift, and local detail loss (Figure 3).

Figure 3: Multi-branch framework of CISP-GS combining different enhancement targets and fusion mechanisms for robust low-light 3DGS.

TCIDNet-IBGS introduces a hybrid framework leveraging dual-stream illumination-aware restoration (HVI color space) and geometry-driven multi-view consistency in rendering (Figure 4). Intrinsic constraints, camera-local source aggregation, and adaptive scene-wide calibration are used for photometric consistency and brightness recovery.

Figure 4: The TCIDNet-IBGS pipeline coupling intensity/chroma restoration with geometry-aware splatting for low-light scenarios.

Other strong methods include IDEAL (with view-dependent artifact disentanglement, see Figure 5), NAKA-GS (bionically-motivated chroma correction, Figure 6), and adaptive harmonization systems like GREP-GS (canonical/residual decomposition, Figure 7) and IC-GS (multi-stage, per-channel calibration, Figure 8).

Figure 5: IDEAL architecture with an MLP-based decoder for disentangling spurious illumination and geometry in low-light restoration.

Figure 6: NAKA-GS pipeline fusing physics-prior enhancement, VGGT-based scene geometry, and adaptive point pruning for robust 3DGS.

Figure 7: GREP-GS incorporates global and per-view adapters for decomposition and harmonization in a canonical display space.

Figure 8: Multi-stage enhancement and 2DGS training in the IC-GS pipeline, using multi-model restoration and calibrated color matching.

A diverse array of techniques is explored: domain-adaptive transform learning (Space-GS, Figure 9), hybrid dual-branch engines (ELoG-GS, Figure 10), scene-adaptive tonemapping (AdaTone-GS, Figure 11), and end-to-end differentiable schemes (DarkIR-GS, Figure 12).

Figure 9: Dual-stream and geometry-guided manifold attention in Space-GS for low-light image normalization.

Figure 10: Dual-branch reconstruction/post-enhancement in ELoG-GS for few-shot low-light NVS.

Figure 11: Adaptive pseudo-GT generation and multi-feature 3DGS fusion in AdaTone-GS.

Figure 12: End-to-end enhancement with differentiable rasterization in DarkIR-GS, coupling 2D/3D modules directly.

Quantitative Leaderboard and Key Results

FuME-GS achieved 23.38 dB PSNR / 0.8019 SSIM, outperforming the previous state-of-the-art by a wide numerical margin on this challenging real-world low-light evaluation. CISP-GS, TCIDNet-IBGS, and IDEAL also delivered PSNR >21 dB and SSIM >0.7, robustly surpassing classical two-stage pipelines that use simple gamma correction or basic denoisers upstream of standard 3DGS.
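For reference, the kind of simple gamma-correction preprocessing mentioned above can be sketched as follows. This is a generic global power curve, not a reproduction of any specific baseline in the challenge; the function name `gamma_correct` and the default `gamma=2.2` are assumptions.

```python
import numpy as np

def gamma_correct(image, gamma=2.2):
    """Brighten an image in [0, 1] with a global curve: out = in ** (1 / gamma)."""
    image = np.clip(image.astype(np.float64), 0.0, 1.0)
    return image ** (1.0 / gamma)
```

Because the same curve is applied to every pixel, it brightens sensor noise along with the signal. With `gamma=2.2`, the slope of the curve near an intensity of 0.01 is more than five, so small fluctuations in very dark regions are amplified by that factor before 3DGS ever sees the images.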

Track 2: 3D Smoke Restoration

Approaches

Solutions for smoke/haze restoration emphasize physics-grounded modeling, MLLM-guided refinement, multi-model fusions, and explicit decomposition of medium/scene interactions. Top entries combine restoration, dehazing, and advanced 3DGS with multi-view ensembling or generative refinement.

GenSmoke-GS (PLBBL) sets a strong benchmark by employing a multi-stage system: ConvIR-UDPNet restoration, DCP-based dehazing, MLLM-guided structure-preserving enhancement, and 3DGS-MCMC reconstruction, with repeated ensembling for variance suppression (Figure 13). Explicit prompts are used to maintain geometric fidelity during MLLM-based view correction.
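The DCP-based dehazing stage can be illustrated with the classical dark channel prior under the scattering model I = J·t + A·(1 − t). The sketch below is the textbook formulation in simplified form (no transmission refinement, and atmospheric light taken as the mean of the brightest dark-channel pixels); the parameter values and the names `dark_channel` and `dehaze_dcp` are assumptions, and the exact variant used in GenSmoke-GS is not given in the source.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dark_channel(image, patch=7):
    """Per-pixel min over RGB, then min over an odd patch x patch neighbourhood."""
    r = patch // 2
    min_rgb = image.min(axis=2)
    padded = np.pad(min_rgb, r, mode="edge")
    windows = sliding_window_view(padded, (patch, patch))  # (H, W, patch, patch)
    return windows.min(axis=(2, 3))

def dehaze_dcp(image, patch=7, omega=0.95, t_min=0.1):
    """Recover J from hazy I in [0, 1]; returns (J, transmission, airlight)."""
    image = image.astype(np.float64)
    dark = dark_channel(image, patch)
    # Atmospheric light: mean colour of the brightest 0.1% dark-channel pixels.
    n = max(1, dark.size // 1000)
    idx = np.argsort(dark.ravel())[-n:]
    airlight = np.maximum(image.reshape(-1, 3)[idx].mean(axis=0), 1e-6)
    # Transmission estimate, floored to avoid dividing by near-zero values.
    t = 1.0 - omega * dark_channel(image / airlight, patch)
    t = np.clip(t, t_min, 1.0)
    restored = (image - airlight) / t[..., None] + airlight
    return np.clip(restored, 0.0, 1.0), t, airlight
```

Applied independently per view, an estimator like this gives no guarantee of cross-view consistency, which is one reason such image-level steps are combined with 3D reconstruction and ensembling in the pipelines described here.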

Figure 13: GenSmoke-GS, a multi-stage pipeline using generative and physical priors for robust multi-view smoke restoration and 3DGS.

Smoke-GS explicitly parameterizes the scattering medium using MLPs guided by spherical harmonics of view rays, modeling spatially and view-dependent smoke color shifts and integrating them with 3DGS renderings (Figure 14).

Figure 14: Medium-aware modeling in Smoke-GS: the Smoke Medium MLP estimates local medium-induced artifacts for image-space correction.

Other leading approaches include MSDG, which executes cascaded dehazing, 3D-UIR rendering, and fusion with fine-tuned models (Figure 15), and hierarchical or hybrid 3D/2D modeling as in DEPHY-GS (Figure 16) and SDG-GS (Figure 17).

Figure 15: Multi-stage pipeline in MSDG: sequential dehazing, 3DGS, diffusion-based repair, and appearance fusion.

Figure 16: DEPHY-GS: staged dehazing, 2D enhancement, and a physics-modeled hybrid 3DGS, with dual-layer Gaussian sets.

Figure 17: Joint learning of clean 3DGS and simplified smoke forward model in SDG-GS. Only clean rendering is exposed at inference.

Quantitative Leaderboard and Observations

GenSmoke-GS attained 20.21 dB PSNR / 0.726 SSIM on Track 2, outperforming other methods by a substantial margin. Physically-constrained methods and those leveraging multi-model ensembles (e.g., MSDG, DiT-IBGS, DEPHY-GS) produced moderate PSNR/SSIM, clearly separated from trivial image-level dehazing pipelines that struggled with geometry and texture recovery.

Analysis of Methodological Trends and Design Principles

Multi-branch and Multi-model Fusion: Top methods integrate multiple enhancement/denoising priors, physically-plausible mappings, and cross-view consistency terms (e.g., FuME-GS, CISP-GS).
Physically Grounded Models: Explicit modeling of image formation, e.g., atmospheric light, transmission, chroma correction, or scene-dependent priors, substantially improves robustness to adverse degradations.
Adaptive and Scene-driven Calibration: Scene-adaptive tonemapping, exposure normalization, and color harmonization are repeated patterns, crucial for cross-scene generalization.
Geometry-guided Fusion and Point Cloud Initialization: Many pipelines leverage depth estimation, monocular priors, or robust geometric filtering to initialize and regularize 3DGS optimization under signal-sparse conditions.
Generative and MLLM Assistance: Controlled use of generative models for local refinement, with explicit constraints to avoid geometric hallucinations, is effective for detail recovery in heavy smoke/low-light.
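As an illustration of the geometry-guided initialization trend listed above, the sketch below lifts a depth map into a world-space point cloud that could seed 3DGS. It assumes a pinhole camera, depth measured along the optical axis, integer pixel coordinates without a half-pixel offset, and a 4x4 camera-to-world pose; the function name `unproject_depth` is hypothetical and no specific team's initialization is being reproduced.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift a depth map (H, W) into world-space points of shape (H*W, 3).

    K: 3x3 pinhole intrinsics. cam_to_world: 4x4 camera-to-world pose.
    Points are ordered row-major, so pixel (u, v) maps to index v * W + u.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)
    pixels = pixels.reshape(-1, 3).astype(np.float64)
    rays = pixels @ np.linalg.inv(K).T            # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)         # scale each ray by its depth
    ones = np.ones((pts_cam.shape[0], 1))
    pts_h = np.concatenate([pts_cam, ones], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]
```

In practice one would also mask out invalid or low-confidence depth values before using the points, since depth estimated from degraded views is likely to be noisy.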

Implications and Future Directions

Practical Takeaways

Challenge results show that even under heavy physical degradations, 3D scene reconstruction is feasible with carefully-engineered, modular, and physically-informed pipelines. Multi-model fusion, aggressive photometric calibration, and learned or physics-based priors are necessary for stability and artifact suppression. Black-box 2D enhancement or naive gamma correction is insufficient; integration and harmonization of diverse cues at both the image and geometry level are essential.

Theoretical Insights

Emerging trends involve explicit joint modeling of degradation and scene, modular architectures combining high-capacity neural restoration and analytic models, and adaptive decomposition of visual signals. The use of canonical/residual decomposition, hybrid 2D/3D modeling, and scene-specific ensemble learning reflects a growing recognition that signal sources must be disentangled from noise.

Future Prospects

Key open challenges remain: robust unsupervised learning for adverse conditions without paired priors, scalable joint optimization of 2D/3D parameters, and more principled integration of generative models (MLLMs, diffusion) with physically-constrained reconstruction backbones. The field will benefit from continued development of physically-degraded, high-diversity, real-world datasets such as RealX3D, and from research into cross-degradation generalization, transfer across unseen scenes, and data-efficient restoration methods.

Conclusion

The NTIRE 2026 3DRR Challenge significantly advances the frontier of multi-view 3D scene restoration under harsh physical conditions. Top-performing methods demonstrate that competitive 3DGS, NeRF, and hybrid pipelines can robustly recover geometry and photometric appearance in extreme low-light and smoke scenarios by integrating multi-stage enhancement, physically-modeled priors, adaptive calibration, and geometry-guided optimization. These findings form a foundation for further innovation in adverse-condition 3D visual restoration and robust, real-world scene understanding.

Reference:

"NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results" (2604.04135)
