- The paper presents a physics-guided pseudo-clean 3DGS approach that decouples geometry recovery from appearance harmonization for superior smoke removal.
- It employs a refined Dark Channel Prior and guided filtering to generate pseudo-clean targets that ensure accurate geometry with sharp detail.
- Empirical results on the RealX3D benchmark show a +3.68 dB PSNR gain over the best official baseline (plain 3DGS), along with improved structural fidelity and fewer smoke-induced artifacts.
Detailed Technical Essay on "SmokeGS-R: Physics-Guided Pseudo-Clean 3DGS for Real-World Multi-View Smoke Restoration" (2604.05301)
Smoke fundamentally degrades multi-view scene capture through radiance attenuation, additive airlight, and spatially inconsistent, depth-varying, view-dependent perturbations. Existing neural rendering pipelines, including those equipped with explicit scattering models, perform poorly on real-world smoke scenes, as evidenced by the RealX3D benchmark. The primary failure mode is an entanglement of geometry inference and medium appearance: methods either overfit to smoke appearance at the expense of geometric fidelity or recover structure but inherit strong color or contrast biases.
The NTIRE 2026 3D Restoration and Reconstruction Track 2 challenge explicitly targets this task: removing smoke from multi-view observations for high-fidelity, sharp, and geometrically robust novel view synthesis under physically plausible conditions.
Methodological Framework
The key insight of SmokeGS-R is a strict decoupling of geometry recovery from appearance harmonization, leveraging physics-based pseudo-clean supervision, a geometry-first 3DGS source branch, and controlled LAB-space harmonization from a donor ensemble.
Figure 1: Overview of SmokeGS-R with DCP-based pseudo-clean supervision, a 3DGS geometry branch, reference aggregation, and LAB harmonization finalized by Gaussian smoothing.
Physics-Guided Pseudo-Clean Generation
Pseudo-clean targets are synthesized from smoky inputs with a refined Dark Channel Prior (DCP) and guided-filtering pipeline. DCP computes the dark channel (the minimum over RGB channels within a local patch), uses it to estimate the global airlight, and derives an initial transmission map from it. Guided filtering then refines the transmission map, smoothing it spatially while preserving edges. Pseudo-clean images are reconstructed by inverting the atmospheric scattering model and applying gamma enhancement to improve contrast. Importantly, these images serve only as robust geometric anchors, not as ground truth.
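The pseudo-clean step can be sketched with the classic single-image DCP recipe of He et al. This is not the authors' refined implementation: the patch size (15), omega (0.95), transmission floor t0 (0.1), airlight taken from the top 0.1% of dark-channel pixels (averaged here, a common simplification), the guided-filter radius and eps, and the gamma value are all assumptions, and every function name is hypothetical. Recovery inverts the standard scattering model I = J*t + A*(1 - t).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dark_channel(img: np.ndarray, patch: int = 15) -> np.ndarray:
    """Min over RGB, then a patch x patch spatial min filter (edge-padded)."""
    min_rgb = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    return sliding_window_view(padded, (patch, patch)).min(axis=(-2, -1))


def estimate_airlight(img: np.ndarray, dark: np.ndarray, top_frac: float = 0.001) -> np.ndarray:
    """Average RGB of the brightest `top_frac` fraction of dark-channel pixels."""
    flat = dark.ravel()
    n = max(1, int(flat.size * top_frac))
    idx = np.argpartition(flat, -n)[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)


def box_filter(x: np.ndarray, r: int) -> np.ndarray:
    """Mean over a (2r+1) x (2r+1) window via 2-D cumulative sums (edge-padded)."""
    k = 2 * r + 1
    c = np.cumsum(np.cumsum(np.pad(x, r, mode="edge"), axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # leading zero row/column for window sums
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)


def guided_filter(guide: np.ndarray, src: np.ndarray, r: int = 30, eps: float = 1e-3) -> np.ndarray:
    """Gray-guide guided filter: edge-preserving smoothing of `src` along `guide`."""
    mean_i, mean_p = box_filter(guide, r), box_filter(src, r)
    cov_ip = box_filter(guide * src, r) - mean_i * mean_p
    var_i = box_filter(guide * guide, r) - mean_i ** 2
    a = cov_ip / (var_i + eps)
    b = mean_p - a * mean_i
    return box_filter(a, r) * guide + box_filter(b, r)


def pseudo_clean(img, patch=15, omega=0.95, r=30, eps=1e-3, t0=0.1, gamma=1.2):
    """Dehaze an HxWx3 float image in [0, 1]; returns (pseudo-clean image, transmission)."""
    dark = dark_channel(img, patch)
    airlight = np.maximum(estimate_airlight(img, dark), 1e-6)
    t_raw = 1.0 - omega * dark_channel(img / airlight, patch)  # initial transmission
    t = np.clip(guided_filter(img.mean(axis=2), t_raw, r, eps), t0, 1.0)  # refined
    # Invert I = J * t + A * (1 - t) for the scene radiance J.
    recovered = (img - airlight) / t[..., None] + airlight
    # Gamma step (hypothetical setting): exponent 1/gamma < 1 brightens mid-tones.
    return np.clip(recovered, 0.0, 1.0) ** (1.0 / gamma), t
```

Usage: `clean, t = pseudo_clean(smoky_rgb)` for a float image in [0, 1]. Building the box filter from cumulative sums keeps the guided filter's cost independent of the radius, which matters for the large radii typical of transmission refinement.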
Geometry-First 3DGS Source Model
A clean-only 3D Gaussian Splatting (3DGS) source model is trained exclusively on the refined pseudo-clean images using a photometric loss that combines L1 and SSIM terms; heavy appearance-oriented regularization and adversarial priors are deliberately excluded. This limits the propagation of smoke artifacts into the reconstruction and keeps the model focused on accurate geometry and structure recovery.
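A minimal NumPy sketch of the L1 + SSIM photometric objective follows. The weighting (1 - lam) * L1 + lam * (1 - SSIM) with lam = 0.2 is borrowed from the original 3DGS default and is an assumption here, since the summary does not give SmokeGS-R's weights; SSIM uses the common 11-tap Gaussian window with sigma = 1.5 and evaluates only the "valid" interior. In practice the loss would be computed in an autodiff framework so gradients reach the Gaussians; this version only evaluates it.

```python
import numpy as np


def gaussian_kernel(size: int = 11, sigma: float = 1.5) -> np.ndarray:
    """Normalized 1-D Gaussian window (11 taps, sigma 1.5 as in standard SSIM)."""
    x = np.arange(size) - (size - 1) / 2
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def blur_valid(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Separable 'valid' Gaussian filtering of an HxWxC image along H, then W."""
    def conv(v):
        return np.convolve(v, k, mode="valid")
    return np.apply_along_axis(conv, 1, np.apply_along_axis(conv, 0, img))


def ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Mean windowed SSIM between two HxWxC images."""
    k = gaussian_kernel()
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_x, mu_y = blur_valid(x, k), blur_valid(y, k)
    var_x = blur_valid(x * x, k) - mu_x ** 2
    var_y = blur_valid(y * y, k) - mu_y ** 2
    cov_xy = blur_valid(x * y, k) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float((num / den).mean())


def photometric_loss(render: np.ndarray, target: np.ndarray, lam: float = 0.2) -> float:
    """(1 - lam) * L1 + lam * (1 - SSIM); lam = 0.2 follows the original 3DGS default."""
    l1 = float(np.abs(render - target).mean())
    return (1 - lam) * l1 + lam * (1 - ssim(render, target))
```

Here `render` would be a 3DGS rendering and `target` the matching pseudo-clean image; the loss is zero only when the two agree exactly.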
Donor Ensemble for Appearance Priors
Four donor branches are trained in parallel, each built on a different prior (e.g., ensemble-spatial, dual-depth, and VGGT-based), and they act solely as pools of appearance statistics, never influencing geometry. Their diversity covers a broad range of appearance statistics under varying smoke densities and chromatic biases, supplying robust appearance references without introducing geometric artifacts.
LAB-Space Multi-Reference Harmonization
At inference, the geometry-first source model is rendered and then harmonized against a reference built from the donor outputs by a per-pixel geometric mean (equivalently, an arithmetic mean in log space) for robust aggregation. LAB-space Reinhard transfer is then applied: for each channel (L, a, b), the mean and standard deviation of the source are matched to those of the ensemble reference, shifting the color distribution to remove smoke-induced bias while preserving high-frequency structure. Finally, a low-variance Gaussian blur suppresses residual splatting artifacts.
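The inference-time harmonization can be sketched as follows, assuming float sRGB renders in [0, 1]. The geometric-mean reference, per-channel LAB mean/std matching, and final light Gaussian blur follow the description above; the sRGB/D65 CIELAB conversion, whole-image (global) statistics, the clamp eps, and sigma = 0.5 are assumptions, and all function names are hypothetical.

```python
import numpy as np

# sRGB (D65) -> CIE XYZ matrix and D65 reference white (assumed color pipeline).
M_RGB2XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                      [0.2126729, 0.7151522, 0.0721750],
                      [0.0193339, 0.1191920, 0.9503041]])
M_XYZ2RGB = np.linalg.inv(M_RGB2XYZ)
WHITE = np.array([0.95047, 1.0, 1.08883])
DELTA = 6 / 29


def rgb_to_lab(rgb: np.ndarray) -> np.ndarray:
    """sRGB in [0, 1] (HxWx3) -> CIELAB."""
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    t = (lin @ M_RGB2XYZ.T) / WHITE
    f = np.where(t > DELTA ** 3, np.cbrt(t), t / (3 * DELTA ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def lab_to_rgb(lab: np.ndarray) -> np.ndarray:
    """CIELAB -> sRGB, clipped to [0, 1]."""
    fy = (lab[..., 0] + 16) / 116
    f = np.stack([fy + lab[..., 1] / 500, fy, fy - lab[..., 2] / 200], axis=-1)
    t = np.where(f > DELTA, f ** 3, 3 * DELTA ** 2 * (f - 4 / 29))
    lin = np.clip((t * WHITE) @ M_XYZ2RGB.T, 0.0, 1.0)
    srgb = np.where(lin <= 0.0031308, 12.92 * lin, 1.055 * lin ** (1 / 2.4) - 0.055)
    return np.clip(srgb, 0.0, 1.0)


def geometric_mean(donors, eps: float = 1e-6) -> np.ndarray:
    """Per-pixel geometric mean of N donor renders (N x H x W x 3): a mean in log space."""
    return np.exp(np.log(np.clip(np.asarray(donors), eps, None)).mean(axis=0))


def gaussian_blur(img: np.ndarray, sigma: float = 0.5) -> np.ndarray:
    """Separable Gaussian blur with edge padding; a small sigma gives mild smoothing."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    k /= k.sum()
    h, w = img.shape[:2]
    pad = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    rows = sum(k[i] * pad[i:i + h] for i in range(k.size))
    return sum(k[j] * rows[:, j:j + w] for j in range(k.size))


def harmonize(source: np.ndarray, donors, sigma: float = 0.5) -> np.ndarray:
    """Reinhard-style per-channel mean/std transfer in LAB toward the donor reference."""
    src_lab = rgb_to_lab(source)
    ref_lab = rgb_to_lab(geometric_mean(donors))
    mu_s, sd_s = src_lab.mean(axis=(0, 1)), src_lab.std(axis=(0, 1)) + 1e-6
    mu_r, sd_r = ref_lab.mean(axis=(0, 1)), ref_lab.std(axis=(0, 1)) + 1e-6
    out_lab = (src_lab - mu_s) / sd_s * sd_r + mu_r
    return gaussian_blur(lab_to_rgb(out_lab), sigma)
```

For example, `harmonize(source_render, np.stack(donor_renders))`. Because a geometric mean never exceeds the arithmetic mean, a single donor with an unusually bright cast pulls the reference less than it would under plain averaging; the trade-off is sensitivity to near-zero values, which the eps clamp guards against.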
Empirical Results and Analysis
On the RealX3D smoke benchmark, SmokeGS-R achieves a PSNR of 15.209 dB, SSIM of 0.644, and LPIPS of 0.551 on the publicly released test scenes, outperforming the best official baseline (plain 3DGS at 11.530 dB PSNR) by +3.68 dB. The method also consistently surpasses physically motivated neural rendering pipelines such as SeaThru-NeRF, Watersplatting, and I2-NeRF, all of which show strongly scene-dependent and often suboptimal performance on real, as opposed to synthetic, smoke.
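As a quick arithmetic check of the headline margin, using the standard PSNR definition for images with peak value MAX. The MSE ratio in the last line holds exactly only if both scores come from a single aggregate MSE; for PSNR averaged over scenes it is indicative only.

```latex
\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MAX}^2}{\mathrm{MSE}}, \qquad
\Delta\mathrm{PSNR} = 15.209 - 11.530 = 3.679 \approx 3.68\ \mathrm{dB}

\frac{\mathrm{MSE}_{\text{3DGS}}}{\mathrm{MSE}_{\text{SmokeGS-R}}} = 10^{3.679/10} \approx 2.33
```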
Figure 2: Scene-wise PSNR disaggregated across released challenge scenes, showing robust improvement by SmokeGS-R over baselines.
Qualitative evaluations (Figure 3) further support the numerical results. Compared with the strongest official baselines, SmokeGS-R maintains sharper object contours and more faithful backgrounds, and removes substantially more residual veiling and airlight, especially in scenes with dense smoke. The baselines fail either by oversmoothing detail or by retaining a persistent color cast and haze, which SmokeGS-R mitigates through its separate harmonization step.
Figure 3: Qualitative comparison of rendered test views among reference, 3DGS, SeaThru-NeRF, and SmokeGS-R; SmokeGS-R demonstrates superior veil removal and geometry preservation.
Design Rationale and Ablation Insights
Evidence from the challenge showed that further entangling physics-based smoke modeling with reconstruction (e.g., via explicit internal radiance-field decomposition) reduced pipeline stability, leading to distorted geometry, hallucinated regions, or unreliable appearance transfer. Splitting the pipeline into strong physics-based geometry supervision and decoupled appearance correction enabled robust transfer to the public test data without retraining and with high reproducibility. The spatially varying, non-uniform nature of real smoke highlighted by RealX3D makes such decoupling particularly advantageous over monolithic, learn-everything approaches.
Theoretical and Practical Implications
The work provides strong evidence that, contrary to many recent trends, physically detailed internal models are not uniformly beneficial for real-world participating-media reconstruction. Instead, careful modularization, with domain priors guiding geometry and appearance correction performed post hoc, can deliver stronger and more stable results. The method's strong generalization to public benchmarks suggests that a geometry-first paradigm with controlled test-time harmonization is highly practical for real-world deployment, where retraining and heavy model engineering are infeasible.
Future Research Directions
Future extensions may explore:
- Adaptive harmonization: Scene-aware or patch-wise harmonization instead of global LAB transfer, potentially using local statistics or learned blending;
- Generalization to complex participating media: Application and re-benchmarking on underwater, fog, and other multi-modal settings;
- End-to-end differentiable harmonization: Incorporation of differentiable statistics or cross-modal consistency losses into harmonization for further performance gains;
- Scalability and real-time deployment: Acceleration, quantization, and optimization for real-world robotic and immersive capture applications.
Conclusion
SmokeGS-R establishes that geometry-first, physics-guided supervision coupled with modular, stable appearance harmonization constitutes an effective and reproducible pipeline for real-world multi-view smoke restoration. Its clear margin over the best official RealX3D baselines and its successful cross-protocol reproducibility underline the value of strategic decoupling over monolithic modeling in challenging, real-world degradation settings. The approach sets a clear benchmark for future advances, especially concerning modularity, scene-adaptivity, and practical deployability.