GenSmoke-GS: A Multi-Stage Method for Novel View Synthesis from Smoke-Degraded Images Using a Generative Model

Published 3 Apr 2026 in cs.CV | (2604.03039v1)

Abstract: This paper describes our method for Track 2 of the NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge on smoke-degraded images. In this task, smoke reduces image visibility and weakens the cross-view consistency required by scene optimization and rendering. We address this problem with a multi-stage pipeline consisting of image restoration, dehazing, MLLM-based enhancement, 3DGS-MCMC optimization, and averaging over repeated runs. The main purpose of the pipeline is to improve visibility before rendering while limiting scene-content changes across input views. Experimental results on the challenge benchmark show improved quantitative performance and better visual quality than the provided baselines. The code is available at https://github.com/plbbl/GenSmoke-GS. Our method achieved a ranking of 1 out of 14 participants in Track 2 of the NTIRE 3DRR Challenge, as reported on the official competition website: https://www.codabench.org/competitions/13993/#/results-tab.


Authors (6)

Summary

The paper introduces GenSmoke-GS, a multi-stage pipeline that mitigates smoke degradation to deliver high-fidelity 3D novel view synthesis.
It integrates ConvIR-UDPNet restoration, DCP-based dehazing, MLLM-guided enhancement, and 3D Gaussian Splatting with MCMC optimization to preserve scene geometry.
Experimental results on the RealX3D benchmark show substantial improvements in PSNR, SSIM, and LPIPS compared to established baselines.

GenSmoke-GS: Multi-Stage Novel View Synthesis from Smoke-Degraded Images

Methodological Overview

GenSmoke-GS addresses the task of generating clean, consistent novel views from multi-view, smoke-degraded image sets. Its pipeline has five stages: image restoration, dehazing, MLLM-guided enhancement, 3D Gaussian Splatting optimization with MCMC, and ensembling via output averaging. The pipeline targets the problems caused by participating media such as smoke, which reduce visibility and inter-view consistency and thereby undermine radiance-field and Gaussian-based scene optimization.

The process begins with a coarse restoration using ConvIR-UDPNet, followed by dehazing based on the dark channel prior (DCP) to suppress remaining global smoke and recover scene structure. Each preprocessed image is then enhanced individually with GPT-Image-1.5, using prompts that impose strict structure-preservation constraints. These constraints are intended to prevent scene geometry from drifting across views, a risk that generative enhancers otherwise introduce. The enhanced images are then passed to the 3DGS-MCMC method, accelerated by FasterGS, for radiance field optimization. Outputs from 91 independent runs are averaged, which was found empirically to suppress local artifacts by exploiting the stochasticity of the MCMC-based process.
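To make the dehazing stage concrete, the following is a minimal sketch of classic dark-channel-prior dehazing in NumPy. It is not the authors' implementation: the patch size, the `omega` and `t0` values, the use of the mean color of the brightest dark-channel pixels as atmospheric light, and the absence of any transmission refinement step are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dark_channel(img, patch=15):
    """Per-pixel minimum over the RGB channels and a patch x patch window."""
    if patch % 2 == 0:
        raise ValueError("patch must be odd")
    pad = patch // 2
    min_rgb = img.min(axis=2)
    # Edge padding keeps the output the same height and width as the input.
    padded = np.pad(min_rgb, pad, mode="edge")
    windows = sliding_window_view(padded, (patch, patch))
    return windows.min(axis=(2, 3))


def estimate_atmospheric_light(img, dark, top_fraction=0.001):
    """Mean color of the pixels with the largest dark-channel values (assumed variant)."""
    k = max(1, int(dark.size * top_fraction))
    idx = np.argsort(dark.ravel())[-k:]
    light = img.reshape(-1, 3)[idx].mean(axis=0)
    # Guard against division by zero when a channel is completely dark.
    return np.maximum(light, 1e-6)


def dehaze_dcp(img, patch=15, omega=0.95, t0=0.1):
    """Dehaze a float RGB image in [0, 1] with the haze model I = J * t + A * (1 - t)."""
    dark = dark_channel(img, patch)
    light = estimate_atmospheric_light(img, dark)
    # Transmission estimate: haze-free patches are assumed to have a dark channel near 0.
    transmission = 1.0 - omega * dark_channel(img / light, patch)
    transmission = np.clip(transmission, t0, 1.0)
    restored = (img - light) / transmission[..., None] + light
    return np.clip(restored, 0.0, 1.0)
```

Because the transmission map is estimated per image, a step like this gives no cross-view consistency guarantee on its own, which is one reason the later stages constrain structure explicitly.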

Figure 1: Pipeline from smoke degradation to output via restoration, dehazing, MLLM enhancement with structure preservation, 3DGS-MCMC optimization, and multi-run ensemble.

Quantitative and Qualitative Results

The evaluation is conducted on the RealX3D benchmark (NTIRE 3DRR Challenge, Track 2), which presents multiple real-world scenes with severe smoke-induced degradations.

GenSmoke-GS improves all three core synthesis metrics relative to standard 3DGS: PSNR rises from 11.54 dB to 20.21 dB, SSIM rises from 0.597 to 0.729, and LPIPS falls from 0.705 to 0.446. The gains hold across all tested scenes. On highly challenging scenes such as "Shirohana", the method still outperforms the baselines, but its scores remain below the cross-scene average, which may reflect a ceiling imposed by extreme degradation or reduced view coverage.
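For readers who want to relate these numbers to each other, here is a small sketch that defines PSNR and computes the metric differences from the averages quoted above. The dictionary names are illustrative, and the conversion from a PSNR gain to an MSE ratio is only a rough reading, since it treats the reported averages as if they came from a single image pair.

```python
import numpy as np


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Averages quoted in the text (standard 3DGS vs. GenSmoke-GS).
baseline = {"psnr": 11.54, "ssim": 0.597, "lpips": 0.705}
gensmoke = {"psnr": 20.21, "ssim": 0.729, "lpips": 0.446}

# Positive is better for PSNR and SSIM; negative is better for LPIPS.
gains = {name: round(gensmoke[name] - baseline[name], 3) for name in baseline}

# A PSNR difference of d dB corresponds to an MSE ratio of 10 ** (d / 10).
mse_reduction = 10 ** (gains["psnr"] / 10)
```

Under that simplification, the 8.67 dB PSNR gain corresponds to roughly a sevenfold reduction in mean squared error.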

Figure 2: Qualitative comparison for the "Futaba" scene, view 0024, showing improved fidelity and artifact suppression in GenSmoke-GS reconstructions.

Figure 3: Qualitative comparison for the "Shirohana" scene, view 0027, highlighting GenSmoke-GS’s visibility restoration and improved structure over baselines, though some difficult regions persist.

Qualitative assessments indicate that baseline approaches (e.g., unmodified 3DGS, I2-NeRF, SeaSplat) produce reconstructions with strong residual haze, color shifts, geometric inconsistencies, and low-frequency artifacts, whereas GenSmoke-GS yields sharper, more faithful reconstructions aligned to ground truth geometry and radiometry. The multi-run averaging is observed to further suppress local instabilities and "flicker" typical of stochastic inference processes in generative models and MCMC-based field optimizers.

Theoretical and Practical Implications

The empirical results suggest that, for 3D scene synthesis from heavily degraded data, multi-stage enhancement should be optimized for cross-view consistency rather than per-image fidelity. The work challenges the implicit assumption that more aggressive generative enhancement monotonically benefits downstream view synthesis: without enforced structure and geometry preservation, such enhancement can introduce artifacts that harm radiance field optimization.

The framework's use of MLLM-based enhancement, controlled via structure-preserving prompts, points toward further integration of large vision-language models into classic scene reconstruction pipelines. The averaging mechanism, which relies on the approximate statistical independence of repeated optimizations, offers a simple but effective way to stabilize generative and stochastic renderings when the input is ill-posed.
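The intuition behind multi-run averaging can be illustrated with a toy simulation. This sketch assumes each run's rendering equals the true image plus independent, zero-mean Gaussian noise, which is an idealization: real runs share the same enhanced inputs, so systematic errors common to all runs would not average out. The function names and noise level are hypothetical; only the run count of 91 comes from the text.

```python
import numpy as np


def average_renders(renders):
    """Pixel-wise mean over a list of equally shaped renderings."""
    return np.mean(np.stack(renders, axis=0), axis=0)


def rmse(a, b):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


def simulate(num_runs=91, noise_std=0.1, shape=(64, 64, 3), seed=0):
    """Return (single-run RMSE, averaged RMSE) under the i.i.d. noise assumption."""
    rng = np.random.default_rng(seed)
    target = np.full(shape, 0.5)
    # Each simulated run is the target plus its own independent noise field.
    renders = [target + rng.normal(0.0, noise_std, shape) for _ in range(num_runs)]
    single = rmse(renders[0], target)
    averaged = rmse(average_renders(renders), target)
    return single, averaged
```

Under this idealized model the error of the averaged output shrinks by about the square root of the number of runs, so 91 runs would reduce independent per-run error by a factor of roughly 9.5. The benefit in practice is likely smaller, because only the run-specific component of the error is reduced.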

Given the challenging nature of the RealX3D benchmark, these results motivate further research into design principles for generative enhancement that respect scene coherence, and the development of self-consistency regularizers or multi-view-aware enhancement models.

Future Directions

This work is likely to foster new research in:

Cross-view consistent generative enhancement: Explicit learning of multi-view structure-aware enhancement networks.
Optimized prompting strategies for MLLMs in 3D vision: Extension to more expressive prompt engineering and potentially view-aware conditioning.
Probabilistic aggregation frameworks: Further exploration of variational ensemble schemes beyond naïve averaging for uncertainty-aware novel view synthesis.
Extension to other degradation scenarios: Adaptation to underwater, low-light, or multi-modal degraded input domains.

Conclusion

GenSmoke-GS establishes a state-of-the-art approach to novel view synthesis under severe smoke degradation through a multi-stage restoration and synthesis framework. By prioritizing structural consistency and combining MLLM-based enhancement with 3DGS-MCMC optimization, the method achieves large improvements in both quantitative fidelity and perceptual quality across diverse, challenging scenes. These findings challenge prevailing assumptions about pre-processing pipelines in neural scene reconstruction and open avenues for the principled use of generative models in multi-view 3D vision tasks.
