
GSFix3D: Diffusion-Guided Repair of Novel Views in Gaussian Splatting (2508.14717v1)

Published 20 Aug 2025 in cs.CV

Abstract: Recent developments in 3D Gaussian Splatting have significantly enhanced novel view synthesis, yet generating high-quality renderings from extreme novel viewpoints or partially observed regions remains challenging. Meanwhile, diffusion models exhibit strong generative capabilities, but their reliance on text prompts and lack of awareness of specific scene information hinder accurate 3D reconstruction tasks. To address these limitations, we introduce GSFix3D, a novel framework that improves the visual fidelity in under-constrained regions by distilling prior knowledge from diffusion models into 3D representations, while preserving consistency with observed scene details. At its core is GSFixer, a latent diffusion model obtained via our customized fine-tuning protocol that can leverage both mesh and 3D Gaussians to adapt pretrained generative models to a variety of environments and artifact types from different reconstruction methods, enabling robust novel view repair for unseen camera poses. Moreover, we propose a random mask augmentation strategy that empowers GSFixer to plausibly inpaint missing regions. Experiments on challenging benchmarks demonstrate that our GSFix3D and GSFixer achieve state-of-the-art performance, requiring only minimal scene-specific fine-tuning on captured data. Real-world test further confirms its resilience to potential pose errors. Our code and data will be made publicly available. Project page: https://gsfix3d.github.io.

Summary

  • The paper introduces GSFix3D, a novel approach that integrates diffusion-based repair with dual-conditioning from mesh and 3DGS renderings to remove artifacts and inpaint missing regions.
  • It employs a fine-tuned latent diffusion model with customized dual-input conditioning and random mask augmentation, achieving up to 5 dB PSNR improvement in challenging scenarios.
  • The approach is efficient and robust, requiring minimal scene-specific fine-tuning while effectively handling real-world challenges such as pose errors and sensor noise.

Diffusion-Guided Repair of Novel Views in Gaussian Splatting: GSFix3D

Introduction and Motivation

GSFix3D addresses a persistent challenge in 3D Gaussian Splatting (3DGS): generating artifact-free, photorealistic renderings from novel viewpoints, especially in regions with sparse observations or incomplete geometry. While 3DGS offers explicit, differentiable scene representations and fast rendering, its reliance on dense input views leads to visible artifacts (holes, floaters, and unnatural surfaces) when extrapolating to under-constrained regions. Diffusion models, notably latent-space denoising frameworks such as Stable Diffusion, have demonstrated strong generative capabilities in 2D image synthesis but lack the spatial consistency and scene awareness required for 3D reconstruction tasks.

GSFix3D proposes a hybrid pipeline that leverages the generative priors of diffusion models to repair and inpaint novel views rendered from 3DGS and mesh reconstructions. The core innovation is the GSFixer module, a fine-tuned latent diffusion model conditioned on both mesh and 3DGS renderings, enabling robust removal of artifacts and plausible completion of missing regions. The repaired images are then distilled back into the 3DGS representation via photometric optimization, improving the underlying 3D scene fidelity (Figure 1).

Figure 1: System overview of the proposed GSFix3D framework for novel view repair. Given initial 3D reconstructions in the form of mesh and 3DGS, we render novel views and use them as conditional inputs to GSFixer. Through a reverse diffusion process, GSFixer generates repaired images with artifacts removed and missing regions inpainted. These outputs are then distilled back into 3D by optimizing the 3DGS representation using photometric loss.

Methodology

Customized Fine-Tuning Protocol

GSFixer is obtained by fine-tuning a pretrained latent diffusion model (Stable Diffusion v2) on scene-specific data. The conditional generation task is formulated as learning $p(I^{gt} \mid I^{mesh}, I^{gs})$, where $I^{gt}$ is the ground-truth RGB image, $I^{mesh}$ is the mesh-rendered image, and $I^{gs}$ is the 3DGS-rendered image. The dual-conditioning strategy exploits the complementary strengths of mesh (coherent geometry) and 3DGS (photorealistic appearance), providing richer cues for artifact removal and inpainting.

The network architecture repurposes the U-Net backbone of the diffusion model, expanding the input layer to accommodate concatenated latent codes from both mesh and 3DGS renderings. Training follows the DDPM objective, minimizing the $L_2$ loss between predicted and true noise in the latent space (Figure 2).

Figure 2: Illustration of the customized fine-tuning protocol for adapting a pretrained diffusion model into GSFixer, enabling it to handle diverse artifact types and missing regions.
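
To make the protocol concrete, the following is a minimal sketch of one fine-tuning step as described above, not the authors' code. Here `unet`, `vae`, and `scheduler` are hypothetical stand-ins for the Stable Diffusion v2 components (`vae.encode` is assumed to return latents directly), and the widened input layer is modeled by channel-wise concatenation of the noisy target latent with both condition latents.

```python
# Minimal sketch of the dual-conditioned DDPM fine-tuning step. `unet`,
# `vae`, and `scheduler` are assumed stand-ins, not the authors' code.
import torch
import torch.nn.functional as F

def gsfixer_train_step(unet, vae, scheduler, I_gt, I_mesh, I_gs):
    """One training step for learning p(I_gt | I_mesh, I_gs) in latent space."""
    with torch.no_grad():
        z_gt = vae.encode(I_gt)      # target latent (ground-truth RGB)
        z_mesh = vae.encode(I_mesh)  # condition: coherent geometry
        z_gs = vae.encode(I_gs)      # condition: photorealistic appearance

    # Forward (noising) process: sample a timestep and corrupt the target.
    t = torch.randint(0, scheduler.num_train_timesteps,
                      (z_gt.shape[0],), device=z_gt.device)
    noise = torch.randn_like(z_gt)
    z_noisy = scheduler.add_noise(z_gt, noise, t)

    # The expanded U-Net input layer takes the concatenated latents.
    noise_pred = unet(torch.cat([z_noisy, z_mesh, z_gs], dim=1), t)

    # DDPM objective: L2 loss between predicted and true noise.
    return F.mse_loss(noise_pred, noise)
```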

Data Augmentation for Inpainting

To address the lack of missing regions in training data, a random mask augmentation strategy is introduced. Semantic masks derived from real-image annotations are randomly applied to mesh and 3DGS renderings, simulating occlusions and under-constrained regions. Gaussian blur is used to approximate soft boundaries typical in 3DGS artifacts. This augmentation is critical for enabling GSFixer to generalize to large holes and occlusions in novel views.
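
A plausible sketch of this augmentation follows; the binary mask source, black fill, and blur strength are illustrative assumptions rather than values reported in the paper.

```python
# Hypothetical sketch of the random mask augmentation; fill value and blur
# parameters are illustrative assumptions.
import random
import torch
import torchvision.transforms.functional as TF

def augment_with_mask(I_mesh, I_gs, semantic_masks, p=0.5, blur_sigma=4.0):
    """With probability p, blank out a semantic region in both conditioning
    renderings to simulate occlusions and under-constrained regions."""
    if random.random() > p:
        return I_mesh, I_gs

    mask = random.choice(semantic_masks).float()  # (1, H, W), 1 = drop region
    # Gaussian blur approximates the soft boundaries typical of 3DGS artifacts.
    kernel = int(4 * blur_sigma) | 1              # force an odd kernel size
    soft = TF.gaussian_blur(mask, kernel, blur_sigma)

    keep = 1.0 - soft
    return I_mesh * keep, I_gs * keep             # dropped region fades to black
```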

Inference and 3D Distillation

At inference, GSFixer receives mesh and 3DGS renderings from novel viewpoints, encodes them into the latent space, and iteratively denoises the target latent using a DDIM schedule. The output is a repaired image with artifacts removed and missing regions inpainted. These images are then used to optimize the 3DGS parameters via a photometric loss combining $L_1$ and SSIM terms, with adaptive density control to fill previously empty regions. Multi-view constraints are enforced by augmenting the training set with repaired views and poses, further improving global coherence.
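
For the distillation step, a compact sketch of the photometric objective is given below. The common 3DGS weighting $\mathcal{L} = (1-\lambda)\,L_1 + \lambda\,(1-\mathrm{SSIM})$ with $\lambda = 0.2$, and the uniform-window SSIM computed via average pooling, are our assumptions rather than values confirmed by the paper.

```python
# Simplified photometric distillation loss. Assumptions: lambda = 0.2 and a
# uniform-window SSIM via average pooling (Gaussian windows are more common).
import torch
import torch.nn.functional as F

def ssim(x, y, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Mean SSIM over local windows; x, y in [0, 1] with shape (B, C, H, W)."""
    mu_x = F.avg_pool2d(x, window, stride=1)
    mu_y = F.avg_pool2d(y, window, stride=1)
    var_x = F.avg_pool2d(x * x, window, stride=1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, stride=1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, stride=1) - mu_x * mu_y
    score = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return score.mean()

def photometric_loss(render, repaired, lam=0.2):
    """L = (1 - lam) * L1 + lam * (1 - SSIM), between the current 3DGS render
    and the GSFixer-repaired target for the same novel view."""
    return (1 - lam) * F.l1_loss(render, repaired) + lam * (1 - ssim(render, repaired))
```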

Experimental Results

Quantitative and Qualitative Performance

GSFix3D and GSFixer are evaluated on the ScanNet++ (real-world indoor) and Replica (synthetic indoor) datasets using PSNR, SSIM, and LPIPS metrics. Against all baselines (DIFIX, DIFIX-ref, SplaTAM, RTG-SLAM, and GSFusion), GSFixer consistently achieves superior artifact removal and inpainting, with up to a 5 dB PSNR gain over the DIFIX variants in challenging scenarios. The dual-input configuration (mesh + 3DGS) further boosts performance, especially in under-constrained regions (Figure 3).

Figure 3: Qualitative comparisons of diffusion-based repair methods on the ScanNet++ and Replica datasets. All examples use only 3DGS reconstructions as the input source. Our GSFixer effectively removes artifacts and fills in large holes, where both DIFIX and DIFIX-ref fail to produce satisfactory results.
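
For context on the reported numbers, PSNR is logarithmic in mean squared error, so a 5 dB gain corresponds to roughly a $10^{0.5} \approx 3.16\times$ reduction in MSE:

```python
# PSNR for images scaled to [0, 1]; a 5 dB gain implies the MSE shrinks by
# a factor of 10 ** (5 / 10), roughly 3.16x.
import torch

def psnr(pred, target, max_val=1.0):
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```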

Figure 4: Qualitative comparison between GSFixer and GSFix3D on the ScanNet++ and Replica datasets. Both mesh and 3DGS reconstructions from GSFusion are used as input sources. The 2D visual improvements from GSFixer are effectively distilled into the 3D space by GSFix3D.

Real-World Robustness

GSFix3D demonstrates resilience to pose errors and sensor noise in real-world data, as shown in ship and outdoor scenes. The method effectively removes shadow-like floaters and fills missing regions, even when initial reconstructions are degraded by inaccurate poses or incomplete measurements (Figure 5).

Figure 5: Novel view repair on self-collected ship data. Our method is robust to pose errors, effectively removing shadow-like floaters.

Ablation Studies

Ablations confirm the importance of dual-input conditioning and random mask augmentation. The dual-input setup consistently outperforms single-input variants, leveraging complementary information to resolve artifacts. Mask augmentation is essential for strong inpainting performance, enabling GSFixer to fill large missing regions with coherent textures (Figure 6).

Figure 6: Qualitative ablation of input image conditions on the ScanNet++ and Replica datasets. We compare GSFixer results using three types of inputs rendered from GSFusion: mesh-only, 3DGS-only, and dual-input. The artifacts (highlighted by green and yellow boxes) present in the single-input settings are effectively mitigated with the dual-input configuration.

Figure 7: Qualitative ablation of random mask augmentation on the Replica dataset. We compare GSFixer results fine-tuned with and without our proposed augmentation strategy. The differences in inpainting quality highlight the improved ability to fill large missing regions when augmentation is used.

Implementation Considerations

  • Computational Requirements: GSFixer fine-tuning is efficient, requiring only a few hours on a single workstation GPU (NVIDIA RTX 4500 Ada, 24 GB VRAM); by comparison, DIFIX-finetune demands significantly more memory (NVIDIA A40, 48 GB VRAM).
  • Data Efficiency: Only minimal scene-specific fine-tuning is needed; no large-scale curated real image pairs are required.
  • Scalability: The pipeline is compatible with various 3DGS-based SLAM systems and can be extended to other explicit representations.
  • Deployment: The plug-and-play nature of GSFixer allows integration into existing 3D reconstruction workflows, with the repaired images directly improving downstream tasks such as AR/VR visualization and robotics navigation.

Implications and Future Directions

GSFix3D demonstrates that diffusion models, when properly fine-tuned and conditioned, can be effectively harnessed for spatially consistent repair and inpainting in 3D reconstruction pipelines. The dual-input design and mask augmentation are critical for generalization and robustness. The approach is practical for real-world deployment, requiring only modest computational resources and minimal data curation.

Future work may explore:

  • Extending the framework to unbounded outdoor scenes and dynamic environments.
  • Incorporating temporal consistency for video-based novel view repair.
  • Adapting the pipeline for multi-modal sensor fusion (e.g., LiDAR, thermal).
  • Investigating end-to-end differentiable training for joint 3DGS and diffusion model optimization.

Conclusion

GSFix3D establishes a robust, efficient pipeline for repairing novel views in 3DGS reconstructions, leveraging diffusion priors via a customized fine-tuning protocol and dual-input conditioning. The method achieves state-of-the-art artifact removal and inpainting, adapts to diverse scenes and reconstruction pipelines, and is resilient to real-world data imperfections. Its practical efficiency and adaptability make it a valuable contribution to the field of 3D scene reconstruction and novel view synthesis.
