- The paper introduces InFusion, which achieves about 20x faster processing by using a diffusion-based depth completion model for 3D Gaussian inpainting.
- It employs a learned depth model to precisely initialize 3D points, leading to high-fidelity rendering and facilitating texture and object editing.
- The approach combines latent diffusion priors with a progressive inpainting strategy to robustly restore depth maps in complex 3D scenes.
 
 
Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior
Introduction to InFusion
The paper introduces InFusion, a method that leverages depth completion models informed by diffusion priors to improve inpainting of 3D Gaussians. 3D Gaussians are valued for their efficiency in novel view synthesis, yet editing them, particularly inpainting, remains challenging. InFusion addresses these challenges by optimizing the initialization of the inpainted 3D points with an image-conditioned depth completion model, yielding improved rendering quality and processing efficiency in complex scenes.
Key Contributions
- Depth Completion via Diffusion Prior: The proposed model fills in missing depth values while keeping them scale-aligned with the observed depth map. It is trained from a large-scale pre-trained diffusion prior, endowing it with substantial generalizability.
- Enhanced Inpainting Performance: InFusion markedly improves the fidelity and efficiency of 3D Gaussian inpainting, demonstrating about 20x faster processing compared to existing methods under various test conditions.
- Depth and Texture Flexibility: The approach not only handles depth inpainting effectively but also supports practical applications like user-specific texture modification and novel object insertion within the 3D scene.
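To make the scale-alignment idea concrete, here is a minimal sketch of one common way to align a predicted depth map with observed depth: fit a per-image scale and shift by least squares on the known pixels, then apply the fit everywhere. This is a generic illustration of the concept, not the paper's actual training procedure (InFusion learns to produce aligned depth directly); the function name and masking convention are assumptions for this example.

```python
import numpy as np

def align_depth(pred, ref, mask):
    """Fit scale s and shift t so that s * pred + t matches ref on the
    pixels where mask is True, then apply the fit to the whole map.
    This is a hypothetical post-hoc alignment, not the paper's method."""
    x = pred[mask].ravel()
    y = ref[mask].ravel()
    # Solve min ||s * x + t - y||^2 via linear least squares.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * pred + t
```

After alignment, the completed depth values in the masked (inpainted) region live on the same metric scale as the surrounding scene, which is what lets them seed 3D points that sit consistently among the existing Gaussians.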
Methodology Overview
The method involves guiding the initialization of inpainting points using a tailored depth completion model that learns directly from observed image data. Here’s a breakdown of the process:
- Depth Map Restoration: By completing depth maps using learned priors, the model can accurately place initial points for the Gaussians, crucial for high-quality rendering.
- Use of Diffusion Models: Employing pre-trained latent diffusion models (LDMs) enables the depth completion model to be both robust and efficient. This approach leverages the strong generative capabilities of LDMs while operating in a lower-dimensional latent space for efficiency.
- Progressive Inpainting Strategy: For scenes with large occlusions or complex geometry, a progressive approach is adopted: the scene is refined step by step using multiple reference views rather than a single inpainted image, so that later steps can build on earlier results.
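The first step above, placing initial points from a completed depth map, amounts to standard back-projection: each pixel is lifted into camera space using the intrinsics and then transformed to world space. A minimal sketch, assuming a pinhole camera model with intrinsics `K` and a camera-to-world matrix `c2w` (the function name is illustrative, not from the paper):

```python
import numpy as np

def depth_to_points(depth, K, c2w):
    """Back-project a depth map to world-space 3D points.
    depth: (H, W) completed depth map
    K:     (3, 3) pinhole intrinsics
    c2w:   (4, 4) camera-to-world transform
    Returns an (H*W, 3) array of world-space points, e.g. to
    initialize the means of newly inserted Gaussians."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates (u, v, 1), one row per pixel.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Lift to camera space: X_cam = depth * K^{-1} [u, v, 1]^T.
    cam = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    # Transform to world space with the camera pose.
    cam_h = np.concatenate([cam, np.ones((cam.shape[0], 1))], axis=1)
    return (c2w @ cam_h.T).T[:, :3]
```

In a pipeline like the one described, only the pixels inside the inpainting mask would be unprojected this way, and the resulting points would seed new Gaussians that are then fine-tuned against the inpainted reference image.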
Practical Applications and Implications
- Texture and Object Editing: Beyond typical inpainting tasks, InFusion allows for user-interactive modifications like texture changes and object insertions, directly benefiting applications in virtual and augmented reality by enhancing user engagement and scene realism.
- Efficiency and Scale: The efficiency gains from this approach suggest potential reductions in computational costs and processing times for real-time applications, such as interactive gaming and live AR systems.
Speculative Future Developments
The integration of diffusion-based depth learning could reshape how depth information is used across 3D applications, pushing the boundaries of accuracy and realism in digital content creation. It might also inspire new algorithms for real-time depth sensing and correction in photography and videography, with significant impact on content production technologies.
Concluding Thoughts
InFusion presents a robust methodology for handling the challenges associated with 3D Gaussian inpainting, offering notable improvements in speed and quality. The method’s ability to integrate with diffusion-based priors opens up new paths for enhancing depth information utility in 3D modeling and rendering applications. Future work could explore extending these techniques to other forms of 3D data representation or optimizing the model for even faster real-time processing capabilities.