DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering (2411.19756v2)

Published 29 Nov 2024 in cs.CV and cs.LG

Abstract: Gaussian splatting enables fast novel view synthesis in static 3D environments. However, reconstructing real-world environments remains challenging as distractors or occluders break the multi-view consistency assumption required for accurate 3D reconstruction. Most existing methods rely on external semantic information from pre-trained models, introducing additional computational overhead as pre-processing steps or during optimization. In this work, we propose a novel method, DeSplat, that directly separates distractors and static scene elements purely based on volume rendering of Gaussian primitives. We initialize Gaussians within each camera view for reconstructing the view-specific distractors to separately model the static 3D scene and distractors in the alpha compositing stages. DeSplat yields an explicit scene separation of static elements and distractors, achieving comparable results to prior distractor-free approaches without sacrificing rendering speed. We demonstrate DeSplat's effectiveness on three benchmark data sets for distractor-free novel view synthesis. See the project website at https://aaltoml.github.io/desplat/.

Summary

  • The paper introduces a method for explicitly decomposing static scene elements from view-specific distractors using Gaussian splatting.
  • It utilizes photometric consistency and volume rendering to ensure accurate multi-view synthesis without relying on pre-trained semantic models.
  • Experiments on benchmark datasets show that DeSplat achieves high rendering speed and competitive quality in dynamic 3D scene reconstruction.

DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering

This paper introduces DeSplat, an approach to novel view synthesis in dynamic 3D environments that leverages decomposed Gaussian splatting to address the common problem of distractor objects breaking multi-view consistency. Unlike prior methods that rely heavily on pre-trained semantic models to handle occlusions and transient objects, DeSplat offers a purely splatting-based solution. It retains the high rendering speed of standard Gaussian splatting while providing an explicit separation of the scene into static and dynamic components, achieving competitive results without the computational overhead of external models.

The core innovation of DeSplat is the explicit decomposition of static scene elements and view-specific distractors based solely on photometric consistency and volume rendering of Gaussian primitives. DeSplat initializes a separate set of Gaussian primitives within each camera view; these per-view Gaussians model the distractors, while a shared set of Gaussians models the static 3D scene, and both are combined during alpha compositing. Rendering static and distractor elements together in this way keeps the reconstruction of the static scene from being skewed by transient objects or view-dependent appearance changes.
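
To make the compositing step concrete, the following is a minimal sketch of how static and per-view distractor Gaussians could be blended along a single pixel ray. The function and variable names and the per-ray NumPy formulation are illustrative assumptions for exposition, not the authors' code; an actual implementation would run inside a tiled splatting rasterizer rather than looping over one ray at a time.

```python
import numpy as np

def composite_with_distractors(static_rgb, static_alpha, static_depth,
                               distractor_rgb, distractor_alpha, distractor_depth):
    """Front-to-back alpha compositing of static and view-specific distractor
    splats along one pixel ray.

    Each *_rgb array has shape (N, 3); *_alpha and *_depth have shape (N,).
    The distractor arrays come from Gaussians initialized for this camera
    view only, so they exist solely for this particular training image.
    """
    # Merge both sets of splats and sort by depth (nearest first) so that
    # static and distractor primitives can occlude each other correctly.
    rgb = np.concatenate([static_rgb, distractor_rgb], axis=0)
    alpha = np.concatenate([static_alpha, distractor_alpha], axis=0)
    depth = np.concatenate([static_depth, distractor_depth], axis=0)
    is_distractor = np.concatenate([np.zeros(len(static_alpha), dtype=bool),
                                    np.ones(len(distractor_alpha), dtype=bool)])
    order = np.argsort(depth)

    color = np.zeros(3)          # full composite (static + distractors)
    static_color = np.zeros(3)   # contribution from static splats only
    transmittance = 1.0
    for i in order:
        weight = transmittance * alpha[i]
        color += weight * rgb[i]
        if not is_distractor[i]:
            static_color += weight * rgb[i]
        transmittance *= (1.0 - alpha[i])
    return color, static_color

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_s, n_d = 8, 3  # static and distractor splats intersected by this ray
    full, static_only = composite_with_distractors(
        rng.random((n_s, 3)), rng.random(n_s) * 0.8, rng.random(n_s) * 10.0,
        rng.random((n_d, 3)), rng.random(n_d) * 0.8, rng.random(n_d) * 2.0,
    )
    print("full composite:", full, "static-only:", static_only)
```

The intuition behind this setup is that the photometric loss is applied to the full composite: gradients can route view-specific, transient content into the per-view distractor splats, while the shared static splats are left to explain only what is consistent across views.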

The paper reports experiments on three benchmark datasets: RobustNeRF, On-the-go, and Photo Tourism. Across these settings, DeSplat captures and renders the static scene elements with minimal interference from transient distractors, and it is compared against state-of-the-art distractor-free rendering methods. The results demonstrate DeSplat's ability to handle scenes with varying types and densities of distractors while preserving computational efficiency.

Key findings include DeSplat's high rendering speed and clean scene separation with little loss in rendering quality compared to baseline models that rely on pre-trained neural networks for feature extraction and optimization. The authors also show that DeSplat is not a specialized model but a general framework that can be integrated into existing Gaussian splatting pipelines to improve their robustness to distractors.

The implications of this research are multifaceted. Practically, DeSplat extends the applicability of Gaussian splatting to unstructured image collections, such as crowd-sourced photos or casually captured videos. Theoretically, it shifts the focus from reliance on large pre-trained networks toward refining volume rendering techniques, opening avenues for lightweight, adaptable models for real-time 3D reconstruction. Future directions could include further improving the separation of dynamic and static components, and possibly using reinforcement learning to adapt Gaussian parameters to scene complexity and occlusion challenges.