- The paper introduces a novel framework that computes dynamic distractor masks via logistic regression on image residuals to improve 3D Gaussian splatting.
- It integrates a pretrained segmentation network to refine raw masks for complete object exclusion, ensuring robust handling of dynamic scene elements.
- Experiments on the RobustNeRF dataset show a PSNR boost of approximately 1.86 dB alongside improved SSIM and LPIPS, validating the method's effectiveness.
Robust 3D Gaussian Splatting for Novel View Synthesis in Presence of Distractors
The paper "Robust 3D Gaussian Splatting for Novel View Synthesis in Presence of Distractors" by Paul Ungermann, Armin Ettenhofer, Matthias Nießner, and Barbara Roessle addresses the critical challenge of handling distractors in the context of 3D Gaussian Splatting-based novel view synthesis. The authors provide a comprehensive solution to mitigate the degradation in rendering quality caused by dynamic objects that violate the static scene assumption typically central to such methods.
Background and Motivation
Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting are two prominent techniques for generating photorealistic novel views from a set of input images with known camera poses. While NeRF represents a scene's radiance and density with a multi-layer perceptron (MLP), 3D Gaussian Splatting models the scene explicitly as a set of 3D Gaussians. Both approaches minimize a re-rendering loss in RGB space under the assumption of a static, photometrically consistent scene. Real-world captures, however, often contain dynamic elements such as moving objects or changing lighting, termed distractors, which severely degrade rendering quality and produce floating artifacts and blurriness.
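To make the failure mode concrete, here is a minimal sketch of the standard re-rendering loss both methods minimize (PyTorch; function names and shapes are illustrative, not taken from the paper):

```python
import torch

def photometric_loss(rendered: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Standard L1 re-rendering loss over RGB images of shape (3, H, W).

    Assumes photometric consistency: every pixel of `target` should be
    reproducible by the static scene model. A distractor (e.g. a person
    walking through the scene) violates this assumption and yields large
    residuals that the optimizer tries to explain with floating artifacts.
    """
    return (rendered - target).abs().mean()
```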
Contributions
The authors propose a robust method to identify and ignore distractors during the optimization of 3D Gaussian Splatting, enhancing its capacity to handle imperfect data. The key contributions of this paper include:
- Distractor Mask Calculation: A neural decision boundary is optimized on image residuals to dynamically identify distractors during the 3D Gaussian optimization process.
- Object Awareness Integration: The paper introduces the use of a pretrained segmentation network, SegmentAnything, to augment the distractor masks with object-specific information, ensuring more accurate exclusion of distractors.
Method
The proposed method computes raw distractor masks from image residuals and processes them with heuristics that enforce local smoothness and contiguous local support. A logistic regression-based neural classifier learns a flexible decision boundary to separate distractors from static content. The raw masks are then refined with segmentations from SegmentAnything, so that entire objects, rather than scattered pixels, are identified as distractors.
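The resulting masks feed a masked re-rendering loss, so flagged pixels simply stop contributing gradients. A minimal sketch of such a masked loss, reflecting our reading of the approach (names are illustrative):

```python
import torch

def masked_photometric_loss(rendered: torch.Tensor, target: torch.Tensor,
                            distractor_mask: torch.Tensor) -> torch.Tensor:
    """L1 re-rendering loss that ignores pixels flagged as distractors.

    distractor_mask: (H, W), 1.0 where a pixel is believed to show a
    distractor, 0.0 otherwise. Only unmasked pixels contribute gradients,
    so dynamic content no longer pulls the 3D Gaussians toward
    inconsistent colors.
    """
    inlier = 1.0 - distractor_mask                      # keep static pixels
    per_pixel = (rendered - target).abs().mean(dim=0)   # (H, W) residuals
    return (inlier * per_pixel).sum() / inlier.sum().clamp(min=1.0)
```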
Raw Mask Generation
The initial raw masks are derived from centered residuals and processed with a box kernel to ensure local smoothness and continuity. Logistic regression then classifies pixels as distractors, leveraging image residuals aggregated at multiple scales for robustness.
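A sketch of how such multi-scale, box-filtered residual features could be computed; the kernel sizes and the exact centering scheme are assumptions chosen for illustration:

```python
import torch
import torch.nn.functional as F

def raw_mask_features(rendered: torch.Tensor, target: torch.Tensor,
                      kernel_sizes=(3, 9, 15)) -> torch.Tensor:
    """Multi-scale, box-filtered residual features for raw mask generation.

    rendered, target: (3, H, W) images in [0, 1].
    Returns a (len(kernel_sizes), H, W) stack; high values mark pixels the
    current reconstruction cannot explain, i.e. distractor candidates.
    Kernel sizes are illustrative, not the paper's values.
    """
    # Centered residual magnitude: subtracting the mean keeps the features
    # comparable across training iterations as the overall error shrinks.
    residual = (rendered - target).abs().mean(dim=0, keepdim=True)  # (1, H, W)
    residual = residual - residual.mean()

    feats = []
    for k in kernel_sizes:  # odd sizes keep outputs aligned with the input
        # Box filtering enforces local smoothness and contiguous support:
        # isolated noisy pixels average away, coherent regions survive.
        smoothed = F.avg_pool2d(residual.unsqueeze(0), k, stride=1, padding=k // 2)
        feats.append(smoothed.squeeze(0))
    return torch.cat(feats, dim=0)
```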
Neural Decision Boundary
The dynamic thresholding, implemented as logistic regression, lets the method adaptively separate distractors from static scene content throughout training. This iterative refinement allows the classification to improve as the optimization progresses.
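In practice, such a classifier can be as small as a single linear layer followed by a sigmoid over the residual features, i.e. logistic regression. A sketch (the paper's exact feature set and training signal may differ):

```python
import torch
import torch.nn as nn

class DistractorClassifier(nn.Module):
    """Logistic regression over multi-scale residual features.

    A single linear layer plus sigmoid: effectively a learned, adaptive
    threshold on the residuals instead of a fixed cutoff. Trained jointly
    with the 3D Gaussians, so the boundary tightens as residuals on static
    content shrink. (Sketch; feature count and training signal are assumed.)
    """
    def __init__(self, num_features: int = 3):
        super().__init__()
        self.linear = nn.Linear(num_features, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (C, H, W) -> per-pixel distractor probability, (H, W)
        c, h, w = feats.shape
        logits = self.linear(feats.permute(1, 2, 0).reshape(-1, c))
        return torch.sigmoid(logits).view(h, w)
```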
Object Awareness
By intersecting the neural decision boundary-based masks with segmentations from SegmentAnything, the method attains object awareness. This step ensures that the distractor masks encompass complete objects rather than just scattered pixels, significantly improving the accuracy of the masking process.
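One way to realize this intersection, assuming the segments arrive as boolean masks and using an illustrative overlap threshold:

```python
import torch

def object_aware_mask(raw_mask: torch.Tensor, segments: list,
                      overlap_thresh: float = 0.5) -> torch.Tensor:
    """Upgrade a pixel-level distractor mask to whole objects.

    raw_mask: (H, W) boolean mask from the neural decision boundary.
    segments: boolean (H, W) object masks, e.g. from SegmentAnything.
    If a large enough fraction of a segment is already flagged, the whole
    segment is marked, so objects are excluded completely rather than as
    scattered pixels. The 0.5 threshold is illustrative.
    """
    refined = torch.zeros_like(raw_mask)
    for seg in segments:
        overlap = (raw_mask & seg).float().sum() / seg.float().sum().clamp(min=1.0)
        if overlap >= overlap_thresh:
            refined |= seg
    return refined
```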
Experimental Evaluation
The efficacy of the proposed method is demonstrated on the RobustNeRF dataset, which includes several scenes contaminated with diverse distractors. The results indicate substantial improvements in rendering quality over traditional 3D Gaussian Splatting and an adapted version of RobustNeRF.
Quantitative Results
The method improves PSNR by approximately 1.86 dB and achieves consistently better SSIM (higher) and LPIPS (lower) scores across scenes, demonstrating the robustness and effectiveness of the approach.
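To put the number in perspective: since PSNR = 10 · log10(MAX² / MSE), a 1.86 dB gain corresponds to roughly a 1.5× reduction in mean squared error (10^0.186 ≈ 1.53).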
Qualitative Results
Visually, the proposed method produces clearer, more accurate scene reconstructions with fewer artifacts than the baseline methods. This is especially notable in scenes heavily affected by distractors.
Implications and Future Work
The improvements in handling distractors have practical implications for applications in virtual reality, gaming, and autonomous systems, where capturing pristine data is often impractical. The ability to robustly synthesize novel views from cluttered scenes opens new avenues for the use of 3D Gaussian Splatting in real-world settings.
Future developments could explore enhanced integration of segmentation methods, refined neural decision boundaries, and extended applications to dynamic scenes where both foreground and background elements may move.
Conclusion
This paper presents a notable advance in the robustness of 3D Gaussian Splatting for novel view synthesis by addressing the challenges posed by distractors. Through dynamic neural decision boundaries and object-aware mask refinement, the authors deliver a solution that substantially improves rendering quality even in the presence of dynamic scene elements. The work advances the state of the art in novel view synthesis and broadens the applicability of 3D Gaussian Splatting to more complex, realistic capture scenarios.