DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes (2411.11921v1)

Published 18 Nov 2024 in cs.CV

Abstract: We present DeSiRe-GS, a self-supervised Gaussian splatting representation, enabling effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios. Our approach employs a two-stage optimization pipeline of dynamic street Gaussians. In the first stage, we extract 2D motion masks based on the observation that 3D Gaussian Splatting inherently can reconstruct only the static regions in dynamic environments. These extracted 2D motion priors are then mapped into the Gaussian space in a differentiable manner, leveraging an efficient formulation of dynamic Gaussians in the second stage. Combined with the introduced geometric regularizations, our method is able to address the over-fitting issues caused by data sparsity in autonomous driving, reconstructing physically plausible Gaussians that align with object surfaces rather than floating in air. Furthermore, we introduce temporal cross-view consistency to ensure coherence across time and viewpoints, resulting in high-quality surface reconstruction. Comprehensive experiments demonstrate the efficiency and effectiveness of DeSiRe-GS, surpassing prior self-supervised methods and achieving accuracy comparable to methods relying on external 3D bounding box annotations. Code is available at https://github.com/chengweialan/DeSiRe-GS

Summary

The paper introduces a self-supervised framework using a two-stage optimization pipeline with 4D Gaussian splatting for dynamic scene decomposition.
It employs dynamic mask extraction via 3D Gaussian splatting and a Periodic Vibration Gaussian model to separate static and dynamic elements.
The method reports improved PSNR, SSIM, and LPIPS over prior self-supervised methods, with accuracy comparable to methods that rely on 3D bounding box annotations.

An Overview of DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction

The paper "DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes" introduces DeSiRe-GS, an approach that uses a 4D Gaussian splatting representation to model autonomous driving scenes. Its distinguishing feature is that it handles dynamic scenes without additional 3D annotations such as bounding boxes.

Technical Summary

DeSiRe-GS is a self-supervised method for static-dynamic decomposition and high-quality surface reconstruction in driving scenes, where dynamic objects such as vehicles and pedestrians appear alongside static infrastructure. Unlike previous methods that rely heavily on annotated data, DeSiRe-GS avoids this dependency through a two-stage optimization pipeline.

Stage 1 - Dynamic Mask Extraction:

The first stage trains a 3D Gaussian Splatting (3DGS) model, which can inherently reconstruct only the static regions of a dynamic environment. Regions where the rendered image differs from the ground-truth image are therefore likely to contain motion, and the method turns these differences into 2D motion masks. The stage uses pretrained foundation models to extract image features, followed by a multi-layer perceptron that predicts the dynamic regions.
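The idea of reading motion off the disagreement between a render and its ground truth can be illustrated with a minimal sketch. It assumes per-pixel feature vectors are already available and swaps the learned multi-layer perceptron for a fixed threshold on cosine distance; the function names, the threshold value, and the toy data are all hypothetical and not taken from the paper.

```python
import math

def cosine_distance(a, b, eps=1e-8):
    """Return 1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)

def motion_mask(rendered_feats, gt_feats, threshold=0.5):
    """Flag pixels whose rendered and ground-truth features disagree.

    Both inputs are H x W grids of per-pixel feature vectors (nested lists).
    Returns an H x W grid of booleans; True means "likely dynamic".
    The fixed threshold is an assumption standing in for a learned predictor.
    """
    return [
        [cosine_distance(r, g) > threshold for r, g in zip(r_row, g_row)]
        for r_row, g_row in zip(rendered_feats, gt_feats)
    ]

# Toy 1 x 2 "image": the first pixel matches, the second does not.
rendered = [[[1.0, 0.0], [1.0, 0.0]]]
ground_truth = [[[1.0, 0.0], [0.0, 1.0]]]
mask = motion_mask(rendered, ground_truth)
```

Here only the second pixel is flagged, because a static-only reconstruction cannot reproduce what the camera actually saw there.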

Stage 2 - Static-Dynamic Decomposition:

Building on the motion masks from the first stage, the second stage uses a Periodic Vibration Gaussian (PVG) model to separate static from dynamic Gaussians. Each Gaussian carries its own temporal variables, so its parameters become functions of time, and the 2D motion priors are mapped into Gaussian space in a differentiable manner without relying on direct annotation.
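To make "temporal variables attached to each Gaussian" concrete, the sketch below evaluates one Gaussian at a query time. The summary above does not give the equations, so the form used here is an assumption: the position oscillates sinusoidally around a mean along a velocity direction, and the opacity peaks at a life-peak time `tau` and decays with lifespan `beta`. All parameter names are illustrative.

```python
import math

def pvg_state(mean, velocity, opacity, tau, beta, cycle, t):
    """Position and opacity of one Gaussian at time t (assumed periodic form).

    mean, velocity: 3-vectors; tau: time of peak opacity;
    beta: lifespan (temporal standard deviation); cycle: vibration period.
    """
    phase = 2.0 * math.pi * (t - tau) / cycle
    amplitude = cycle / (2.0 * math.pi) * math.sin(phase)
    position = [m + amplitude * v for m, v in zip(mean, velocity)]
    opacity_t = opacity * math.exp(-0.5 * ((t - tau) / beta) ** 2)
    return position, opacity_t

mean, velocity = [1.0, 2.0, 3.0], [0.5, 0.0, 0.0]
# At the peak time the Gaussian sits at its mean with full opacity.
pos_peak, op_peak = pvg_state(mean, velocity, 0.9, tau=0.0, beta=0.1, cycle=1.0, t=0.0)
# Three lifespans later it has almost faded out.
pos_late, op_late = pvg_state(mean, velocity, 0.9, tau=0.0, beta=0.1, cycle=1.0, t=0.3)
```

Under this assumed form, a Gaussian with a very long lifespan stays visible over the whole sequence and behaves as static, while a short-lived one only contributes near its peak time, which is what lets a single representation cover both cases.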

Surface Reconstruction Techniques

DeSiRe-GS emphasizes high-fidelity surface reconstruction, crucial for nuanced perception in driving automation. The paper proposes geometric regularizations such as:

Flattening 3D Gaussians: Inspired by recent advances in 2D Gaussian Splatting, the approach flattens 3D ellipsoids into disk-like shapes, optimizing their alignment with object surfaces.
Regulating Gaussian Scale: The model introduces constraints to prevent oversized Gaussians, which previous works have overlooked despite their potential negative impacts on geometric accuracy.
Cross-view Temporal Consistency: The approach aggregates temporal information for static region depth consistency, addressing potential overfitting issues due to view sparsity.
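The first two regularizers can be sketched as simple penalties on per-Gaussian axis scales. This is an illustrative formulation under assumptions, not necessarily the exact losses in the paper: flattening is modeled by penalizing each Gaussian's smallest axis, and scale regulation by penalizing only the part of the largest axis above a hypothetical `max_scale`. The cross-view temporal consistency term needs rendered depth and camera poses and is not covered here.

```python
def flatten_loss(scales):
    """Mean of each Gaussian's smallest axis scale.

    Driving this toward zero collapses one axis, turning ellipsoids into disks.
    """
    return sum(min(s) for s in scales) / len(scales)

def scale_loss(scales, max_scale=1.0):
    """Mean excess of each Gaussian's largest axis over max_scale.

    Gaussians that are already small enough contribute nothing.
    """
    return sum(max(max(s) - max_scale, 0.0) for s in scales) / len(scales)

# Two Gaussians with (x, y, z) axis scales: one nearly flat, one oversized.
scales = [[0.5, 0.4, 0.01], [3.0, 0.2, 0.1]]
flat_penalty = flatten_loss(scales)
size_penalty = scale_loss(scales, max_scale=1.0)
```

In this toy input the oversized Gaussian dominates the scale penalty, while the nearly flat one contributes little to the flattening penalty.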

Empirical Evaluation

The paper supports these claims with experiments on real-world autonomous driving datasets. DeSiRe-GS surpasses prior self-supervised methods and reaches accuracy comparable to methods that rely on external 3D bounding box annotations, while remaining efficient. Quantitative results are reported in PSNR, SSIM, and LPIPS for both static-dynamic decomposition and novel view synthesis.
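Of the three metrics, PSNR has a closed form simple enough to show directly. The sketch below uses the standard definition, 10 * log10(MAX^2 / MSE), on grayscale images stored as nested lists with values in [0, 1]; the helper name and the toy data are illustrative and unrelated to the paper's evaluation code.

```python
import math

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    flat_r = [p for row in rendered for p in row]
    flat_g = [p for row in reference for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_r, flat_g)) / len(flat_r)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Every pixel is off by 0.1, so MSE is about 0.01 and PSNR about 20 dB.
rendered = [[0.5, 0.5]]
reference = [[0.6, 0.4]]
score = psnr(rendered, reference)
```

Higher PSNR and SSIM indicate a closer match to the reference image, whereas for LPIPS lower is better.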

Implications and Future Directions

DeSiRe-GS extends the application horizon of self-supervised learning in dynamic scene reconstruction, potentially reducing the need for annotated data in developing autonomous systems. This has substantial implications for scalable deployment in diverse, real-world environments where manual annotation of all possible elements is impractical.

The paper suggests two directions for further work: more expressive dynamic modeling that draws on finer temporal cues, and applications beyond driving scenes. Both point to uses of Gaussian modeling in other areas of AI that involve interaction with dynamic environments.

Overall, DeSiRe-GS combines a two-stage optimization pipeline with geometric regularization to decompose and reconstruct driving scenes without 3D annotations, making it a useful contribution to scene understanding for autonomous systems.
