
EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis (2503.20168v1)

Published 26 Mar 2025 in cs.CV

Abstract: Novel view synthesis of urban scenes is essential for autonomous driving-related applications. Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner. Unlike existing feed-forward, pixel-aligned 3DGS methods, which often suffer from issues like multi-view inconsistencies and duplicated content, our approach predicts 3D Gaussians across multiple frames within a unified volume using a 3D convolutional network. This is achieved by initializing 3D Gaussians with noisy depth predictions, and then refining their geometric properties in 3D space and predicting color based on 2D textures. Our model also handles distant views and the sky with a flexible hemisphere background model. This enables us to perform fast, feed-forward reconstruction while achieving real-time rendering. Experimental evaluations on the KITTI-360 and Waymo datasets show that our method achieves state-of-the-art quality compared to existing feed-forward 3DGS- and NeRF-based methods.

Summary

  • The paper proposes EVolSplat, a novel volume-based Gaussian Splatting method that shifts from pixel-aligned approaches to predict 3D Gaussians globally using a 3D CNN, addressing multi-view inconsistencies in urban scenes.
  • Experimental results on KITTI-360 and Waymo datasets show EVolSplat achieves superior quality and lower memory consumption compared to existing feed-forward methods for urban novel view synthesis.
  • EVolSplat's efficient real-time rendering capabilities and robust framework offer a practical solution for applications like autonomous driving and provide a foundation for research into dynamic urban environments and sensor integration.

EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis

"EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis" proposes a novel approach for generating high-quality novel view synthesis (NVS) of urban scenes. This method is particularly applicable to autonomous driving applications, addressing specific inadequacies present in existing NeRF and 3DGS approaches. While NeRF-based solutions have shown potential for photorealism, they are often hindered by slow per-scene optimization, making them impractical for real-time applications. EVolSplat introduces a feed-forward model leveraging 3D Gaussian Splatting (3DGS) in a unified framework, targeting efficient reconstruction and real-time rendering without significant memory overhead.

Overview of EVolSplat

The core contribution of this paper lies in its shift from pixel-aligned 3DGS methods to a global, volume-based approach for predicting 3D Gaussians. Previous methods often suffer from multi-view inconsistencies and duplicated content because they predict per-pixel Gaussians within local frustums, leading to ghosting artifacts that are especially noticeable in driving datasets. These artifacts are exacerbated by small parallax angles and texture-less regions. In contrast, EVolSplat predicts 3D Gaussians across multiple frames using a 3D convolutional network initialized from noisy depth predictions.

This global approach allows for the refinement of geometric properties in 3D space and color prediction via 2D textures. It further addresses distant views and sky rendering through a flexible hemisphere background model, contributing to fast feed-forward reconstruction alongside quality rendering.
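As a rough illustration of this initialization stage, the minimal NumPy sketch below back-projects per-frame monocular depth maps into a shared world frame and accumulates them into a global point cloud. The function names, the pinhole intrinsics `K`, the camera-to-world poses, and the `max_depth` cutoff are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift a single H x W depth map into world-space points of shape (N, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))            # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T                            # camera-space directions (z = 1)
    pts_cam = rays * depth.reshape(-1, 1)                      # scale rays by predicted depth
    pts_hom = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_hom @ cam_to_world.T)[:, :3]                   # transform into the world frame

def accumulate_point_cloud(depths, Ks, poses, max_depth=80.0):
    """Fuse noisy per-frame depth predictions into one global point cloud."""
    clouds = []
    for depth, K, pose in zip(depths, Ks, poses):
        pts = backproject_depth(depth, K, pose)
        valid = (depth.reshape(-1) > 0) & (depth.reshape(-1) < max_depth)  # drop sky / far outliers (assumed cutoff)
        clouds.append(pts[valid])
    return np.concatenate(clouds, axis=0)                      # (N, 3) seeds for the 3D Gaussians
```

In the actual pipeline, these noisy points seed the 3D Gaussians and are subsequently refined in 3D space rather than used as-is.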

Experimental evaluations on the KITTI-360 and Waymo datasets showcase EVolSplat's ability to achieve superior quality over existing methods while maintaining lower memory consumption, which is crucial for practical deployment in autonomous vehicles. The method achieves state-of-the-art quality among feed-forward 3DGS- and NeRF-based solutions on novel street scenes.

Detailed Mechanism

  1. Global Point Cloud Initialization: EVolSplat begins with monocular depth estimation from sparse urban imagery. These depth estimates are accumulated to form a global point cloud, which then informs the creation and refinement of 3D Gaussian primitives.
  2. Volume-Based Gaussian Generation: Using a sparse 3D CNN with skip connections, the method constructs a sparse neural feature volume that predicts Gaussian geometry attributes, refining the initial primitive positions in 3D space (a voxelization sketch follows this list).
  3. Occlusion-aware IBR-based Color Prediction: To render high-frequency detail, the method predicts Gaussian colors with image-based rendering (IBR), projecting each primitive into a local window of source images and weighting the samples by visibility maps derived from depth checks (see the visibility sketch after this list).
  4. Generalizable Background Model: Sky and distant views are handled by a hemisphere of background Gaussians with fixed geometric assumptions; only their color parameters are predicted, yielding realistic background synthesis (a hemisphere sketch also follows).
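For step 2, the sketch below shows the kind of voxelization a sparse feature volume would be built on, assuming a simple uniform voxel size; the paper's exact volume construction and network architecture are not reproduced here.

```python
import numpy as np

def voxelize(points, voxel_size=0.5):
    """Quantize the global point cloud into sparse voxel coordinates.

    Returns unique integer voxel indices and per-voxel point counts; the voxel
    size is an assumed value for illustration.
    """
    coords = np.floor(points / voxel_size).astype(np.int32)   # integer voxel grid
    uniq, counts = np.unique(coords, axis=0, return_counts=True)
    return uniq, counts
```

A sparse 3D CNN with skip connections would then operate on features attached to these voxel coordinates to predict the refined Gaussian geometry (position offsets, scales, opacities).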
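For step 3, the occlusion check can be illustrated with a standard re-projection depth test (a hypothetical sketch, not the paper's exact formulation): a refined 3D point is projected into a nearby source view and counted as visible only if its projected depth roughly agrees with that view's depth map.

```python
import numpy as np

def visibility_mask(points_w, K, world_to_cam, src_depth, tol=0.2):
    """Depth-test re-projection: which world-space points are visible in a source view?

    points_w:     (N, 3) refined Gaussian centers in world coordinates
    K:            (3, 3) source-view intrinsics
    world_to_cam: (4, 4) source-view extrinsics
    src_depth:    (H, W) depth map of the source view
    tol:          relative depth tolerance for the check (assumed value)
    """
    H, W = src_depth.shape
    pts_h = np.concatenate([points_w, np.ones((points_w.shape[0], 1))], axis=1)
    pts_cam = (pts_h @ world_to_cam.T)[:, :3]                  # into the source camera frame
    z = pts_cam[:, 2]
    uv = pts_cam @ K.T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)           # perspective divide
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)

    in_image = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    visible = np.zeros(points_w.shape[0], dtype=bool)
    d = src_depth[v[in_image], u[in_image]]                    # stored depth at the projected pixel
    visible[in_image] = np.abs(z[in_image] - d) < tol * d      # visible only if depths agree
    return visible
```

Colors sampled from views where a point fails this test would be down-weighted, suppressing the ghosting that purely pixel-aligned prediction tends to produce.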
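For step 4, the background model can be sketched as a fixed hemisphere of Gaussian positions around the scene; the radius and angular resolution below are assumed values, and only the per-Gaussian colors would be predicted.

```python
import numpy as np

def hemisphere_background(center, radius=500.0, n_az=64, n_el=16):
    """Fixed hemisphere of background Gaussian positions around the scene center.

    The geometry (positions) stays frozen; only per-Gaussian colors are
    predicted, which keeps the sky and distant content cheap to model.
    Radius and resolution are illustrative assumptions.
    """
    az = np.linspace(0.0, 2.0 * np.pi, n_az, endpoint=False)   # azimuth samples
    el = np.linspace(0.0, 0.5 * np.pi, n_el)                   # 0 = horizon, pi/2 = zenith
    az, el = np.meshgrid(az, el)
    x = radius * np.cos(el) * np.cos(az)
    y = radius * np.cos(el) * np.sin(az)
    z = radius * np.sin(el)
    positions = np.stack([x, y, z], axis=-1).reshape(-1, 3) + center
    colors = np.zeros_like(positions)                           # per-Gaussian RGB to be predicted
    return positions, colors
```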

By employing these strategies, EVolSplat demonstrates its ability to perform efficient real-time rendering, offering an effective solution for NVS of urban environments relevant to autonomous driving.

Implications and Future Directions

The development of EVolSplat represents a significant practical contribution towards scalable NVS methods. Its innovative use of geometric priors, single-pass convolutional refinements, and modular background modeling can potentially serve as a robust foundation for future research, particularly in expanding its adaptability to dynamic urban environments and exploring finer integration of semantic data for enhanced scene comprehension.

Moving forward, research could explore automatically adjusting model parameters for varying urban landscapes and optimizing rendering efficiency in dynamic scenes. Extending EVolSplat's modular design to incorporate real-time sensor inputs or LiDAR data could further improve both reconstruction fidelity and computational efficiency in autonomous-system applications.

The intersection of EVolSplat’s methodology with practical deployment scenarios highlights a step towards bridging the gap between photorealistic synthesis and real-time applicability in complex urban settings, marking a notable contribution to the field of AI-driven NVS.