- The paper introduces a novel framework that uses BEV-Point initialization to maintain constant VRAM usage while enabling scalable 3D city generation.
- It employs a spatial-aware Gaussian attribute decoder to enhance scene detail and ensure structural consistency in expansive urban models.
- Experiments on GoogleEarth and KITTI-360 show better visual quality (lower FID and KID) and a roughly 60-fold rendering speedup over the state-of-the-art CityDreamer on GoogleEarth.
GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation
Introduction
The paper "GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation" introduces a framework that addresses the inefficiency and poor scalability of generating expansive 3D cityscapes. NeRF-based methods capture intricate details but are computationally demanding, which limits their applicability to large-scale scenes. The proposed method instead builds on 3D Gaussian splatting (3D-GS), rendering extensive urban environments with significantly lower computational overhead.
Key Contributions
The paper makes two main contributions:
- Compact 3D Scene Representation: BEV-Point, a highly compact intermediate representation, keeps VRAM usage constant as the scene grows instead of letting memory requirements scale with the scene's extent.
- Spatial-aware Gaussian Attribute Decoder: This novel decoder integrates structural and contextual characteristics of BEV points, enhancing the representation quality and consistency of the generated scenes.
Methodology
The GaussianCity framework hinges on two pivotal components:
- BEV-Point Initialization: This compact scene representation keeps VRAM usage constant by considering only the visible BEV points during rendering and optimization. The BEV points are derived from the BEV maps (the height field, semantic map, and binary density map), and ray intersection is used to select the points that are visible from the current camera; all other points are discarded.
- BEV-Point Decoder: This decoder employs a point serializer and a point transformer to generate 3D Gaussian attributes. The point serializer restructures unstructured BEV points into a sequence, while the point transformer processes these serialized features to maintain spatial correlations.
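The two steps above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple fixed-step ray march over the height field as the ray intersection test, and it assumes a Morton (Z-order) key as the serialization order, which the summary does not specify. The names `visible_bev_cells`, `morton_key`, and `serialize_cells` are hypothetical, and the semantic and density maps are left out for brevity.

```python
import numpy as np


def visible_bev_cells(height, cam_pos, ray_dirs, step=0.5, max_dist=200.0):
    """Return the set of (row, col) BEV cells that are first hit by any ray.

    height   : (H, W) array with the surface height of each BEV cell
    cam_pos  : (x, y, z) camera position; x runs along columns, y along rows
    ray_dirs : (N, 3) array of unit ray directions
    Assumption: a fixed-step march stands in for exact ray intersection,
    and the camera starts inside the map.
    """
    cam_pos = np.asarray(cam_pos, dtype=float)
    n_rows, n_cols = height.shape
    hits = set()
    for d in ray_dirs:
        for t in np.arange(step, max_dist, step):
            x, y, z = cam_pos + t * d
            r, c = int(np.floor(y)), int(np.floor(x))
            if not (0 <= r < n_rows and 0 <= c < n_cols):
                break  # the ray left the map without hitting anything
            if z <= height[r, c]:
                hits.add((r, c))  # first surface along this ray
                break
    return hits


def morton_key(r, c, bits=16):
    """Interleave the bits of r and c into a single Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((r >> i) & 1) << (2 * i + 1)
        key |= ((c >> i) & 1) << (2 * i)
    return key


def serialize_cells(cells):
    """Order unstructured cells into a sequence that keeps neighbours close."""
    return sorted(cells, key=lambda rc: morton_key(*rc))


# Toy scene: flat ground, a tall building at (4, 6), a lower one behind it.
height = np.zeros((8, 8))
height[4, 6] = 5.0
height[4, 7] = 3.0
rays = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0]])  # forward and down
cells = visible_bev_cells(height, [0.5, 4.5, 1.0], rays)
print(serialize_cells(cells))
```

In this toy scene the cell at (4, 7) sits behind the taller building and is never returned, so the number of retained cells is bounded by the number of rays rather than by the size of the map. That bound is the intuition behind the constant-memory claim.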
Experimental Results
The efficacy of GaussianCity is validated through extensive experiments on the GoogleEarth and KITTI-360 datasets, showcasing superior performance in terms of visual quality and computational efficiency.
- Quantitative Metrics: On the GoogleEarth dataset, GaussianCity achieves lower FID and KID scores (86.94 and 0.090, respectively; lower is better for both) than the state-of-the-art CityDreamer, which scored 97.38 and 0.096. GaussianCity is also far faster at runtime, rendering at 10.72 FPS versus 0.18 FPS for CityDreamer, a speedup of roughly 60 times. On the KITTI-360 dataset, GaussianCity achieves FID and KID scores of 29.5 and 0.017, respectively, following a similar trend.
- Qualitative Comparisons: Visual inspections reveal that GaussianCity excels in preserving structural details and handling complex textures, outperforming methods like PersistentNature, SceneDreamer, and InfiniCity. The reduction of artifacts and more consistent multi-view generation underscore the method's robustness.
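The GoogleEarth figures quoted above can be checked with a few lines of arithmetic. The helper names `speedup` and `relative_reduction` are illustrative only; the numbers come from the text. The FPS ratio works out to about 59.6, which the summary rounds to 60, and the FID reduction is a little under 11%.

```python
def speedup(new_fps, old_fps):
    """How many times faster the new method renders."""
    return new_fps / old_fps


def relative_reduction(new_score, old_score):
    """Fractional drop in a lower-is-better metric such as FID or KID."""
    return (old_score - new_score) / old_score


# GoogleEarth figures quoted in the text: GaussianCity vs. CityDreamer.
fps_gain = speedup(10.72, 0.18)
fid_drop = relative_reduction(86.94, 97.38)
kid_drop = relative_reduction(0.090, 0.096)

print(f"speedup: {fps_gain:.2f}x")
print(f"FID reduction: {fid_drop:.2%}")
print(f"KID reduction: {kid_drop:.2%}")
```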
Implications and Future Developments
The implications of GaussianCity are significant across multiple domains, including gaming, virtual reality, and urban planning. By reducing memory overhead and enhancing rendering speed, this method makes real-time, large-scale 3D city generation feasible. The introduction of compact representations and efficient decoders lays the groundwork for future research focusing on further optimization and broader applicability.
Future research may explore generating additional Gaussian attributes, such as xyz offsets, opacity, and scale, to fully harness the representational capacity of 3D Gaussian splatting. Additionally, improving the BEV-Point Initialization process to handle more complex structures, beyond the Manhattan assumption, could further enhance the generated scene's realism.
Conclusion
GaussianCity represents a significant advancement in the field of 3D city generation. By leveraging a compact representation and an efficient decoder, it addresses key limitations of traditional methods, enabling the generation of unbounded 3D cityscapes with high realism and efficiency. This work establishes a solid foundation for continued innovation in scalable and efficient 3D scene generation techniques.