Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
153 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Urban Scene Diffusion through Semantic Occupancy Map (2403.11697v2)

Published 18 Mar 2024 in cs.CV

Abstract: Generating unbounded 3D scenes is crucial for large-scale scene understanding and simulation. Urban scenes, unlike natural landscapes, consist of various complex man-made objects and structures such as roads, traffic signs, vehicles, and buildings. To create a realistic and detailed urban scene, it is crucial to accurately represent the geometry and semantics of the underlying objects, going beyond their visual appearance. In this work, we propose UrbanDiffusion, a 3D diffusion model that is conditioned on a Bird's-Eye View (BEV) map and generates an urban scene with geometry and semantics in the form of semantic occupancy map. Our model introduces a novel paradigm that learns the data distribution of scene-level structures within a latent space and further enables the expansion of the synthesized scene into an arbitrary scale. After training on real-world driving datasets, our model can generate a wide range of diverse urban scenes given the BEV maps from the held-out set and also generalize to the synthesized maps from a driving simulator. We further demonstrate its application to scene image synthesis with a pretrained image generator as a prior.

Citations (3)

Summary

We haven't generated a summary for this paper yet.