CityDreamer: Compositional Generative Model of Unbounded 3D Cities (2309.00610v3)
Abstract: 3D city generation is a desirable yet challenging task, since humans are more sensitive to structural distortions in urban environments. Additionally, generating 3D cities is more complex than generating 3D natural scenes: buildings, though objects of a single class, exhibit a far wider range of appearances than the relatively consistent appearance of objects like trees in natural scenes. To address these challenges, we propose CityDreamer, a compositional generative model designed specifically for unbounded 3D cities. Our key insight is that 3D city generation should be a composition of different types of neural fields: 1) various building instances, and 2) background stuff, such as roads and green lands. Specifically, we adopt the bird's-eye-view scene representation and employ a volumetric renderer for both instance-oriented and stuff-oriented neural fields. The generative hash grid and periodic positional embedding are tailored as scene parameterizations to suit the distinct characteristics of building instances and background stuff, respectively. Furthermore, we contribute a suite of CityGen Datasets, including OSM and GoogleEarth, which comprise a vast amount of real-world city imagery to enhance the realism of the generated 3D cities in both their layouts and appearances. CityDreamer achieves state-of-the-art performance not only in generating realistic 3D cities but also in localized editing within the generated cities.
- https://openstreetmap.org.
- https://earth.google.com/studio.
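The abstract pairs each neural field with its own scene parameterization: a periodic positional embedding for background stuff and a generative hash grid for building instances. As an illustration only (the paper's exact formulations may differ), the sketch below shows the two classic encodings these names suggest: a NeRF-style sinusoidal embedding, and a minimal Instant-NGP-style hashed grid lookup; the prime constants and table size are conventional choices, not taken from the paper.

```python
import numpy as np

def periodic_positional_embedding(xyz, num_freqs=10):
    """Map 3D coordinates to sin/cos features at geometrically
    increasing frequencies (NeRF-style periodic encoding)."""
    xyz = np.asarray(xyz, dtype=np.float64)       # (..., 3)
    freqs = 2.0 ** np.arange(num_freqs)           # 1, 2, 4, ...
    angles = xyz[..., None] * freqs               # (..., 3, num_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*xyz.shape[:-1], 3 * 2 * num_freqs)

def hash_grid_lookup(xyz, table, resolution):
    """Look up a learned feature for the nearest grid corner via a
    spatial hash (single-level sketch; real systems use multiple
    resolutions and trilinearly interpolate the 8 cell corners)."""
    primes = np.array([1, 2_654_435_761, 805_459_861], dtype=np.uint64)
    ijk = np.floor(np.asarray(xyz) * resolution).astype(np.int64)
    h = np.bitwise_xor.reduce(ijk.astype(np.uint64) * primes, axis=-1)
    return table[h % np.uint64(len(table))]       # (..., feature_dim)

# Toy usage: embed one point, then fetch its hashed grid feature.
emb = periodic_positional_embedding([0.1, 0.5, -0.2], num_freqs=4)
rng = np.random.default_rng(0)
table = rng.normal(size=(2**14, 2))               # 16K entries, 2-dim features
feat = hash_grid_lookup([0.1, 0.5, -0.2], table, resolution=64)
print(emb.shape, feat.shape)                      # (24,) (2,)
```

In a generative setting, the hash table entries would additionally be conditioned on a per-scene or per-instance latent code, which is what makes the grid "generative" rather than fit to a single scene.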