Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion (2401.10786v2)

Published 19 Jan 2024 in cs.CV

Abstract: Directly generating scenes from satellite imagery offers exciting possibilities for integration into applications such as games and map services. However, challenges arise from the significant view changes and the large scene scale. Previous efforts mainly focused on image or video generation and did not explore how well scene generation adapts to arbitrary views. Existing 3D generation works either operate at the object level or cannot easily make use of the geometry obtained from satellite imagery. To overcome these limitations, we propose a novel architecture for direct 3D scene generation that introduces diffusion models into 3D sparse representations and combines them with neural rendering techniques. Specifically, our approach first uses a 3D diffusion model to generate texture colors at the point level for a given geometry; the result is then transformed into a scene representation in a feed-forward manner. The representation can be used to render arbitrary views that excel in both single-frame quality and inter-frame consistency. Experiments on two city-scale datasets show that our model is proficient at generating photo-realistic street-view image sequences and cross-view urban scenes from satellite imagery.
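
The point-level color diffusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows a generic DDPM-style noising and denoising step applied to per-point RGB colors while the geometry stays fixed. The function names, the linear noise schedule, the step count, and the random point cloud are all assumptions made for illustration, and the true noise stands in for the output of a trained 3D network.

```python
import numpy as np


def make_schedule(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta)
    return betas, alphas_bar


def add_noise(colors, t, alphas_bar, rng):
    """Forward process q(x_t | x_0) applied to per-point colors only."""
    noise = rng.standard_normal(colors.shape)
    noisy = np.sqrt(alphas_bar[t]) * colors + np.sqrt(1.0 - alphas_bar[t]) * noise
    return noisy, noise


def estimate_clean_colors(noisy, t, predicted_noise, alphas_bar):
    """Invert the forward process given a noise prediction."""
    return (noisy - np.sqrt(1.0 - alphas_bar[t]) * predicted_noise) / np.sqrt(alphas_bar[t])


def denoise_step(noisy, t, predicted_noise, betas, alphas_bar, rng):
    """One DDPM ancestral sampling step from x_t to x_{t-1}."""
    alpha_t = 1.0 - betas[t]
    mean = (noisy - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * predicted_noise) / np.sqrt(alpha_t)
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(noisy.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.uniform(-1.0, 1.0, size=(500, 3))  # fixed geometry (hypothetical)
    colors = rng.uniform(-1.0, 1.0, size=(500, 3))  # one RGB color per point
    betas, alphas_bar = make_schedule()
    noisy, noise = add_noise(colors, 50, alphas_bar, rng)
    # The true noise is used here in place of a trained network's prediction.
    recovered = estimate_clean_colors(noisy, 50, noise, alphas_bar)
    print(points.shape, noisy.shape, np.allclose(recovered, colors))
```

In this sketch the point coordinates are never modified; only the colors attached to them are diffused, which mirrors the abstract's description of generating texture colors for a given geometry. A real system would replace the oracle noise with a learned predictor conditioned on the points.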

Authors (5)
  1. Zuoyue Li
  2. Zhenqiang Li
  3. Zhaopeng Cui
  4. Marc Pollefeys
  5. Martin R. Oswald