Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
167 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Towards Realistic Scene Generation with LiDAR Diffusion Models (2404.00815v2)

Published 31 Mar 2024 in cs.CV, cs.AI, and cs.RO

Abstract: Diffusion models (DMs) excel in photo-realistic image synthesis, but their adaptation to LiDAR scene generation poses a substantial hurdle. This is primarily because DMs operating in the point space struggle to preserve the curve-like patterns and 3D geometry of LiDAR scenes, which consumes much of their representation power. In this paper, we propose LiDAR Diffusion Models (LiDMs) to generate LiDAR-realistic scenes from a latent space tailored to capture the realism of LiDAR scenes by incorporating geometric priors into the learning pipeline. Our method targets three major desiderata: pattern realism, geometry realism, and object realism. Specifically, we introduce curve-wise compression to simulate real-world LiDAR patterns, point-wise coordinate supervision to learn scene geometry, and patch-wise encoding for a full 3D object context. With these three core designs, our method achieves competitive performance on unconditional LiDAR generation in 64-beam scenario and state of the art on conditional LiDAR generation, while maintaining high efficiency compared to point-based DMs (up to 107$\times$ faster). Furthermore, by compressing LiDAR scenes into a latent space, we enable the controllability of DMs with various conditions such as semantic maps, camera views, and text prompts.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (73)
  1. Midjourney. https://www.midjourney.com.
  2. Stable diffusion. https://github.com/CompVis/stable-diffusion.
  3. Learning representations and generative models for 3d point clouds. In International conference on machine learning, pages 40–49. PMLR, 2018.
  4. Vista 2.0: An open, data-driven simulator for multimodal sensing and policy learning for autonomous vehicles. In 2022 International Conference on Robotics and Automation (ICRA), pages 2419–2426. IEEE, 2022.
  5. Semantickitti: A dataset for semantic scene understanding of lidar sequences. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9297–9307, 2019.
  6. Super-resolution with deep convolutional sufficient statistics. arXiv preprint arXiv:1511.05666, 2015.
  7. Deep generative modeling of lidar data. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5034–5040. IEEE, 2019.
  8. Photographic image synthesis with cascaded refinement networks. In Proceedings of the IEEE international conference on computer vision, pages 1511–1520, 2017.
  9. SDFusion: Multimodal 3d shape completion, reconstruction, and generation. arXiv, 2022.
  10. 4d spatio-temporal convnets: Minkowski convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3075–3084, 2019.
  11. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021.
  12. Generating images with perceptual similarity metrics based on deep networks. Advances in neural information processing systems, 29, 2016.
  13. Carla: An open urban driving simulator. In Conference on robot learning, pages 1–16. PMLR, 2017.
  14. Hyperdiffusion: Generating implicit neural fields with weight-space diffusion, 2023.
  15. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12873–12883, 2021.
  16. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 605–613, 2017.
  17. Panoptic nuscenes: A large-scale benchmark for lidar panoptic segmentation and tracking. IEEE Robotics and Automation Letters, 7(2):3795–3802, 2022.
  18. Maurice Fréchet. Sur la distance de deux lois de probabilité. In Annales de l’ISUP, pages 183–198, 1957.
  19. Image style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2414–2423, 2016.
  20. 3dgen: Triplane latent diffusion for textured mesh generation. arXiv preprint arXiv:2303.05371, 2023.
  21. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017.
  22. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
  23. Neural wavelet-domain diffusion for 3d shape generation, inversion, and manipulation. arXiv preprint arXiv:2302.00190, 2023.
  24. Pattern-aware data augmentation for lidar 3d object detection. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), pages 2703–2710. IEEE, 2021.
  25. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017.
  26. Ground then navigate: Language-guided navigation in dynamic scenes. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 4113–4120. IEEE, 2023.
  27. Perceptual losses for real-time style transfer and super-resolution. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, pages 694–711. Springer, 2016.
  28. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
  29. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019.
  30. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8110–8119, 2020.
  31. Variational diffusion models. Advances in neural information processing systems, 34:21696–21707, 2021.
  32. Design and use paradigms for gazebo, an open-source multi-robot simulator. In 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS)(IEEE Cat. No. 04CH37566), pages 2149–2154. IEEE, 2004.
  33. On information and sufficiency. The annals of mathematical statistics, 22(1):79–86, 1951.
  34. Vehicle detection from 3d lidar using fully convolutional network. arXiv preprint arXiv:1608.07916, 2016.
  35. Diffusion-sdf: Text-to-shape via voxelized diffusion. arXiv preprint arXiv:2212.03293, 2022.
  36. 3dqd: Generalized deep 3d shape prior via part-discretized diffusion process. arXiv preprint arXiv:2303.10406, 2023.
  37. Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):3292–3310, 2022.
  38. Meshdiffusion: Score-based generative 3d mesh modeling. In International Conference on Learning Representations, 2023.
  39. Diffusion probabilistic models for 3d point cloud generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2837–2845, 2021.
  40. Controllable mesh generation through sparse latent point diffusion models. arXiv preprint arXiv:2303.07938, 2023.
  41. Lidarsim: Realistic lidar simulation by leveraging the real world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11167–11176, 2020.
  42. Pc2: Projection-conditioned point cloud diffusion for single-image 3d reconstruction. In Arxiv, 2023.
  43. Lasernet: An efficient probabilistic 3d object detector for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12677–12686, 2019.
  44. Rangenet++: Fast and accurate lidar semantic segmentation. In 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 4213–4220. IEEE, 2019.
  45. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
  46. Point-e: A system for generating 3d point clouds from complex prompts. arXiv preprint arXiv:2212.08751, 2022.
  47. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
  48. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.
  49. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 1(2):3, 2022.
  50. Joseph Redmon. Darknet: Open source neural networks in c, 2013.
  51. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
  52. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
  53. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022.
  54. 3d neural field generation using triplane diffusion. arXiv preprint arXiv:2211.16677, 2022.
  55. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  56. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning, pages 2256–2265. PMLR, 2015.
  57. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
  58. Curvecloudnet: Processing point clouds with 1d structure. arXiv preprint arXiv:2303.12050, 2023.
  59. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
  60. Searching efficient 3d architectures with sparse point-voxel convolution. In European conference on computer vision, pages 685–702. Springer, 2020.
  61. TorchSparse: Efficient Point Cloud Inference Engine. In Conference on Machine Learning and Systems (MLSys), 2022.
  62. Gecco: Geometrically-conditioned point diffusion models. arXiv preprint arXiv:2303.05916, 2023.
  63. Neural discrete representation learning. Advances in neural information processing systems, 30, 2017.
  64. Learning interactive driving policies via data-driven simulation. In 2022 International Conference on Robotics and Automation (ICRA), pages 7745–7752. IEEE, 2022.
  65. Squeezeseg: Convolutional neural nets with recurrent crf for real-time road-object segmentation from 3d lidar point cloud. In 2018 IEEE international conference on robotics and automation (ICRA), pages 1887–1893. IEEE, 2018.
  66. Videoclip: Contrastive pre-training for zero-shot video-text understanding. arXiv preprint arXiv:2109.14084, 2021.
  67. Pointflow: 3d point cloud generation with continuous normalizing flows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4541–4550, 2019.
  68. Foldingnet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 206–215, 2018.
  69. Lion: Latent point diffusion models for 3d shape generation. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
  70. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
  71. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
  72. 3d shape generation and completion through point-voxel diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5826–5835, 2021.
  73. Learning to generate realistic lidar point clouds. In European Conference on Computer Vision, pages 17–35. Springer, 2022.
Citations (10)

Summary

We haven't generated a summary for this paper yet.