Exploring Generative AI for Sim2Real in Driving Data Synthesis (2404.09111v1)

Published 14 Apr 2024 in cs.CV

Abstract: Datasets are essential for training and testing vehicle perception algorithms. However, the collection and annotation of real-world images is time-consuming and expensive. Driving simulators offer a solution by automatically generating various driving scenarios with corresponding annotations, but the simulation-to-reality (Sim2Real) domain gap remains a challenge. While most generative AI approaches follow the de facto Generative Adversarial Network (GAN)-based methods, the recently emerging diffusion probabilistic models have not been fully explored for mitigating Sim2Real challenges in driving data synthesis. To explore their performance, this paper applies three different generative AI methods that leverage semantic label maps from a driving simulator as a bridge for the creation of realistic datasets. A comparative analysis of these methods is presented from the perspective of image quality and perception. New synthetic datasets, which include driving images and auto-generated high-quality annotations, are produced at low cost and with high scene variability. The experimental results show that although GAN-based methods are adept at generating high-quality images when provided with manually annotated labels, ControlNet produces synthetic datasets with fewer artefacts and more structural fidelity when using simulator-generated labels. This suggests that the diffusion-based approach may offer improved stability and an alternative method for addressing Sim2Real challenges.
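To make the label-map-to-image step concrete, the minimal sketch below shows how a diffusion-based method such as ControlNet can be conditioned on a semantic label map using the Hugging Face diffusers API. The checkpoint names (lllyasviel/sd-controlnet-seg, runwayml/stable-diffusion-v1-5), the prompt, and the file names are illustrative assumptions rather than the configuration used in the paper; in particular, the public checkpoint expects ADE20K-style colour coding rather than a simulator's native palette.

```python
# Hedged sketch: semantic-label-map-conditioned image synthesis with ControlNet
# via the Hugging Face diffusers API. Checkpoints, prompt, and file names are
# illustrative assumptions, not the authors' exact setup.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load a ControlNet conditioned on segmentation maps and attach it to a
# Stable Diffusion backbone.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# A colour-coded semantic label map exported from a driving simulator
# (hypothetical file name), resized to the model's input resolution.
label_map = Image.open("simulator_label_map.png").convert("RGB").resize((512, 512))

# Generate a photorealistic driving image that follows the label map's layout;
# the simulator annotations stay valid because the scene structure is preserved.
result = pipe(
    prompt="a photorealistic urban driving scene, daytime, dashcam view",
    image=label_map,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
result.save("synthetic_driving_image.png")
```

A GAN-based alternative such as pix2pixHD or OASIS would consume the same label map but requires adversarial training on paired data, whereas the ControlNet route reuses a pretrained diffusion backbone, which is consistent with the stability advantage reported in the abstract.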

Authors (5)
  1. Haonan Zhao
  2. Yiting Wang
  3. Thomas Bashford-Rogers
  4. Valentina Donzella
  5. Kurt Debattista