Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks (2403.15370v1)

Published 22 Mar 2024 in cs.CV, cs.LG, and cs.RO

Abstract: Detecting a diverse range of objects under various driving scenarios is essential for the effectiveness of autonomous driving systems. However, the real-world data collected often lacks the necessary diversity, presenting a long-tail distribution. Although synthetic data has been utilized to overcome this issue by generating virtual scenes, it faces hurdles such as a significant domain gap and the substantial effort required from 3D artists to create realistic environments. To overcome these challenges, we present ARSim, a fully automated, comprehensive, modular framework designed to enhance real multi-view image data with 3D synthetic objects of interest. The proposed method integrates domain adaptation and randomization strategies to address covariate shift between real and simulated data by inferring essential domain attributes from real data and employing simulation-based randomization for other attributes. We construct a simplified virtual scene using real data and strategically place 3D synthetic assets within it. Illumination is achieved by estimating light distribution from multiple images capturing the surroundings of the vehicle. Camera parameters from real data are employed to render synthetic assets in each frame. The resulting augmented multi-view consistent dataset is used to train a multi-camera perception network for autonomous vehicles. Experimental results on various AV perception tasks demonstrate the superior performance of networks trained on the augmented dataset.
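
The abstract describes a pipeline: estimate lighting from the surround-view images, place a synthetic asset in a shared world frame, render it into each camera using the real camera parameters, and composite the result to obtain a multi-view-consistent augmented frame set. The Python sketch below illustrates that flow under heavy simplification; the names (Camera, estimate_lighting, render_asset, augment_rig) and the flat-patch "renderer" are illustrative stand-ins and assumptions, not the authors' implementation, which uses a full light-distribution estimate and a physically based renderer.

```python
# Minimal sketch of an ARSim-style augmentation loop (assumed structure, not
# the paper's code). A real pipeline would path-trace the 3D asset under the
# estimated light distribution instead of painting a shaded patch.
import numpy as np
from dataclasses import dataclass

@dataclass
class Camera:
    K: np.ndarray             # 3x3 intrinsics taken from the real capture
    T_world_cam: np.ndarray   # 4x4 world-to-camera extrinsics
    image: np.ndarray         # real frame (H, W, 3), float in [0, 1]

def estimate_lighting(cameras):
    """Proxy for light estimation from the surround-view images:
    here just the mean intensity across all views."""
    return float(np.mean([cam.image.mean() for cam in cameras]))

def render_asset(cam, asset_pos_world, light, size_px=40):
    """Project the asset with the real camera parameters and return an RGBA
    layer; stand-in for rendering the synthetic 3D asset."""
    h, w, _ = cam.image.shape
    p_cam = cam.T_world_cam @ np.append(asset_pos_world, 1.0)
    layer = np.zeros((h, w, 4))
    if p_cam[2] <= 0:                       # behind the camera: nothing drawn
        return layer
    uv = cam.K @ (p_cam[:3] / p_cam[2])     # pinhole projection
    u, v = int(uv[0]), int(uv[1])
    r = size_px // 2
    layer[max(v - r, 0):v + r, max(u - r, 0):u + r, :3] = light  # shaded patch
    layer[max(v - r, 0):v + r, max(u - r, 0):u + r, 3] = 1.0     # alpha
    return layer

def augment_rig(cameras, asset_pos_world):
    """Insert the same world-space asset into every view, so the augmentation
    stays consistent across the multi-camera rig."""
    light = estimate_lighting(cameras)
    out = []
    for cam in cameras:
        layer = render_asset(cam, asset_pos_world, light)
        alpha = layer[..., 3:4]
        out.append(alpha * layer[..., :3] + (1 - alpha) * cam.image)
    return out

# Toy usage: two 480x640 views sharing intrinsics, asset placed 10 m ahead.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
cams = [Camera(K, np.eye(4), np.random.rand(480, 640, 3) * 0.5) for _ in range(2)]
augmented_frames = augment_rig(cams, np.array([0.0, 0.0, 10.0]))
```

The key design point the sketch preserves is that the asset lives in a single world coordinate, and each view renders it through its own real intrinsics and extrinsics, which is what yields the multi-view consistency emphasized in the abstract.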

Authors (5)
  1. Aqeel Anwar (8 papers)
  2. Tae Eun Choe (5 papers)
  3. Zian Wang (27 papers)
  4. Sanja Fidler (184 papers)
  5. Minwoo Park (8 papers)
