Temporal-controlled Frame Swap for Generating High-Fidelity Stereo Driving Data for Autonomy Analysis (2306.01704v3)
Abstract: This paper presents a novel approach, TeFS (Temporal-controlled Frame Swap), to generate synthetic stereo driving data for visual simultaneous localization and mapping (vSLAM) tasks. TeFS is designed to overcome the lack of native stereo vision support in commercial driving simulators, and we demonstrate its effectiveness using Grand Theft Auto V (GTA V), a high-budget open-world video game engine. We introduce GTAV-TeFS, the first large-scale GTA V stereo-driving dataset, containing over 88,000 high-resolution stereo RGB image pairs, along with temporal information, GPS coordinates, camera poses, and full-resolution dense depth maps. GTAV-TeFS offers several advantages over other synthetic stereo datasets and enables the evaluation and enhancement of state-of-the-art stereo vSLAM models in GTA V's environment. We validate the quality of the stereo data collected with TeFS by comparing it against conventional dual-viewport data in an open-source simulator. We also benchmark various vSLAM models on the challenging-case comparison groups included in GTAV-TeFS, revealing the distinct advantages and limitations of each model. The goal of our work is to bring more high-fidelity stereo data from commercial-grade game simulators into the research domain and push the boundary of vSLAM models.