G2D: from GTA to Data

Published 16 Jun 2018 in cs.CV | (1806.07381v1)

Abstract: This document describes G2D, a software that enables capturing videos from Grand Theft Auto V (GTA V), a popular role playing game set in an expansive virtual city. The target users of our software are computer vision researchers who wish to collect hyper-realistic computer-generated imagery of a city from the street level, under controlled 6DOF camera poses and varying environmental conditions (weather, season, time of day, traffic density, etc.). G2D accesses/calls the native functions of the game; hence users can directly interact with G2D while playing the game. Specifically, G2D enables users to manipulate conditions of the virtual environment on the fly, while the gameplay camera is set to automatically retrace a predetermined 6DOF camera pose trajectory within the game coordinate system. Concurrently, automatic screen capture is executed while the virtual environment is being explored. G2D and its source code are publicly available at https://goo.gl/SS7fS6. In addition, we demonstrate an application of G2D to generate a large-scale dataset with groundtruth camera poses for testing structure-from-motion (SfM) algorithms. The dataset and generated 3D point clouds are also made available at https://goo.gl/DNzxHx

Summary

The paper introduces G2D, a software tool leveraging Grand Theft Auto V to generate realistic computer vision datasets with 6DOF ground truth camera poses.
G2D integrates with GTA V using Script Hook V, enabling dynamic control over environmental factors like weather and time to create diverse data scenarios.
These datasets provide precise ground truth, facilitating the development and rigorous testing of algorithms for tasks such as SfM, SLAM, and camera pose estimation.

Overview of G2D: From GTA to Data

The paper introduces G2D, a software tool for computer vision researchers who want to collect large image datasets from the virtual environment of Grand Theft Auto V (GTA V). The tool yields hyper-realistic, computer-generated imagery captured under controlled 6DOF camera poses and varying environmental conditions, which is useful for developing and testing algorithms in domains such as structure-from-motion (SfM), visual SLAM, and camera pose estimation.

Core Contributions

G2D offers several features that enhance its utility for computer vision research:

Integration with GTA V: G2D interfaces directly with the native functions of GTA V, allowing users to manipulate environmental variables dynamically, such as weather, time of day, and traffic density. This capability facilitates the creation of diverse datasets without the substantial resource investment typically required for real-world data collection.
Camera Trajectory Control: Users define a sparse trajectory as an ordered set of vertices, from which G2D automatically generates a dense trajectory. Because the camera retraces the same trajectory on every run, imagery can be captured consistently across many environmental scenarios, producing datasets with 6DOF groundtruth camera poses.
Automated Data Collection: The software captures images at a standard video rate of 60 frames per second while the game continues to run normally. The collected data include the position and rotation of the camera, offering precise groundtruth for subsequent analysis.
Environmental Variability: By leveraging the native functions of GTA V, G2D lets users set weather conditions (clear, rain, snow), time of day (day or night), and the density of vehicular and pedestrian traffic.
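
The sparse-to-dense step described under Camera Trajectory Control can be pictured with a short sketch. The summary does not say how G2D interpolates between vertices, so the code below assumes plain linear interpolation of position and shortest-arc interpolation of Euler angles in degrees. The `Pose` fields, the `densify` function, and the `step` spacing parameter are hypothetical names chosen for illustration and are not part of G2D.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical 6DOF camera pose: position plus Euler angles in degrees."""
    x: float
    y: float
    z: float
    rot_x: float
    rot_y: float
    rot_z: float


def lerp_angle(a, b, t):
    """Interpolate from angle a to angle b (degrees) along the shortest arc."""
    diff = (b - a + 180.0) % 360.0 - 180.0  # signed difference in [-180, 180)
    return a + diff * t


def densify(vertices, step):
    """Expand an ordered list of sparse vertices into a dense trajectory.

    Assumption: consecutive dense poses are at most `step` game units apart
    and are obtained by linear interpolation between neighbouring vertices.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    dense = []
    for a, b in zip(vertices, vertices[1:]):
        dist = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
        n = max(1, math.ceil(dist / step))  # number of sub-intervals
        for k in range(n):
            t = k / n
            dense.append(Pose(
                a.x + (b.x - a.x) * t,
                a.y + (b.y - a.y) * t,
                a.z + (b.z - a.z) * t,
                lerp_angle(a.rot_x, b.rot_x, t),
                lerp_angle(a.rot_y, b.rot_y, t),
                lerp_angle(a.rot_z, b.rot_z, t),
            ))
    if vertices:
        dense.append(vertices[-1])  # close the path on the final vertex
    return dense
```

Replaying the same dense list under different weather or time settings is what would make the captured sequences comparable frame by frame.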

The paper situates G2D among other virtual-world dataset generators, such as CARLA, Europilot, and SYNTHIA. These platforms also use virtual environments to generate data, mainly for autonomous vehicle simulation and computer vision tasks such as semantic segmentation. G2D distinguishes itself by building on GTA V's highly realistic urban environment, so the captured imagery is visually closer to real street-level scenes.

Methodology

G2D is built on Script Hook V, a library that grants access to GTA V's native functions. This design lets G2D read and modify the state of the game environment and the player character directly, so data collection can be automated and precise without disrupting gameplay.
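
As a rough illustration of how per-frame groundtruth read from the game could be stored alongside the captured frames, the sketch below writes one CSV row per frame. The column layout, the `write_pose_log` function, and the tuple ordering of each pose are assumptions made for this example; the actual file format used by G2D is not described in this summary. The 60 fps default is the capture rate quoted above.

```python
import csv
import io

FPS = 60  # capture rate quoted in the summary


def write_pose_log(poses, stream, fps=FPS):
    """Write one row per captured frame: index, timestamp, and 6DOF pose.

    `poses` is an iterable of (x, y, z, rot_x, rot_y, rot_z) tuples in game
    coordinates; this layout is hypothetical, not G2D's own format.
    """
    writer = csv.writer(stream)
    writer.writerow(
        ["frame", "time_s", "x", "y", "z", "rot_x", "rot_y", "rot_z"]
    )
    for i, pose in enumerate(poses):
        # Frame i is assumed to be captured i / fps seconds into the run.
        writer.writerow([i, f"{i / fps:.4f}", *pose])


# Usage: log two poses to an in-memory buffer.
buffer = io.StringIO()
write_pose_log(
    [(1.0, 2.0, 3.0, 0.0, 0.0, 90.0), (1.5, 2.0, 3.0, 0.0, 0.0, 91.0)],
    buffer,
)
```

Keeping the frame index in the log is one simple way to pair each screenshot with its pose when the two are written by separate routines.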

Practical Applications and Implications

The primary application demonstrated in the paper is testing SfM algorithms. Because every frame comes with a known groundtruth camera pose, G2D offers an experimental platform on which algorithms can be evaluated for robustness and accuracy. The poses are expressed in the GTA V coordinate system, so an SfM reconstruction must first be registered to that coordinate system before its estimated poses are benchmarked against the groundtruth.
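
The coordinate registration step can be sketched as follows. An SfM reconstruction is only defined up to a similarity transform, so a common choice is the closed-form least-squares alignment of Umeyama (1991) between estimated and groundtruth camera centres, followed by a position RMSE. This is a generic technique swapped in for illustration and is not claimed to be the procedure used in the paper; the function names are hypothetical, and the example relies on NumPy.

```python
import numpy as np


def align_similarity(est, gt):
    """Least-squares similarity transform mapping est onto gt (Umeyama, 1991).

    est, gt: (N, 3) arrays of corresponding camera centres.
    Returns (s, R, t) such that gt is approximately s * R @ est_i + t.
    """
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    cov = g.T @ e / len(est)              # cross-covariance, target x source
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                    # avoid returning a reflection
    R = U @ S @ Vt
    var_e = (e ** 2).sum() / len(est)     # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_e
    t = mu_g - s * R @ mu_e
    return s, R, t


def position_rmse(est, gt):
    """RMSE of camera positions after similarity alignment of est to gt."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    s, R, t = align_similarity(est, gt)
    aligned = s * est @ R.T + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean()))
```

At least three non-collinear camera centres are needed for the alignment to be well defined; rotation error would be evaluated separately after applying the recovered `R`.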

Future Prospects

Because G2D is open source, it can be extended and adapted. Researchers could enhance the software to simulate more complex scenarios or to expose additional environmental modifications. And while the demonstrated application is SfM, the same captures could serve other computer vision tasks, such as object detection and tracking.

Conclusion

G2D is an approach to synthetic dataset generation that leverages the graphical fidelity and interactive features of a popular commercial game. By reducing the logistical burden of large-scale data collection, it provides a useful resource for computer vision research: algorithms can be developed and tested in virtual scenarios that are controlled yet realistic.
