
A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7 (2504.09021v1)

Published 12 Apr 2025 in cs.LG

Abstract: Deep reinforcement learning has achieved superhuman racing performance in high-fidelity simulators like Gran Turismo 7 (GT7). It typically utilizes global features that require instrumentation external to a car, such as precise localization of agents and opponents, limiting real-world applicability. To address this limitation, we introduce a vision-based autonomous racing agent that relies solely on ego-centric camera views and onboard sensor data, eliminating the need for precise localization during inference. This agent employs an asymmetric actor-critic framework: the actor uses a recurrent neural network with the sensor data local to the car to retain track layouts and opponent positions, while the critic accesses the global features during training. Evaluated in GT7, our agent consistently outperforms GT7's built-in drivers. To our knowledge, this work presents the first vision-based autonomous racing agent to demonstrate champion-level performance in competitive racing scenarios.

Summary

Champion-Level Vision-Based Reinforcement Learning for Racing

This paper presents a vision-based reinforcement learning (RL) agent that achieves champion-level performance in competitive racing, specifically in Gran Turismo 7 (GT7). The research addresses a key limitation of prior deep-RL racing agents: their reliance on global state information obtained from instrumentation external to the car. While effective in simulation, such methods face significant barriers to real-world transfer because precise global features are difficult to acquire in real time.

To circumvent these challenges, the authors developed a novel RL agent that operates exclusively with ego-centric camera views and onboard sensor data, eliminating the need for external localization instruments during inference. The agent employs an asymmetric recurrent actor-critic architecture, which provides the actor network with local sensor inputs and vision data, while the critic network has access to global state features during training only. This framework allows the actor to make driving decisions based solely on information available to a real-world vehicle.
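The core of this asymmetry is that the two networks consume different observations: the actor sees only what an on-board system could measure, while the critic additionally receives privileged global state during training. The sketch below illustrates that split with toy linear "networks" in NumPy; all dimensions, weights, and function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): features the car can sense
# on-board, plus privileged global features available only during training.
LOCAL_DIM, GLOBAL_DIM, ACTION_DIM = 8, 6, 2

# Toy linear layers standing in for the actor and critic networks.
W_actor = rng.normal(size=(ACTION_DIM, LOCAL_DIM))
W_critic = rng.normal(size=(1, LOCAL_DIM + GLOBAL_DIM))

def actor(local_obs):
    """Policy: conditioned only on ego-centric, on-board observations."""
    return np.tanh(W_actor @ local_obs)  # e.g. steering / throttle in [-1, 1]

def critic(local_obs, global_obs):
    """Value estimate: may use privileged global state, training only."""
    return float(W_critic @ np.concatenate([local_obs, global_obs]))

# Training step: both observation views are available to the critic.
local_obs = rng.normal(size=LOCAL_DIM)
global_obs = rng.normal(size=GLOBAL_DIM)
value = critic(local_obs, global_obs)

# Inference step: the deployed agent calls only the actor, so no external
# localization is ever required at race time.
action = actor(local_obs)
assert action.shape == (ACTION_DIM,)
```

The design choice this illustrates: because the critic is discarded after training, its privileged inputs impose no deployment-time sensing requirements, while still sharpening the value targets that shape the actor's policy.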

Key Methodological Innovations

  • Asymmetric Actor-Critic Architecture: This design ensures that the actor network makes use of real-time data from sensors and a front-facing camera while the critic benefits from global state information during the training phase, refining the agent's decision policies based on comprehensive environmental understanding.
  • Recurrent Memory Module: Integrated within the actor network, this module utilizes a Gated Recurrent Unit to maintain temporal continuity, which aids in estimating track layouts and understanding opponents' positions and velocities, thus addressing partial observability challenges inherent in racing scenarios.
  • Regularization Techniques: To enhance generalization and stability, network weights are periodically reinitialized and vision data is augmented through random image shifts, thus minimizing overfitting to specific visual features and facilitating robust decision-making across diverse racing conditions.
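The random image-shift augmentation mentioned above is commonly implemented as pad-and-crop: replicate-pad the frame's borders, then crop a randomly offset window of the original size. The sketch below shows that pattern in NumPy; the pad radius of 4 pixels and the 64×64 frame size are illustrative assumptions, and the paper's exact augmentation parameters may differ.

```python
import numpy as np

def random_shift(img, pad=4, rng=None):
    """Shift an image by up to `pad` pixels in each direction via pad-and-crop.

    Replicate-pads the borders, then crops a randomly offset window of the
    original size, so the output resolution matches the input.
    """
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    pad_widths = ((pad, pad), (pad, pad)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img, pad_widths, mode="edge")
    dy, dx = rng.integers(0, 2 * pad + 1, size=2)  # random crop offset
    return padded[dy:dy + h, dx:dx + w]

# Apply to a synthetic 64x64 RGB frame standing in for a camera observation.
frame = np.arange(64 * 64 * 3, dtype=np.float32).reshape(64, 64, 3)
shifted = random_shift(frame, pad=4, rng=np.random.default_rng(0))
assert shifted.shape == frame.shape  # augmentation preserves resolution
```

Shifting the input while keeping the action target fixed discourages the policy from latching onto exact pixel locations, which is what makes this a regularizer rather than just noise.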

Evaluation and Results

The approach was evaluated in three distinct GT7 scenarios—Tokyo Expressway, Circuit de Spa-Francorchamps, and 24 Heures du Mans race track—each presenting unique challenges in terms of track configuration and vehicle dynamics. Across these scenarios, the developed agent consistently outperformed GT7’s built-in AI (BIAI) and human champions, achieving superior winning margins while maintaining sportsmanship by avoiding excessive collisions during overtaking maneuvers.

The agent's superior performance is attributed to its enhanced ability to interpret complex visual cues from the environment, enabling precise control and strategic interactions with opponents. Notably, the agent’s racing lines and overtaking strategies are often more precise than those demonstrated by human champions, especially on tracks with complex layouts or high-speed segments.

Implications and Future Directions

The paper's findings underscore the potential for vision-based reinforcement learning to deliver high-performance autonomous racing solutions capable of champion-level execution without reliance on extensive external sensory instrumentation. This research not only enhances the feasibility of deploying such agents in real-world racing scenarios but also contributes to the broader field of vision-based autonomous navigation.

Looking ahead, further research could explore the extension of this framework to more varied environmental settings, including different weather conditions, vehicle types, and track configurations. Moreover, integrating this approach into real-world autonomous driving systems, capable of complex interactions within urban environments, represents a promising avenue for future studies. The development of standardized benchmarks and simulation platforms for autonomous racing will further drive progress in this burgeoning research domain.
