F1TENTH Gym: Autonomous Racing Simulator
- F1TENTH Gym is a modular simulation environment designed for autonomous racing research, integrating both non-ROS and ROS-compatible workflows.
- It implements a discrete-time kinematic bicycle model with composite reward formulations and domain randomization to enhance policy robustness.
- The platform features rigorous evaluation protocols and supports policy-gradient RL methods to bridge simulation development with real-world deployment.
F1TENTH Gym is a modular, OpenAI Gym-compatible simulation environment designed for research in autonomous racing, specifically tailored to the F1TENTH autonomous vehicle platform. It enables the study of high-speed perception, planning, and control algorithms in a reproducible, scalable, and simulator-agnostic manner. The environment provides interfaces for both non-ROS and ROS-based workflows, supports rigorous simulation-to-reality (Sim2Real) strategies, exposes a kinematic bicycle model for agent training, and includes evaluation protocols aligned with the requirements of competitive and academic autonomous racing research (Charles et al., 18 Jun 2025, Shi, 2021).
1. Simulation Environment and API
F1TENTH Gym is distributed as “f1tenth_gym” (Python/NumPy backend, no ROS required) and “f1tenth_gym_ros” (ROS-wrapped, containerized). The API follows standard Gym conventions:
- Observation space: A continuous Box vector comprising the LiDAR scan ($N$ beams), ego pose $(x, y, \theta)$, velocity $v$, and optionally the distance/bearing to the next waypoint. Example: $o_t = [r_1, \dots, r_N, x, y, \theta, v]$.
- Action space: Continuous vector $u_t = [a_t, \delta_t]$, where $a_t$ is the longitudinal acceleration (throttle/brake) and $\delta_t$ is the steering command in radians.
- Episode logic: reset returns the initial observation; step takes an action and returns $(o_{t+1}, r_t, \text{done}, \text{info})$. Episodes terminate on collision, leaving the track, reaching the maximum step count, or lap completion.
- ROS integration: Topics /scan, /cmd_vel, and /odom for sensor/command exchange; episode metrics are reported via /gym/info.
These design choices facilitate rapid prototyping and seamless transfer from simulation to the physical F1TENTH platform (Charles et al., 18 Jun 2025, Shi, 2021).
2. Vehicle Dynamics Model
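The simulator is grounded in a discrete-time kinematic bicycle model:

$$
\begin{aligned}
x_{t+1} &= x_t + v_t \cos(\theta_t)\,\Delta t,\\
y_{t+1} &= y_t + v_t \sin(\theta_t)\,\Delta t,\\
\theta_{t+1} &= \theta_t + \frac{v_t}{L}\tan(\delta_t)\,\Delta t,\\
v_{t+1} &= v_t + a_t\,\Delta t,
\end{aligned}
$$

where $(x_t, y_t)$ and $\theta_t$ denote world-frame position and heading, $L$ the wheelbase, $v_t$ the longitudinal speed, $\delta_t$ the steering angle, $a_t$ the longitudinal acceleration, and $\Delta t$ the integration step. An optional first-order lag on steering actuation is supported:

$$
\delta_{t+1} = \delta_t + \frac{\Delta t}{\tau}\left(\delta^{\text{cmd}}_t - \delta_t\right),
$$

with $\delta^{\text{cmd}}_t$ the commanded steering angle and $\tau$ the actuator time constant. This abstraction aligns with widely adopted mathematical and software conventions for both model-based and RL-based autonomous vehicle research (Charles et al., 18 Jun 2025, Shi, 2021).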
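As a minimal sketch of one update of the kinematic bicycle model described above, the following uses a forward-Euler step. The `State` container, the `bicycle_step` name, the default wheelbase of 0.33 m, and the time constant `tau` are illustrative assumptions rather than the simulator's actual code or parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    x: float            # world-frame x position [m]
    y: float            # world-frame y position [m]
    theta: float        # heading [rad]
    v: float            # longitudinal speed [m/s]
    delta: float = 0.0  # actual steering angle [rad], used only by the lag model


def bicycle_step(s, accel, delta_cmd, dt=0.01, wheelbase=0.33, tau=None):
    """Advance the kinematic bicycle model by one forward-Euler step.

    wheelbase and tau are assumed, illustrative values (hypothetical defaults).
    """
    if tau is None:
        # No actuator lag: the commanded angle is applied immediately.
        delta_now = delta_cmd
        delta_next = delta_cmd
    else:
        # First-order lag: the current angle drives this step, and the
        # actual angle then relaxes toward the command.
        delta_now = s.delta
        delta_next = s.delta + (dt / tau) * (delta_cmd - s.delta)
    return State(
        x=s.x + s.v * math.cos(s.theta) * dt,
        y=s.y + s.v * math.sin(s.theta) * dt,
        theta=s.theta + (s.v / wheelbase) * math.tan(delta_now) * dt,
        v=s.v + accel * dt,
        delta=delta_next,
    )
```

Calling `bicycle_step(State(0.0, 0.0, 0.0, 1.0), accel=0.0, delta_cmd=0.0, dt=0.1)` moves the car 0.1 m along the x-axis and leaves heading and speed unchanged.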
3. Reward Formulations
Reward design follows composite schemes standard in autonomous racing:

$$
r_t = w_p\,\Delta s_t - w_e\,|e_t| - w_c\,\mathbb{1}_{\text{collision}} - w_o\,\mathbb{1}_{\text{off-track}},
$$

with $\Delta s_t$ as forward path progress, $e_t$ the cross-track error, $\mathbb{1}_{\text{collision}}$ and $\mathbb{1}_{\text{off-track}}$ as indicator penalties, and $w_p, w_e, w_c, w_o \geq 0$ as weighting coefficients. Variants replace $|e_t|$ with a squared cross-track penalty $e_t^2$, or use exponential collision penalties, for sharper signal shaping. Reward terms are compatible with both RL and optimal control approaches. Notably, a progress-based reward is essential to avoid stationary “drifting” behaviors, while off-track/collision penalties rapidly guide agents towards viable policies (Charles et al., 18 Jun 2025, Shi, 2021).
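The composite reward described above (progress minus a cross-track penalty minus collision and off-track indicator penalties) can be sketched as a single function. The function name, argument names, and default weights are hypothetical and chosen only for illustration; the source does not specify numeric coefficients.

```python
def composite_reward(progress, cross_track_error, collided, off_track,
                     w_progress=1.0, w_error=0.1,
                     w_collision=10.0, w_off_track=5.0,
                     squared_error=False):
    """Weighted sum of a progress reward and three penalties.

    All weights are assumed, illustrative values (not from the source).
    """
    # Either the absolute or the squared cross-track error is penalized.
    if squared_error:
        error_term = cross_track_error ** 2
    else:
        error_term = abs(cross_track_error)
    return (w_progress * progress
            - w_error * error_term
            - w_collision * float(collided)
            - w_off_track * float(off_track))
```

With these defaults, a step that advances 0.5 m with a cross-track error of 0.2 m earns a small positive reward, while the same step ending in a collision is dominated by the collision penalty.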
4. Simulation-to-Reality Strategies
F1TENTH Gym incorporates several Sim2Real mechanisms to enhance policy robustness and transferability:
- Domain Randomization: Vehicle mass, tire friction coefficient $\mu$, steering latency, sensor pose, and visual textures can be perturbed per episode.
- Sensor Noise: Additive Gaussian noise on LiDAR (i.i.d. $\mathcal{N}(0, \sigma^2)$), odometry bias/drift, IMU noise.
- Iterative Reality Adaptation: Datasets of real trajectories enable domain adaptation of perception modules and policy fine-tuning after simulation pre-training.
This integration enables the F1TENTH Gym to serve as a reproducible bridge between theoretical RL/control development and deployment on physical robotic platforms (Charles et al., 18 Jun 2025).
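A minimal sketch of two of the mechanisms listed above, per-episode parameter randomization and additive Gaussian LiDAR noise, is given below. The `EpisodeParams` fields, the uniform scaling scheme, and all numeric values are assumptions made for illustration, not the simulator's configuration.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    mass: float              # vehicle mass [kg]
    friction: float          # tire friction coefficient
    steering_latency: float  # steering delay [s]


def sample_episode_params(rng, nominal, spread=0.1):
    """Scale each nominal value by a factor drawn uniformly from
    [1 - spread, 1 + spread]; intended to be called once per episode."""
    def scale(value):
        return value * rng.uniform(1.0 - spread, 1.0 + spread)
    return EpisodeParams(
        mass=scale(nominal.mass),
        friction=scale(nominal.friction),
        steering_latency=scale(nominal.steering_latency),
    )


def add_lidar_noise(rng, ranges, sigma=0.01):
    """Add i.i.d. zero-mean Gaussian noise to each beam, clipping at zero
    because a range reading cannot be negative."""
    return [max(0.0, r + rng.gauss(0.0, sigma)) for r in ranges]
```

Passing an explicit `random.Random(seed)` instance keeps the perturbations reproducible across runs, which matters when comparing policies under the same randomized conditions.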
5. Benchmarks, Evaluation Protocols, and Metrics
Evaluation adheres to methodological rigor through:
- Metrics: Lap time, completion fraction, collision count, minimum LiDAR range, and average cross-track error.
- Protocols: Run a fixed number of episodes $N_{\text{ep}}$ per policy, report mean $\pm$ standard deviation for all metrics, and compare against canonical controllers (e.g., Pure Pursuit, Stanley, PID).
- Reproducibility: The info dictionary from each episode provides all primary quantitative results.
Empirical evaluations often mirror those adopted in international F1TENTH competitions and comparative RL/control studies, ensuring methodological alignment across research groups (Charles et al., 18 Jun 2025, Shi, 2021).
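The reporting step of the protocol above, aggregating per-episode results into a mean and standard deviation per metric, might look like the following sketch. The metric keys are hypothetical, and the sample (n - 1) standard deviation is an assumed choice; the source does not state which estimator is used.

```python
import statistics


def summarize(episode_infos):
    """Return {metric: (mean, std)} over a list of per-episode info dicts.

    Assumes every dict carries the same numeric keys (hypothetical layout).
    """
    summary = {}
    for key in episode_infos[0]:
        values = [float(info[key]) for info in episode_infos]
        # Sample standard deviation needs at least two episodes.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[key] = (statistics.mean(values), std)
    return summary


if __name__ == "__main__":
    infos = [
        {"lap_time": 10.0, "collisions": 0},
        {"lap_time": 12.0, "collisions": 2},
    ]
    for metric, (mean, std) in summarize(infos).items():
        print(f"{metric}: {mean:.2f} +/- {std:.2f}")
```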
6. Policy-Gradient RL Integration and Transfer
Policy-gradient RL serves as a typical learning paradigm in F1TENTH Gym, with neural policy architectures parameterized as Gaussians: $\pi_\phi(u \mid s) = \mathcal{N}\left(\mu_\phi(s), \sigma_\phi^2(s)\right)$, where $u$ is the action and the input $s$ consists of $(x, y, \theta, v)$ or richer state/observation vectors. Actor–Critic, baseline subtraction, and batch-based or episodic optimization are common. Network outputs are scaled to vehicle actuation ranges; e.g., $\delta \in [-\delta_{\max}, \delta_{\max}]$ for steering (Shi, 2021).
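To make the Gaussian policy concrete, the sketch below samples one scalar action from a normal distribution, records its log-probability (the quantity a policy-gradient update differentiates), and clamps the result to an actuator range. The mean and standard deviation would come from a policy network in practice; here they are plain arguments, and the function names and the steering limit used in the usage note are assumptions.

```python
import math
import random


def gaussian_log_prob(x, mean, std):
    """Log-density of a scalar normal distribution N(mean, std^2) at x."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


def sample_action(rng, mean, std, low, high):
    """Draw a raw action, then clamp it to the actuator range [low, high].

    The log-probability is evaluated on the raw (unclamped) sample, which is
    a common simplification rather than an exact treatment of clamping.
    """
    raw = rng.gauss(mean, std)
    clamped = max(low, min(high, raw))
    return clamped, gaussian_log_prob(raw, mean, std)
```

For steering, a call such as `sample_action(random.Random(0), mean=0.0, std=0.1, low=-0.4, high=0.4)` returns a command guaranteed to lie within an assumed limit of 0.4 rad in either direction.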
Empirical findings indicate that:
- Actor–Critic methods outperform pure Monte Carlo PG by reducing learning variance.
- Direct zero-shot policy transfer from related domains (e.g., via a CartPole-to-F1TENTH state mapping) is ineffective; target-domain fine-tuning is required.
- Policy robustness is enhanced by adding small observation/action noise and enforcing early penalties on off-track infractions.
7. Example Usage and Practical Recommendations
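A canonical usage protocol follows the standard Gym loop: create the environment, call reset to obtain the initial observation, then repeatedly select an action and call step until the episode terminates, reading the episode metrics from the returned info dictionary.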
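The reset/step interaction pattern for the non-ROS workflow can be sketched as follows. To keep the example self-contained, it uses a toy `StraightTrackEnv` class that is purely hypothetical: it mimics the interface described in this article (an action of acceleration and steering, a step result of observation, reward, done flag, and info dictionary) and is not the f1tenth_gym implementation or its constructor signature.

```python
class StraightTrackEnv:
    """Hypothetical stand-in with a Gym-style reset/step interface.

    The car drives along a straight track; the lap completes at the far end.
    """

    def __init__(self, track_length=5.0, dt=0.1, max_steps=500):
        self.track_length = track_length
        self.dt = dt
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.s, self.v, self.steps = 0.0, 0.0, 0
        return [self.s, self.v]

    def step(self, action):
        accel, _steer = action  # steering is ignored on a straight track
        self.v = max(0.0, self.v + accel * self.dt)
        progress = self.v * self.dt
        self.s += progress
        self.steps += 1
        lap_complete = self.s >= self.track_length
        done = lap_complete or self.steps >= self.max_steps
        info = {"lap_complete": lap_complete, "steps": self.steps}
        # Reward is the forward progress made during this step.
        return [self.s, self.v], progress, done, info


def run_episode(env, policy):
    """Roll out one episode and return (total reward, final info dict)."""
    obs = env.reset()
    total, done, info = 0.0, False, {}
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total, info


if __name__ == "__main__":
    # Constant-throttle policy: accelerate at 1 m/s^2, steer straight.
    total, info = run_episode(StraightTrackEnv(), lambda obs: (1.0, 0.0))
    print(total, info)
```

A learned policy would replace the constant-throttle lambda, and a real environment would replace the toy class while leaving `run_episode` unchanged.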
With the ROS extension, episodes are managed via launch files and metrics are retrieved from /gym/info. Practical advice includes clamping action outputs to the actuator limits, using forward progress as a reward-shaping term, initializing critic networks prior to policy updates, and transferring policies between domains (e.g., CartPole to F1TENTH) only with an appropriate mapping and subsequent target-domain adaptation (Charles et al., 18 Jun 2025, Shi, 2021).