F1TENTH Gym: Autonomous Racing Simulator

Updated 24 June 2026

F1TENTH Gym is a modular simulation environment designed for autonomous racing research, integrating both non-ROS and ROS-compatible workflows.
It implements a discrete-time kinematic bicycle model with composite reward formulations and domain randomization to enhance policy robustness.
The platform features rigorous evaluation protocols and supports policy-gradient RL methods to bridge simulation development with real-world deployment.

F1TENTH Gym is a modular, OpenAI Gym-compatible simulation environment designed for research in autonomous racing, specifically tailored to the F1TENTH autonomous vehicle platform. It enables the study of high-speed perception, planning, and control algorithms in a reproducible, scalable, and simulator-agnostic manner. The environment provides interfaces for both non-ROS and ROS-based workflows, supports rigorous simulation-to-reality (Sim2Real) strategies, exposes a kinematic bicycle model for agent training, and includes evaluation protocols aligned with the requirements of competitive and academic autonomous racing research (Charles et al., 18 Jun 2025, Shi, 2021).

1. Simulation Environment and API

F1TENTH Gym is distributed as “f1tenth_gym” (Python/NumPy backend, no ROS required) and “f1tenth_gym_ros” (ROS-wrapped, containerized). The API follows standard Gym conventions:

Observation space: A continuous Box vector comprising the LiDAR scan (N beams), ego pose $[x, y, \theta]$, velocity $v$, and optionally distance/bearing to the next waypoint. Without the optional waypoint terms, $\text{obs} \in \mathbb{R}^{N+4}$.
Action space: Continuous vector $[a_v, \delta]$, where $a_v$ is longitudinal acceleration (throttle/brake) and $\delta$ is the steering command in radians.
Episode logic: Reset provides the initial observation; step takes an action and returns $(\text{obs}_{t+1}, r_t, \text{done}, \text{info})$. Episodes terminate on collision, off-track, reaching max steps, or lap completion.
ROS integration: Topics /scan, /cmd_vel, /odom for sensor/command exchange; episode metrics reported via /gym/info.

These design choices facilitate rapid prototyping and seamless transfer from simulation to the physical F1TENTH platform (Charles et al., 18 Jun 2025, Shi, 2021).
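As a minimal sketch of the observation layout and termination rules listed above, the following Python fragment assembles an $N+4$ observation and checks the four termination conditions. The names `build_observation`, `EpisodeStatus`, and `is_done`, and the default `max_steps`, are hypothetical and chosen only for illustration; they are not part of the f1tenth_gym API.

```python
from dataclasses import dataclass


@dataclass
class EpisodeStatus:
    """Hypothetical container for the termination flags described above."""
    collided: bool = False
    off_track: bool = False
    lap_done: bool = False
    steps: int = 0


def build_observation(scan, pose, speed):
    """Concatenate N LiDAR ranges, pose [x, y, theta], and speed into N + 4 floats."""
    x, y, theta = pose
    return [float(r) for r in scan] + [float(x), float(y), float(theta), float(speed)]


def is_done(status, max_steps=1000):
    """Episode ends on collision, off-track, lap completion, or the step limit."""
    return (status.collided or status.off_track
            or status.lap_done or status.steps >= max_steps)


obs = build_observation([1.0] * 8, (0.0, 0.0, 0.1), 2.0)
print(len(obs))  # 8 beams + 4 state entries
```

Keeping the observation a flat vector matches the Box description above; the optional waypoint distance/bearing would simply be appended as further entries.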

2. Vehicle Dynamics Model

The simulator is grounded in a kinematic bicycle model, written here in continuous time and integrated in discrete time steps: $\dot x = v \cos(\theta), \quad \dot y = v \sin(\theta), \quad \dot \theta = \frac{v}{L}\tan(\delta), \quad \dot v = a_v$, where $(x, y)$ denote world-frame position, $\theta$ heading, $L$ wheelbase, $v$ longitudinal speed, $\delta$ steering angle, and $a_v$ longitudinal acceleration. An optional first-order lag on steering actuation is supported; in its standard form this reads $\dot \delta = (\delta_{\text{cmd}} - \delta)/\tau_\delta$, with $\delta_{\text{cmd}}$ the commanded steering angle and $\tau_\delta$ the actuator time constant. This abstraction aligns with widely adopted mathematical and software conventions for both model-based and RL-based autonomous vehicle research (Charles et al., 18 Jun 2025, Shi, 2021).
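One way to discretize the kinematic bicycle equations is a forward-Euler step, sketched below. The integrator choice, the function name `bicycle_step`, the step size, the wheelbase value of 0.33 m, and the lag time constant `tau` are all assumptions made for illustration; the source does not specify how the simulator integrates the model.

```python
import math


def bicycle_step(state, action, dt=0.01, wheelbase=0.33, tau=None):
    """One forward-Euler step of the kinematic bicycle model.

    state  = (x, y, theta, v, delta)
    action = (a_v, delta_cmd)
    tau    = steering time constant; None means the command is applied instantly.
    """
    x, y, theta, v, delta = state
    a_v, delta_cmd = action

    if tau is None:
        delta = delta_cmd            # no actuator lag
        delta_next = delta_cmd
    else:
        # First-order lag: d(delta)/dt = (delta_cmd - delta) / tau
        delta_next = delta + dt * (delta_cmd - delta) / tau

    x_next = x + dt * v * math.cos(theta)
    y_next = y + dt * v * math.sin(theta)
    theta_next = theta + dt * (v / wheelbase) * math.tan(delta)
    v_next = v + dt * a_v
    return (x_next, y_next, theta_next, v_next, delta_next)


# Driving straight at 1 m/s while accelerating at 2 m/s^2 for one 0.1 s step.
print(bicycle_step((0.0, 0.0, 0.0, 1.0, 0.0), (2.0, 0.0), dt=0.1))
```

All derivatives are evaluated at the current state before any variable is updated, which is what makes this an explicit Euler scheme; a higher-order integrator could be substituted without changing the interface.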

3. Reward Formulations

Reward design follows composite schemes standard in autonomous racing: $r_t = \alpha\,\Delta s_t - \beta\,|e_t| - \gamma\,\mathbb{1}_{\text{collision}} - \eta\,\mathbb{1}_{\text{off-track}}$, with $\Delta s_t$ as forward path progress, $e_t$ cross-track error, $\mathbb{1}_{\text{collision}}$ and $\mathbb{1}_{\text{off-track}}$ as indicator penalties, and $\alpha, \beta, \gamma, \eta$ coefficients for weighting. Variants employ squared cross-track penalties or exponential collision penalties for sharper signal shaping, for example replacing $\beta\,|e_t|$ with $\beta\,e_t^2$. Reward terms are compatible with both RL and optimal control approaches. Notably, a progress-based reward is essential to avoid stationary “drifting” behaviors, while off-track/collision penalties rapidly guide agents towards viable policies (Charles et al., 18 Jun 2025, Shi, 2021).
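A composite reward of this kind (progress term minus a cross-track penalty minus indicator penalties for collision and leaving the track) can be sketched as follows. The function name `composite_reward`, the argument names, and the default weights are illustrative assumptions, not values taken from the cited works.

```python
def composite_reward(progress, cross_track_error, collided, off_track,
                     w_progress=1.0, w_lateral=0.1,
                     w_collision=10.0, w_off_track=5.0, squared=False):
    """Weighted progress reward minus lateral, collision, and off-track penalties.

    All weights are hypothetical defaults; squared=True switches the lateral
    penalty from |e| to e^2, one of the variants mentioned above.
    """
    lateral = cross_track_error ** 2 if squared else abs(cross_track_error)
    return (w_progress * progress
            - w_lateral * lateral
            - w_collision * float(collided)
            - w_off_track * float(off_track))


# 0.5 m of progress, 0.2 m left of the reference line, no infractions.
print(composite_reward(0.5, -0.2, collided=False, off_track=False))
```

With zero progress and no infractions the reward is never positive, which illustrates why the progress term is what discourages a stationary policy.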

4. Simulation-to-Reality Strategies

F1TENTH Gym incorporates several Sim2Real mechanisms to enhance policy robustness and transferability:

Domain Randomization: Vehicle mass, tire friction coefficient $\mu$, steering latency, sensor pose, and visual textures can be perturbed per episode.
Sensor Noise: Additive Gaussian noise on LiDAR (i.i.d. $\mathcal{N}(0, \sigma^2)$), odometry bias/drift, IMU noise.
Iterative Reality Adaptation: Datasets of real trajectories enable domain adaptation of perception modules and policy fine-tuning after simulation pre-training.

This integration enables the F1TENTH Gym to serve as a reproducible bridge between theoretical RL/control development and deployment on physical robotic platforms (Charles et al., 18 Jun 2025).
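Per-episode domain randomization and additive LiDAR noise can be sketched in a few lines. The helper names, the uniform scaling scheme, the spread and noise magnitudes, and the clipping of ranges at zero are assumptions made for illustration; the source does not state which distributions or ranges are used.

```python
import random


def sample_episode_params(rng, nominal, spread=0.1):
    """Scale each nominal parameter by an independent factor in [1 - spread, 1 + spread]."""
    return {name: value * rng.uniform(1.0 - spread, 1.0 + spread)
            for name, value in nominal.items()}


def add_lidar_noise(rng, scan, sigma=0.01):
    """Add i.i.d. zero-mean Gaussian noise to each beam, clipping ranges at zero."""
    return [max(0.0, r + rng.gauss(0.0, sigma)) for r in scan]


rng = random.Random(0)  # seeded so a run can be reproduced
nominal = {"mass": 3.5, "friction": 1.0, "steering_latency": 0.02}  # hypothetical values
print(sample_episode_params(rng, nominal))
print(add_lidar_noise(rng, [1.0, 2.0, 3.0]))
```

Drawing the parameters once at reset and the sensor noise at every step mirrors the per-episode versus per-measurement distinction made in the list above.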

5. Benchmarks, Evaluation Protocols, and Metrics

Evaluation follows these conventions:

Metrics: Lap time, completion fraction, collision count, minimum LiDAR range, and average cross-track error.
Protocols: Run repeated episodes per policy, report mean $\pm$ standard deviation for all metrics, and compare against canonical controllers (e.g., Pure Pursuit, Stanley, PID).
Reproducibility: The info dictionary from each episode provides all primary quantitative results.

Empirical evaluations often mirror those adopted in international F1TENTH competitions and comparative RL/control studies, ensuring methodological alignment across research groups (Charles et al., 18 Jun 2025, Shi, 2021).
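The mean and standard deviation reporting described in the protocol can be computed directly from per-episode info dictionaries. The sketch below assumes each episode yields a dictionary with the same numeric keys; the key names and the function `summarize` are hypothetical.

```python
import statistics


def summarize(episodes):
    """Map each metric name to (mean, sample standard deviation) across episodes."""
    summary = {}
    for key in episodes[0]:
        values = [ep[key] for ep in episodes]
        # Sample standard deviation needs at least two episodes.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[key] = (statistics.mean(values), std)
    return summary


episodes = [
    {"lap_time": 10.0, "collisions": 0},
    {"lap_time": 12.0, "collisions": 2},
]
for name, (mean, std) in summarize(episodes).items():
    print(f"{name}: {mean:.2f} +/- {std:.2f}")
```

The same routine can be applied to a baseline controller's episodes so that learned and classical policies are compared on identical metrics.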

6. Policy-Gradient RL Integration and Transfer

Policy-gradient RL serves as a typical learning paradigm in F1TENTH Gym, with neural policy architectures parameterized as Gaussians: $\pi_\phi(a \mid s) = \mathcal{N}\big(\mu_\phi(s), \sigma_\phi^2(s)\big)$, where the input $s$ consists of the ego state (e.g., $[x, y, \theta, v]$) or richer state/observation vectors. Actor–Critic, baseline subtraction, and batch-based or episodic optimization are common. Network outputs are scaled to vehicle actuation ranges; e.g., $\delta \in [-\delta_{\max}, \delta_{\max}]$ for steering (Shi, 2021).
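The two ingredients a Gaussian policy needs for a policy-gradient update, sampling an action and evaluating its log-probability, are sketched below together with one possible way of scaling a raw network output to an actuator range. The tanh squashing and the 0.4 rad steering limit are assumptions for illustration; the source only states that outputs are scaled to actuation ranges.

```python
import math
import random


def sample_action(rng, mean, std):
    """Draw each action component from an independent Gaussian."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian, summed over action components."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for a, m, s in zip(action, mean, std)
    )


def scale_to_actuator(raw, limit):
    """Squash an unbounded output into [-limit, limit] (tanh is an assumed choice)."""
    return limit * math.tanh(raw)


rng = random.Random(0)
raw = sample_action(rng, mean=[0.0, 0.0], std=[1.0, 0.5])   # [a_v, delta] before scaling
steering = scale_to_actuator(raw[1], limit=0.4)
print(gaussian_log_prob(raw, [0.0, 0.0], [1.0, 0.5]), steering)
```

In a REINFORCE-style update the gradient of `gaussian_log_prob` with respect to the policy parameters is weighted by the return or advantage; an automatic-differentiation library would normally supply that gradient.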

Empirical findings indicate that:

Actor–Critic methods outperform pure Monte Carlo PG by reducing learning variance.
Direct zero-shot policy transfer from related domains (e.g., via a CartPole-to-F1TENTH state/action mapping) is ineffective; target-domain fine-tuning is required.
Policy robustness is enhanced by adding small observation/action noise and enforcing early penalties on off-track infractions.
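The variance argument behind the first finding can be made concrete by comparing the two learning signals. A Monte Carlo policy gradient weights each log-probability by the full discounted return, whereas a one-step Actor–Critic uses a temporal-difference advantage built from a learned value estimate. The sketch below is a generic illustration with hypothetical function names and made-up numbers, not the procedure of the cited work.

```python
def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed backwards in time."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def td_advantages(rewards, values, gamma=0.99):
    """One-step advantage r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds len(rewards) + 1 critic estimates; the last entry is the
    bootstrap value of the final state (0.0 if the episode terminated).
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


rewards = [1.0, 1.0, 1.0]
print(discounted_returns(rewards, gamma=0.5))
print(td_advantages(rewards, values=[1.5, 1.4, 1.0, 0.0], gamma=0.5))
```

The returns depend on every later reward in the episode, while each advantage depends on a single reward and two value estimates, which is the source of the lower variance.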

7. Example Usage and Practical Recommendations

A canonical usage protocol follows the standard Gym loop: reset the environment to obtain the initial observation, repeatedly call step with an action until done is returned, and read the episode metrics from the info dictionary. With the ROS extension, episodes are managed via launch files and metrics are retrieved from /gym/info. Practical advice includes clamping action outputs, using forward progress as a reward shaping term, initializing critic networks prior to policy updates, and carefully transferring policies between domains (e.g., CartPole to F1TENTH) with appropriate mapping and subsequent adaptation (Charles et al., 18 Jun 2025, Shi, 2021).
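The reset/step loop and the advice to clamp actions can be sketched against a stand-in environment. `StubRacingEnv` below is a hypothetical toy with the same reset/step contract as described in Section 1; it is not the f1tenth_gym class, and its dynamics, limits, and info keys are invented for illustration.

```python
class StubRacingEnv:
    """Hypothetical stand-in exposing reset() and step(); NOT the real simulator."""

    def __init__(self, max_steps=50, n_beams=8):
        self.max_steps = max_steps
        self.n_beams = n_beams

    def _obs(self):
        # N dummy LiDAR ranges followed by [x, y, theta, v].
        return [10.0] * self.n_beams + [self.progress, 0.0, 0.0, self.speed]

    def reset(self):
        self.t, self.speed, self.progress = 0, 0.0, 0.0
        return self._obs()

    def step(self, action):
        a_v, _delta = action
        self.t += 1
        self.speed = max(0.0, self.speed + 0.1 * a_v)
        step_progress = 0.1 * self.speed
        self.progress += step_progress
        done = self.t >= self.max_steps
        info = {"steps": self.t, "progress": self.progress}
        return self._obs(), step_progress, done, info


def clamp(value, low, high):
    return max(low, min(high, value))


def run_episode(env, policy, a_max=1.0, delta_max=0.4):
    """Roll out one episode, clamping every action to assumed actuator limits."""
    obs, done, total, info = env.reset(), False, 0.0, {}
    while not done:
        a_v, delta = policy(obs)
        action = (clamp(a_v, -a_max, a_max), clamp(delta, -delta_max, delta_max))
        obs, reward, done, info = env.step(action)
        total += reward
    return total, info


total, info = run_episode(StubRacingEnv(max_steps=10), lambda obs: (5.0, 0.0))
print(total, info)
```

Clamping inside the rollout rather than inside the policy keeps the actuator limits in one place, so the same policy can be evaluated under different limits.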

References

Charles et al. (2025). Advancing Autonomous Racing: A Comprehensive Survey of the RoboRacer (F1TENTH) Platform.

Shi (2021). Gradient Policy on "CartPole" game and its' expansibility to F1Tenth Autonomous Vehicles.
