
F1TENTH Gym: Autonomous Racing Simulator

Updated 24 June 2026
  • F1TENTH Gym is a modular simulation environment designed for autonomous racing research, integrating both non-ROS and ROS-compatible workflows.
  • It implements a discrete-time kinematic bicycle model with composite reward formulations and domain randomization to enhance policy robustness.
  • The platform features rigorous evaluation protocols and supports policy-gradient RL methods to bridge simulation development with real-world deployment.

F1TENTH Gym is a modular, OpenAI Gym-compatible simulation environment designed for research in autonomous racing, specifically tailored to the F1TENTH autonomous vehicle platform. It enables the study of high-speed perception, planning, and control algorithms in a reproducible, scalable, and simulator-agnostic manner. The environment provides interfaces for both non-ROS and ROS-based workflows, supports rigorous simulation-to-reality (Sim2Real) strategies, exposes a kinematic bicycle model for agent training, and includes evaluation protocols aligned with the requirements of competitive and academic autonomous racing research (Charles et al., 18 Jun 2025, Shi, 2021).

1. Simulation Environment and API

F1TENTH Gym is distributed as “f1tenth_gym” (Python/NumPy backend, no ROS required) and “f1tenth_gym_ros” (ROS-wrapped, containerized). The API follows standard Gym conventions:

  • Observation space: A continuous Box vector comprising the LiDAR scan ($N$ beams), the ego pose $[x, y, \theta]$, the velocity $v$, and optionally the distance and bearing to the next waypoint. Without the waypoint features, $\text{obs} \in \mathbb{R}^{N+4}$.
  • Action space: Continuous vector $[a_v, \delta]$, where $a_v$ is the longitudinal acceleration (throttle/brake) and $\delta$ is the steering command in radians.
  • Episode logic: Reset returns the initial observation; step takes an action and returns $(\text{obs}_{t+1}, r_t, \text{done}, \text{info})$. Episodes terminate on collision, leaving the track, reaching the maximum step count, or completing a lap.
  • ROS integration: Topics /scan, /cmd_vel, /odom for sensor/command exchange; episode metrics reported via /gym/info.

These design choices facilitate rapid prototyping and seamless transfer from simulation to the physical F1TENTH platform (Charles et al., 18 Jun 2025, Shi, 2021).
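The observation layout and termination rule above can be illustrated with a minimal sketch. The helper names `build_observation` and `episode_done` are hypothetical and are not part of f1tenth_gym. The sketch only shows how an $N$-beam scan, the pose, and the speed stack into an $(N+4)$-vector, growing to $N+6$ when the optional waypoint features are included.

```python
import numpy as np

def build_observation(scan, pose, v, waypoint=None):
    """Pack sensor and state data into one flat observation vector (hypothetical helper).

    scan: (N,) LiDAR ranges; pose: (x, y, theta); v: longitudinal speed.
    waypoint: optional (distance, bearing) to the next waypoint.
    """
    parts = [np.asarray(scan, dtype=np.float64),
             np.asarray(pose, dtype=np.float64),
             np.array([v], dtype=np.float64)]
    if waypoint is not None:
        parts.append(np.asarray(waypoint, dtype=np.float64))
    # Result has N + 4 entries, or N + 6 with the waypoint features.
    return np.concatenate(parts)

def episode_done(collided, off_track, step, max_steps, lap_complete):
    """Any one of the four termination conditions ends the episode (hypothetical helper)."""
    return bool(collided or off_track or step >= max_steps or lap_complete)
```

With an 8-beam scan, `build_observation(np.ones(8), (1.0, 2.0, 0.5), 3.0)` has shape `(12,)`. Entries 8 to 10 hold the pose, and entry 11 holds the speed.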

2. Vehicle Dynamics Model

The simulator is grounded in a discrete-time kinematic bicycle model:

$$\dot x = v \cos\theta, \quad \dot y = v \sin\theta, \quad \dot\theta = \frac{v}{L}\tan\delta, \quad \dot v = a_v,$$

where $(x, y)$ denotes the world-frame position, $\theta$ the heading, $L$ the wheelbase, $v$ the longitudinal speed, $\delta$ the steering angle, and $a_v$ the longitudinal acceleration. These equations are integrated with a time step $\Delta t$. An optional first-order lag on steering actuation is supported. Under this lag, the realized steering angle $\delta$ tracks the commanded angle $\delta_{\text{cmd}}$ with time constant $\tau_\delta$:

$$\dot\delta = \frac{\delta_{\text{cmd}} - \delta}{\tau_\delta}.$$

This abstraction aligns with widely adopted mathematical and software conventions for both model-based and RL-based autonomous vehicle research (Charles et al., 18 Jun 2025, Shi, 2021).
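A minimal explicit-Euler sketch of one simulation step of the model above follows. It assumes the state $(x, y, \theta, v, \delta)$, the action $(a_v, \delta_{\text{cmd}})$, and the optional lag time constant `steer_tau`. The function name and the integration scheme are illustrative assumptions and do not reproduce the simulator's internal integrator.

```python
import math

def bicycle_step(state, action, wheelbase, dt, steer_tau=None):
    """Advance (x, y, theta, v, delta) by one explicit-Euler step (illustrative sketch).

    action = (a_v, delta_cmd). If steer_tau is None, the steering command is
    applied instantly; otherwise delta follows the first-order lag
    d(delta)/dt = (delta_cmd - delta) / steer_tau.
    """
    x, y, theta, v, delta = state
    a_v, delta_cmd = action
    # Steering is updated first, and the updated angle drives the heading rate.
    if steer_tau is None:
        delta = delta_cmd
    else:
        delta = delta + dt * (delta_cmd - delta) / steer_tau
    x_new = x + dt * v * math.cos(theta)
    y_new = y + dt * v * math.sin(theta)
    theta_new = theta + dt * v / wheelbase * math.tan(delta)
    v_new = v + dt * a_v
    return (x_new, y_new, theta_new, v_new, delta)
```

For example, starting at rest heading along $+x$ with $v = 1$, $\delta = 0$, and $a_v = 0.5$, one step with $\Delta t = 0.1$ moves the car to $x = 0.1$ and raises the speed to $1.05$. The wheelbase of 0.33 m in the tests is only an illustrative value.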

3. Reward Formulations

Reward design follows composite schemes that are standard in autonomous racing:

$$r_t = w_p\,\Delta s_t - w_e\,|e_t| - w_c\,\mathbb{1}_{\text{collision}} - w_o\,\mathbb{1}_{\text{off-track}},$$

where $\Delta s_t$ is the forward progress along the path, $e_t$ is the cross-track error, $\mathbb{1}_{\text{collision}}$ and $\mathbb{1}_{\text{off-track}}$ are indicator penalties, and $w_p, w_e, w_c, w_o \ge 0$ are weighting coefficients. Variants replace $|e_t|$ with the squared error $e_t^2$, or use exponential collision penalties, to sharpen the reward signal. The reward terms are compatible with both RL and optimal control approaches. The progress term is essential. Without it, a stationary or "drifting" agent incurs no penalties and so scores as well as any moving agent. The off-track and collision penalties, in turn, quickly steer agents toward viable policies (Charles et al., 18 Jun 2025, Shi, 2021).
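The composite reward can be written as a plain function. This is a sketch: the default weights are hypothetical placeholders rather than values from the cited work, and the `squared` flag switches to the squared cross-track variant.

```python
def composite_reward(progress, cross_track, collided, off_track,
                     w_p=1.0, w_e=0.1, w_c=10.0, w_o=5.0, squared=False):
    """r_t = w_p*ds - w_e*|e| (or e^2) - w_c*1[collision] - w_o*1[off-track].

    All default weights are hypothetical and would need tuning per track.
    """
    tracking = cross_track ** 2 if squared else abs(cross_track)
    return (w_p * progress
            - w_e * tracking
            - w_c * float(collided)
            - w_o * float(off_track))
```

A stationary agent (`progress=0`, `cross_track=0`, no infractions) earns exactly `0.0`. Any forward motion with small tracking error earns more, which is the incentive the progress term provides.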

4. Simulation-to-Reality Strategies

F1TENTH Gym incorporates several Sim2Real mechanisms to enhance policy robustness and transferability:

  • Domain Randomization: Vehicle mass, tire friction coefficient $\mu$, steering latency, sensor pose, and visual textures can be perturbed per episode.
  • Sensor Noise: Additive Gaussian noise on LiDAR ranges (i.i.d. $\mathcal{N}(0, \sigma^2)$ per beam), odometry bias/drift, and IMU noise.
  • Iterative Reality Adaptation: Datasets of real trajectories enable domain adaptation of perception modules and policy fine-tuning after simulation pre-training.

This integration enables the F1TENTH Gym to serve as a reproducible bridge between theoretical RL/control development and deployment on physical robotic platforms (Charles et al., 18 Jun 2025).
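Per-episode domain randomization and i.i.d. LiDAR noise can be sketched with NumPy's `Generator` API. The parameter ranges, the `sigma` and `max_range` defaults, and the function names are hypothetical illustrations, not values or calls taken from F1TENTH Gym.

```python
import numpy as np

# Hypothetical randomization ranges; real bounds are platform-specific.
PARAM_RANGES = {
    "mass": (3.0, 4.5),          # kg
    "mu": (0.6, 1.1),            # tire friction coefficient
    "steer_delay": (0.0, 0.05),  # s
}

def sample_episode_params(rng, ranges=PARAM_RANGES):
    """Draw one set of physical parameters uniformly, once per episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

def noisy_scan(scan, rng, sigma=0.01, max_range=30.0):
    """Add i.i.d. N(0, sigma^2) noise to every beam, then clip to the sensor range."""
    scan = np.asarray(scan, dtype=np.float64)
    noisy = scan + rng.normal(0.0, sigma, size=scan.shape)
    return np.clip(noisy, 0.0, max_range)
```

Seeding a single `np.random.default_rng(seed)` per run keeps the randomized episodes reproducible, which matters for the evaluation protocols below.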

5. Benchmarks, Evaluation Protocols, and Metrics

Evaluation follows a rigorous protocol built on:

  • Metrics: Lap time, completion fraction, collision count, minimum LiDAR range, and average cross-track error.
  • Protocols: Run a fixed number of episodes per policy, report the mean $\pm$ standard deviation for every metric, and compare against canonical controllers (e.g., Pure Pursuit, Stanley, PID).
  • Reproducibility: The info dictionary from each episode provides all primary quantitative results.

Empirical evaluations often mirror those adopted in international F1TENTH competitions and comparative RL/control studies, ensuring methodological alignment across research groups (Charles et al., 18 Jun 2025, Shi, 2021).
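Turning per-episode info dictionaries into "mean ± standard deviation" entries can be sketched with the standard `statistics` module. The info keys used here (`lap_time`, `completion`, `collisions`, `min_range`, `mean_cte`) are hypothetical names for the five metrics listed above; the real keys depend on the environment version.

```python
import statistics

# Hypothetical info-dict keys for the five metrics.
METRIC_KEYS = ("lap_time", "completion", "collisions", "min_range", "mean_cte")

def summarize(episode_infos, keys=METRIC_KEYS):
    """Return {metric: (mean, sample std)} over a list of per-episode info dicts."""
    summary = {}
    for k in keys:
        values = [float(info[k]) for info in episode_infos]
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[k] = (statistics.fmean(values), std)
    return summary

def format_summary(summary):
    """Render each metric as 'mean ± std' with two decimals."""
    return {k: f"{m:.2f} ± {s:.2f}" for k, (m, s) in summary.items()}
```

With lap times of 10.0 s and 12.0 s over two episodes, the lap-time entry reads `"11.00 ± 1.41"`. This uses the sample standard deviation, which is the usual choice when reporting over a finite set of evaluation runs.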

6. Policy-Gradient RL Integration and Transfer

Policy-gradient RL is a typical learning paradigm in F1TENTH Gym, with neural policies parameterized as Gaussians:

$$\pi_\phi(a_t \mid s_t) = \mathcal{N}\big(\mu_\phi(s_t), \operatorname{diag}(\sigma_\phi^2(s_t))\big),$$

where the input $s_t$ is either a compact state vector or a richer observation vector. Actor–Critic methods, baseline subtraction, and batch-based or episodic optimization are common. Network outputs are scaled to the vehicle's actuation ranges, e.g., by mapping the steering output into $[-\delta_{\max}, \delta_{\max}]$ (Shi, 2021).
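A minimal diagonal-Gaussian policy is sketched below. A linear mean stands in for the neural network, and samples are squashed with `tanh` and scaled to the actuation limits. The class name and the limits `a_max=3.0` and `delta_max=0.4` are hypothetical, and a state-independent log standard deviation is a simplifying assumption.

```python
import numpy as np

class LinearGaussianPolicy:
    """Diagonal Gaussian policy pi(a|s) = N(mu(s), diag(sigma^2)) (illustrative sketch).

    A linear mean replaces the neural network, and log_std is state-independent.
    Samples are squashed with tanh and scaled to hypothetical limits [a_max, delta_max].
    """

    def __init__(self, obs_dim, rng, a_max=3.0, delta_max=0.4):
        self.W = rng.normal(0.0, 0.01, size=(2, obs_dim))
        self.b = np.zeros(2)
        self.log_std = np.full(2, -0.5)
        self.scale = np.array([a_max, delta_max])
        self.rng = rng

    def mean(self, s):
        return self.W @ s + self.b

    def sample(self, s):
        """Return (raw Gaussian sample u, scaled action a = scale * tanh(u))."""
        u = self.mean(s) + np.exp(self.log_std) * self.rng.normal(size=2)
        return u, self.scale * np.tanh(u)

    def log_prob(self, s, u):
        """Log density of the raw (pre-squash) sample u under N(mu(s), sigma^2)."""
        z = (u - self.mean(s)) / np.exp(self.log_std)
        return float(np.sum(-0.5 * z ** 2 - self.log_std - 0.5 * np.log(2 * np.pi)))
```

Here `log_prob` scores the pre-squash sample `u`. The log density of the squashed action would additionally require the `tanh` Jacobian correction.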

Empirical findings indicate that:

  • Actor–Critic methods outperform pure Monte Carlo policy gradient by reducing the variance of the gradient estimates.
  • Direct zero-shot policy transfer from related domains (e.g., mapping a CartPole policy onto F1TENTH) is ineffective; target-domain fine-tuning is required.
  • Policy robustness is enhanced by adding small observation/action noise and enforcing early penalties on off-track infractions.
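The variance argument in the first finding comes down to what weights each $\nabla_\phi \log \pi_\phi(a_t \mid s_t)$ term. Pure Monte Carlo policy gradient uses the full return $G_t$. Baseline subtraction uses $G_t - V(s_t)$. A one-step actor–critic uses the bootstrapped advantage $r_t + \gamma V(s_{t+1}) - V(s_t)$. The following sketch computes these weights; the function names and the discount $\gamma$ are illustrative assumptions.

```python
import numpy as np

def returns_to_go(rewards, gamma=0.99):
    """Monte Carlo return G_t = sum_k gamma^k * r_{t+k} for each step t."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

def td_advantages(rewards, values, last_value, dones, gamma=0.99):
    """One-step actor-critic advantage A_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)."""
    values = np.asarray(values, dtype=np.float64)
    next_values = np.append(values[1:], last_value)
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    rewards = np.asarray(rewards, dtype=np.float64)
    return rewards + gamma * next_values * not_done - values
```

The baseline-subtracted Monte Carlo weight is `returns_to_go(r) - values`. The actor–critic weight depends on only one sampled reward per step, which is the source of its lower variance.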

7. Example Usage and Practical Recommendations

A canonical usage protocol follows the standard Gym loop from Section 1: reset the environment, step it with actions until done is set, and read the episode metrics from the info dictionary. With the ROS extension, episodes are managed via launch files and metrics are retrieved from /gym/info. Practical advice includes clamping action outputs, using forward progress as a reward-shaping term, initializing critic networks before policy updates, and carefully transferring policies between domains (e.g., CartPole to F1TENTH) with an appropriate state/action mapping followed by adaptation (Charles et al., 18 Jun 2025, Shi, 2021).
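The reset/step loop with action clamping can be sketched against any environment that follows the 4-tuple step convention. The sketch illustrates that generic Gym convention rather than the exact f1tenth_gym call signatures, which may differ across versions. `run_episode`, the toy `StraightTrackEnv`, and the action limits are all hypothetical.

```python
import numpy as np

A_LOW = np.array([-3.0, -0.4])   # hypothetical lower limits for [a_v, delta]
A_HIGH = np.array([3.0, 0.4])    # hypothetical upper limits for [a_v, delta]

def run_episode(env, policy, max_steps=1000):
    """Generic Gym loop: reset, step with clamped actions, return the last info dict."""
    obs = env.reset()
    info = {}
    for _ in range(max_steps):
        action = np.clip(policy(obs), A_LOW, A_HIGH)  # clamp before actuation
        obs, reward, done, info = env.step(action)
        if done:
            break
    return info

class StraightTrackEnv:
    """Hypothetical 1-D toy environment: the car accelerates along a straight of given length."""

    def __init__(self, length=10.0, dt=0.1):
        self.length, self.dt = length, dt

    def reset(self):
        self.s, self.v, self.t = 0.0, 0.0, 0
        return np.array([self.s, self.v])

    def step(self, action):
        a_v = float(action[0])                     # steering is ignored on a straight
        self.v = max(0.0, self.v + self.dt * a_v)  # no reversing in this toy model
        ds = self.dt * self.v                      # forward progress this step
        self.s += ds
        self.t += 1
        done = self.s >= self.length
        info = {"progress": self.s, "steps": self.t, "lap_complete": done}
        return np.array([self.s, self.v]), ds, done, info
```

In this toy model, a policy that always requests `a_v = 100` is clamped to 3.0 and needs 26 steps to cover the 10 m straight. Without clamping, it would finish in 4 steps, so the clamp changes the outcome.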
