
CARLA: Open Urban Driving Simulator

Updated 26 February 2026
  • The simulator offers a high-fidelity urban environment via a real-time client-server model built on Unreal Engine 4.
  • Its flexible sensor suite includes RGB, depth, and semantic segmentation sensors for comprehensive perception testing.
  • CARLA standardizes autonomous driving evaluation with reproducible benchmarks, detailed performance metrics, and diverse traffic scenarios.

CARLA (“Car Learning to Act”) is an open-source urban driving simulator designed for advancing research, development, and validation of autonomous driving systems. Constructed atop Unreal Engine 4 (UE4), CARLA provides a comprehensive research platform encompassing a real-time simulation engine, open-access digital assets, flexible sensor interfaces, environmental manipulation, and standardized benchmarking protocols. The platform supports both the evaluation of modular pipelines and end-to-end learning approaches under controlled, reproducible urban traffic scenarios and weather conditions (Dosovitskiy et al., 2017).

1. System Architecture and Design

CARLA implements a real-time, client–server model utilizing UE4 for high-fidelity simulation. The server (implemented in C++/Blueprint) hosts the virtual world, managing the physical environment, rendering, agent (vehicle and pedestrian) behavior, scenario scripting, and sensor emulation. The client is a lightweight Python API that interfaces with the server via TCP sockets, enabling command/control functions and data logging.

Key architectural roles:

  • Physics Engine: Uses the NVIDIA PhysX vehicle model for vehicle dynamics and collision detection, plus simplified pedestrian kinematics.
  • Rendering Engine: Employs customized UE4 assets emphasizing real-time performance (low-poly meshes, custom materials).
  • Traffic Manager: Rule-based traffic entity controller, handling lane adherence, speed limits, light signals, route selection, and collision avoidance.
  • Scenario Runner: Manages meta-commands (reset, time/weather, NPC density/seeding, spawn logic).
  • Sensor Manager: Configures, positions, and simulates sensor data (camera, depth, semantic, measurements).

The client stack comprises an API façade (for vehicle and scenario control) and data consumers (for sensor streams and environment measurements).

2. Digital Assets and Simulation Environment

CARLA ships with two detailed urban maps:

  • Town 1: 2.9 km of drivable roads; used for agent training.
  • Town 2: 1.4 km of drivable roads; employed for generalization testing.

Static assets include 40 uniquely designed building models, composable road segments, sidewalks, vegetation, and diverse traffic infrastructure props. Dynamic assets consist of 16 animated vehicle models and 50 distinct pedestrian rigs.

Maps are organized as bespoke UE4 .umap files. Asset placement combines manual design with scripted road generation via spline tools. Designers specify spawn volumes to localize potential NPC initiation sites. The asset library is extensible; new content can be integrated through the UE4 project’s content browser and incorporated into scenario logic.

Environmental realism is achieved through:

  • Lighting/Time: Two times of day (midday, sunset), which affect direct illumination and ambient occlusion.
  • Weather Presets: Nine combinations of cloud cover, precipitation, and puddle density; combined with the two times of day, these yield 18 distinct environmental states (parameterized by a “weather_id” meta-command). Changing the environment dynamically alters directional lighting, skybox state, fog density, rain particle emission, and water decal distribution.
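The 18 environmental states can be viewed as the Cartesian product of the nine weather presets and the two times of day. A minimal enumeration sketch (the preset names below are illustrative, not the simulator's actual identifiers):

```python
from itertools import product

# Illustrative preset names; CARLA's real presets differ in naming and detail.
WEATHER_PRESETS = [
    "ClearSky", "LightClouds", "Overcast", "LightRain", "MidRain",
    "HardRain", "AfterRain", "Cloudy", "SoftRain",
]
TIMES_OF_DAY = ["Midday", "Sunset"]

# Assign a sequential weather_id to every (weather, time-of-day) pair.
ENVIRONMENT_STATES = {
    i: (w, t) for i, (w, t) in enumerate(product(WEATHER_PRESETS, TIMES_OF_DAY))
}
```

Nine presets times two lighting conditions gives the 18 states referenced by the “weather_id” meta-command.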

3. Sensor Suite Specification

CARLA v1.0 supports the following virtual sensors:

  • RGB Camera ("sensor.camera.rgb"): Color imagery with configurable resolution and field of view
  • Depth Camera: Outputs 24-bit linear depth (range up to 1 km)
  • Semantic Segmentation: 12 discrete classes (road, lane-marking, traffic sign, sidewalk, fence, pole, wall, building, vegetation, vehicle, pedestrian, other)
  • Measurements: Pseudo-sensors reporting GPS-style position, compass, vehicle speed (km/h), acceleration, collision events (cars/pedestrians/static), lane and sidewalk invasion metrics, traffic-light state, speed limits, and bounding boxes for all actors.

Each camera sensor is specified by image dimensions (image_size_x/y), field of view (fov), update rate (sensor_tick), and affine transform (location and rotation, relative to ego vehicle).

Example: Python API usage for instantiating and attaching an RGB camera with specified attributes:

import carla

client = carla.Client('localhost', 2000)
world = client.get_world()
blueprint_library = world.get_blueprint_library()

# Configure the RGB camera blueprint.
cam_bp = blueprint_library.find('sensor.camera.rgb')
cam_bp.set_attribute('image_size_x', '800')
cam_bp.set_attribute('image_size_y', '600')
cam_bp.set_attribute('fov', '90')
cam_bp.set_attribute('sensor_tick', '0.05')  # 20 Hz

# Spawn an ego vehicle, then attach the camera relative to it.
vehicle_bp = blueprint_library.filter('vehicle.*')[0]
spawn_transform = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_transform)
cam_transform = carla.Transform(carla.Location(x=1.3, z=2.5))
camera = world.spawn_actor(cam_bp, cam_transform, attach_to=vehicle)
camera.listen(lambda image: image.save_to_disk('_out/%06d.png' % image.frame))
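The depth camera's 24-bit encoding can be decoded into meters on the client side. A minimal sketch following the encoding described above (depth packed across the three 8-bit channels, R least significant, full range mapping linearly to 1 km):

```python
def decode_depth_meters(r: int, g: int, b: int) -> float:
    """Convert one 24-bit encoded depth pixel to meters.

    Depth is packed into the R, G, B channels with R least significant;
    the full 24-bit range maps linearly to 0..1000 m.
    """
    normalized = (r + g * 256 + b * 256 ** 2) / (256 ** 3 - 1)
    return 1000.0 * normalized
```

For example, a fully saturated pixel (255, 255, 255) decodes to the maximum range of 1000 m.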

4. Scenario Generation and Traffic Management

CARLA enables rigorous scenario synthesis through reproducible NPC spawning and automated route planning.

  • NPC Spawning: The client API provides meta-commands to specify number_of_vehicles and number_of_pedestrians, along with deterministic randomization via seed values (seed_vehicles, seed_pedestrians). Upon scenario reset, the Scenario Runner populates the town accordingly.
  • Route Planning: The road network is available as linked waypoints through the Python API:

carla_map = world.get_map()  # avoid shadowing the built-in map()
wp = carla_map.get_waypoint(vehicle.get_location())
next_wps = wp.next(2.0)  # list of waypoints 2 m ahead

  • Traffic Management: The internal Traffic Manager computes agent destinations using A*-derived topological routes, with behavioral customization possible (e.g., ignoring lights for specified vehicles). Manual or script-based agent overrides are supported.
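The route computation described above can be illustrated with a small A* search over a toy topological graph (the node names and segment lengths below are invented for illustration; CARLA's internal routing operates on its own waypoint topology):

```python
import heapq

def a_star(graph, start, goal, heuristic):
    """A* over a weighted adjacency dict {node: [(neighbor, cost), ...]}."""
    frontier = [(heuristic(start), 0.0, start, [start])]
    visited = set()
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nbr, step in graph.get(node, []):
            if nbr not in visited:
                new_cost = cost + step
                heapq.heappush(
                    frontier, (new_cost + heuristic(nbr), new_cost, nbr, path + [nbr])
                )
    return None

# Toy road topology: intersections A..D with segment lengths in meters.
roads = {
    "A": [("B", 100.0), ("C", 250.0)],
    "B": [("D", 120.0)],
    "C": [("D", 60.0)],
}
# With a zero heuristic this reduces to Dijkstra's algorithm.
route = a_star(roads, "A", "D", heuristic=lambda n: 0.0)
```

The search prefers the A→B→D route (220 m) over A→C→D (310 m); a distance-to-goal heuristic would prune the frontier further.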

5. Evaluation Protocols and Metrics

CARLA provides standardized evaluation schemas centered on urban navigation performance under increasingly complex conditions:

  • Test Scenarios: Four canonical tasks:

    1. Straight (no obstacles)
    2. One intersection turn (no obstacles)
    3. Arbitrary navigation (no obstacles)
    4. Navigation with dynamic actors (cars & pedestrians)
  • Training/Test Regimes: Division between Town 1 (training) and Town 2 (testing), and between weather sets (4 for training, 2 unseen for testing generalization).

  • Metrics:

    • Success Rate: $S = (\text{\# episodes reaching goal}) / (\text{total episodes})$
    • Collision Rate: $C = N_{\text{coll}} / D$, where $N_{\text{coll}}$ is the collision count and $D$ the cumulative distance driven in km.
    • Lane Invasion: a timestep counts as an invasion when the vehicle footprint overlaps the opposite lane by more than $0.3$:

      $\ell(t) = \begin{cases} 1 & \text{if vehicle\_overlap\_opposite\_lane}(t) > 0.3 \\ 0 & \text{otherwise} \end{cases}$

      Lane Invasion Rate $= \big(\sum_t \ell(t)\big) / (\text{total timesteps})$
    • Sidewalk Invasion and Infractions/km: defined analogously.
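These metrics can be computed directly from per-episode logs. A sketch assuming a simple log format (the field names are hypothetical, not CARLA's actual measurement schema):

```python
def evaluate(episodes):
    """Compute success rate, collisions per km, and lane-invasion rate.

    Each episode is a dict with hypothetical fields:
      'reached_goal'  (bool),
      'distance_km'   (float, distance driven),
      'collisions'    (int, collision events),
      'lane_overlaps' (list of per-timestep opposite-lane overlap fractions).
    """
    total = len(episodes)
    success_rate = sum(e["reached_goal"] for e in episodes) / total
    total_km = sum(e["distance_km"] for e in episodes)
    collision_rate = sum(e["collisions"] for e in episodes) / total_km
    overlaps = [x for e in episodes for x in e["lane_overlaps"]]
    # A timestep counts as a lane invasion when overlap exceeds 0.3.
    invasion_rate = sum(x > 0.3 for x in overlaps) / len(overlaps)
    return success_rate, collision_rate, invasion_rate
```

Infractions per km (sidewalk invasions, collisions by type) follow the same pattern of event counts normalized by distance.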

6. Comparative Evaluation of Autonomous Driving Approaches

CARLA benchmarks three main control paradigms under the above protocols:

a. Classic Modular Pipeline (MP)

  • Perception: Semantic segmentation via RefineNet (ImageNet-pretrained ResNet) into five key classes, binary intersection detection via AlexNet.
  • Planning: Rule-based state machine (road-following, intersection navigation, hazard stops) leveraging segmentation and topology.
  • Control: PID-based cruise controller set to 20 km/h.
  • Trade-off: Robust within training domain but susceptible to perception failures under unfamiliar textures and weather; liable to mode-switching failures.
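The cruise controller's behavior can be sketched as a simple PID loop tracking the 20 km/h set point (the gains below are illustrative, not the paper's values):

```python
class PIDCruise:
    """Minimal PID throttle controller tracking a target speed in km/h."""

    def __init__(self, target_kmh=20.0, kp=0.3, ki=0.02, kd=0.05, dt=0.05):
        self.target, self.kp, self.ki, self.kd, self.dt = target_kmh, kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, speed_kmh):
        error = self.target - speed_kmh
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to the [0, 1] throttle actuator range; braking is handled
        # separately by the hazard-stop state in the planner.
        return max(0.0, min(1.0, raw))
```

Starting from rest the large error saturates the throttle; once over the set point the output drops to zero, and in steady state the integral term holds the small throttle needed to maintain speed.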

b. End-to-End Imitation Learning (IL)

  • Method: Conditional imitation learning (as in [Codevilla et al. 2017]).
  • Architecture: Perceptual CNN (200×88×3 input → 512-dim feature), a measurement head for speed, merged into a 512-dim joint representation, and four output branches that each emit (steer, throttle, brake); the high-level command selects the active branch.
  • Training Data: ~14 h of traces (80% from automated agent, 20% from human, with noise injection for robustness).
  • Optimization: Adam (initial learning rate 2×10⁻⁴, halved every 50k steps), dropout in fully connected layers (50%) and convolutional layers (20%), aggressive data augmentation; 294k training iterations minimizing MSE on the control outputs.
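The command-conditioned branching can be sketched framework-free: only the branch selected by the high-level command contributes to the loss, so the other branches receive no gradient for that sample (shapes and names below are illustrative):

```python
def branched_loss(branch_outputs, command, target):
    """MSE on (steer, throttle, brake) for the branch picked by `command`.

    branch_outputs: list of four action triples, one per high-level command
                    (e.g. follow-lane, turn-left, turn-right, go-straight).
    command:        integer index selecting the active branch.
    target:         expert action triple for this sample.
    """
    pred = branch_outputs[command]  # gating: inactive branches are untouched
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
```

In training, each sample's command routes the gradient into exactly one branch, which is what lets a single network learn distinct behaviors per command.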

c. End-to-End Reinforcement Learning (RL)

  • Algorithm: A3C, 10 parallel actors, totaling 10M simulation steps.
  • Network: Two stacked 84×84 grayscale CNN frames + FC measurement vector, with separate policy/value heads.
  • Reward Function:

$r_t = 1000\,(d_{t-1} - d_t) + 0.05\,(v_t - v_{t-1}) - 0.00002\,(c_t - c_{t-1}) - 2\,(s_t - s_{t-1}) - 2\,(o_t - o_{t-1})$

where $d$ is distance-to-goal (km), $v$ speed (km/h), $c$ cumulative collision damage, $s$ sidewalk overlap, and $o$ opposite-lane overlap.
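The shaped reward can be implemented directly from consecutive measurement snapshots. A sketch, with dictionary field names chosen for illustration:

```python
def step_reward(prev, cur):
    """Shaping reward between two consecutive measurement snapshots.

    Each argument is a dict with hypothetical keys: 'dist_to_goal' (km),
    'speed' (km/h), 'collision_damage' (cumulative), 'sidewalk_overlap',
    and 'opposite_lane_overlap'.
    """
    return (
        1000.0 * (prev["dist_to_goal"] - cur["dist_to_goal"])    # progress to goal
        + 0.05 * (cur["speed"] - prev["speed"])                  # encourage speed
        - 0.00002 * (cur["collision_damage"] - prev["collision_damage"])
        - 2.0 * (cur["sidewalk_overlap"] - prev["sidewalk_overlap"])
        - 2.0 * (cur["opposite_lane_overlap"] - prev["opposite_lane_overlap"])
    )
```

Because every term is a difference of cumulative quantities, the agent is rewarded only for per-step changes: closing 1 m of distance at constant speed with no infractions yields a reward of exactly 1.0.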

7. Results and Platform Extensibility

Experimental Results

Success rates (%) for each control approach are summarized below (averaged over 25 episodes):

Task (success %, MP / IL / RL)   | Town 1, train weather | Town 2, train weather | Town 1, new weather | Town 2, new weather
Straight (200 m)                 | 98 / 95 / 89          | 92 / 97 / 74          | 100 / 98 / 86       | 50 / 80 / 68
One turn (400 m)                 | 82 / 89 / 34          | 61 / 59 / 12          | 95 / 90 / 16        | 50 / 48 / 20
Navigation (770 m)               | 80 / 86 / 14          | 24 / 40 / 3           | 94 / 84 / 2         | 47 / 44 / 6
Navigation + dynamic obstacles   | 77 / 83 / 7           | 24 / 38 / 2           | 89 / 82 / 2         | 44 / 42 / 4

Infractions (km driven between events) on Navigation+dynamic in Town 1, Train Weather:

Infraction (km between events)   | MP   | IL   | RL
Opposite lane                    | 10.2 | 33.4 | 0.18
Sidewalk                         | 18.3 | 12.9 | 0.75
Collision with static object     | 10.0 | 5.4  | 0.42
Collision with car               | 16.4 | 3.3  | 0.58
Collision with pedestrian        | 18.9 | 6.4  | 17.8

Key insights include comparable MP and IL performance in-domain, substantial RL underperformance, demonstrably greater difficulty generalizing to unseen urban layouts (Town 2) than novel weather, and distinct performance-brittleness trade-offs by architecture.

Extensibility and Community Usage

  • Installation/Execution: Open-source code (https://github.com/carla-simulator/carla), supporting local builds (UE4.24+), Python API (pip-installable), headless cloud execution via Docker/Xvfb, and sample scripts (e.g., manual_control.py).
  • Community Extensions: Actively developed ROS bridge; LiDAR plugin (“sensor.lidar.ray_cast” 32-beam emulator); OpenDRIVE importer (real-world network conversion); repository for high-level scenario definitions (e.g., cut-in, jaywalk, etc.).

CARLA’s open architecture, exhaustive sensor capabilities, town and weather variability, integrated traffic/simulation management, and reproducible protocolization establish it as a foundational research simulator for autonomous urban driving. Its capacity to standardize experiments across contrasting system architectures enables rigorous benchmarking and iterative improvement both within and across research groups (Dosovitskiy et al., 2017).
