CARLA: Open Urban Driving Simulator
- The simulator offers a high-fidelity urban environment via a real-time client-server model built on Unreal Engine 4.
- Its flexible sensor suite includes RGB, depth, and semantic segmentation sensors for comprehensive perception testing.
- CARLA standardizes autonomous driving evaluation with reproducible benchmarks, detailed performance metrics, and diverse traffic scenarios.
CARLA (“Car Learning to Act”) is an open-source urban driving simulator designed for advancing research, development, and validation of autonomous driving systems. Constructed atop Unreal Engine 4 (UE4), CARLA provides a comprehensive research platform encompassing a real-time simulation engine, open-access digital assets, flexible sensor interfaces, environmental manipulation, and standardized benchmarking protocols. The platform supports both the evaluation of modular pipelines and end-to-end learning approaches under controlled, reproducible urban traffic scenarios and weather conditions (Dosovitskiy et al., 2017).
1. System Architecture and Design
CARLA implements a real-time, client–server model utilizing UE4 for high-fidelity simulation. The server (implemented in C++/Blueprint) hosts the virtual world, managing the physical environment, rendering, agent (vehicle and pedestrian) behavior, scenario scripting, and sensor emulation. The client is a lightweight Python API that interfaces with the server via TCP sockets, enabling command/control functions and data logging.
Key architectural roles:
- Physics Engine: Utilizes NVIDIA PhysX (via UE4's PhysXVehicles component) for vehicle dynamics and collision detection; pedestrian motion is handled kinematically.
- Rendering Engine: Employs customized UE4 assets emphasizing real-time performance (low-poly meshes, custom materials).
- Traffic Manager: Rule-based traffic entity controller, handling lane adherence, speed limits, light signals, route selection, and collision avoidance.
- Scenario Runner: Manages meta-commands (reset, time/weather, NPC density/seeding, spawn logic).
- Sensor Manager: Configures, positions, and simulates sensor data (camera, depth, semantic, measurements).
The client stack comprises an API façade (for vehicle and scenario control) and data consumers (for sensor streams and environment measurements).
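To make the client side concrete, here is a minimal connection sketch, assuming a server already listening on the default port 2000 and following the 0.9.x Python API (the synchronous-mode settings shown are one common configuration, not the only one):

```python
import carla

client = carla.Client('localhost', 2000)  # TCP connection to the UE4 server
client.set_timeout(10.0)                  # seconds before RPC calls raise
world = client.get_world()

# Synchronous mode: the server advances only on the client's tick,
# making sensor streams and control commands reproducible.
settings = world.get_settings()
settings.synchronous_mode = True
settings.fixed_delta_seconds = 0.05       # 20 Hz simulation step
world.apply_settings(settings)

for _ in range(100):
    world.tick()                          # step the simulation forward
```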
2. Digital Assets and Simulation Environment
CARLA ships with two detailed urban maps:
- Town 1: 2.9 km of drivable roads; used for agent training.
- Town 2: 1.4 km of drivable roads; employed for generalization testing.
Static assets include 40 uniquely designed building models, composable road segments, sidewalks, vegetation, and diverse traffic infrastructure props. Dynamic assets consist of 16 animated vehicle models and 50 distinct pedestrian rigs.
Maps are organized as bespoke UE4 .umap files. Asset placement combines manual design with scripted road generation via spline tools. Designers specify spawn volumes to localize potential NPC initiation sites. The asset library is extensible; new content can be integrated through the UE4 project’s content browser and incorporated into scenario logic.
Environmental realism is achieved through:
- Lighting/Time: Two times of day (midday, sunset), which affect direct illumination and ambient occlusion.
- Weather Presets: Nine combinations of cloud cover, precipitation, and puddle density; crossed with the two times of day, these yield 18 distinct environmental states (parameterized by a "weather_id" meta-command). Environmental changes dynamically alter: directional lighting, skybox state, fog density, rain particle emission, and water decal distribution.
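In the modern 0.9.x API these parameters are exposed directly through `carla.WeatherParameters` rather than a bare `weather_id`; a minimal sketch (the preset name and parameter values are illustrative):

```python
import carla

client = carla.Client('localhost', 2000)
world = client.get_world()

# Built-in preset combining cloud cover, rain, puddles, and sun position
world.set_weather(carla.WeatherParameters.WetCloudySunset)

# Or set the individual parameters described above explicitly
weather = carla.WeatherParameters(
    cloudiness=80.0,              # percent cloud cover
    precipitation=30.0,           # rain intensity
    precipitation_deposits=50.0,  # puddle density
    sun_altitude_angle=15.0,      # low sun approximates sunset lighting
)
world.set_weather(weather)
```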
3. Sensor Suite Specification
CARLA supports the following virtual sensors:
- RGB Camera: standard color imagery ("sensor.camera.rgb")
- Depth Camera: 24-bit linear depth encoded across three channels, with range up to 1 km ("sensor.camera.depth")
- Semantic Segmentation: 12 discrete classes (road, lane marking, traffic sign, sidewalk, fence, pole, wall, building, vegetation, vehicle, pedestrian, other) ("sensor.camera.semantic_segmentation")
- Measurements: Pseudo-sensors reporting GPS-style position, compass, vehicle speed (km/h), acceleration, collision events (cars/pedestrians/static), lane and sidewalk invasion metrics, traffic-light state, speed limits, and bounding boxes for all actors.
Each camera sensor is specified by image dimensions (image_size_x/y), field of view (fov), update rate (sensor_tick), and a mounting transform (location and rotation relative to the ego vehicle).
Example: Python API usage for instantiating and attaching an RGB camera with specified attributes:
```python
import carla

# Connect to a running CARLA server and fetch the sensor blueprint
client = carla.Client('localhost', 2000)
world = client.get_world()
blueprint_library = world.get_blueprint_library()

cam_bp = blueprint_library.find('sensor.camera.rgb')
cam_bp.set_attribute('image_size_x', '800')
cam_bp.set_attribute('image_size_y', '600')
cam_bp.set_attribute('fov', '90')
cam_bp.set_attribute('sensor_tick', '0.05')  # 20 Hz

# Spawn an ego vehicle to carry the camera
vehicle_bp = blueprint_library.filter('vehicle.*')[0]
spawn_transform = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_transform)

# Mount the camera 1.3 m forward and 2.5 m up, relative to the vehicle
cam_transform = carla.Transform(carla.Location(x=1.3, z=2.5))
camera = world.spawn_actor(cam_bp, cam_transform, attach_to=vehicle)
camera.listen(lambda image: image.save_to_disk('_out/%06d.png' % image.frame))
```
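The depth and semantic segmentation cameras follow the same pattern; only the blueprint ID and the color conversion applied at save time differ. A short sketch reusing `blueprint_library`, `cam_transform`, and `vehicle` from the snippet above:

```python
# Semantic segmentation: class IDs are encoded in the red channel;
# CityScapesPalette maps them to colors for visualization.
seg_bp = blueprint_library.find('sensor.camera.semantic_segmentation')
seg_cam = world.spawn_actor(seg_bp, cam_transform, attach_to=vehicle)
seg_cam.listen(lambda img: img.save_to_disk(
    '_out/seg_%06d.png' % img.frame, carla.ColorConverter.CityScapesPalette))

# Depth: LogarithmicDepth produces a human-readable grayscale rendering
depth_bp = blueprint_library.find('sensor.camera.depth')
depth_cam = world.spawn_actor(depth_bp, cam_transform, attach_to=vehicle)
depth_cam.listen(lambda img: img.save_to_disk(
    '_out/depth_%06d.png' % img.frame, carla.ColorConverter.LogarithmicDepth))
```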
4. Scenario Generation and Traffic Management
CARLA enables rigorous scenario synthesis through reproducible NPC spawning and automated route planning.
- NPC Spawning: The client API provides meta-commands to specify `number_of_vehicles` and `number_of_pedestrians`, along with deterministic randomization via seed values (`seed_vehicles`, `seed_pedestrians`). Upon scenario reset, the Scenario Runner populates the town accordingly (a seeded spawning sketch follows this list).
- Route Planning: The road network is available as linked waypoints through the Python API:
```python
carla_map = world.get_map()                          # avoid shadowing the built-in `map`
wp = carla_map.get_waypoint(vehicle.get_location())  # nearest waypoint on the road
next_wps = wp.next(2.0)                              # waypoints 2 m ahead along the lane
```
- Traffic Management: The internal Traffic Manager computes agent destinations using A*-derived topological routes, with behavioral customization possible (e.g., ignoring lights for specified vehicles). Manual or script-based agent overrides are supported.
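A combined sketch of seeded NPC spawning and per-vehicle Traffic Manager overrides, using the 0.9.x API and reusing `client` and `world` from earlier snippets (the vehicle count and seed value are illustrative; `set_random_device_seed` is the Traffic Manager's deterministic-seeding hook):

```python
import random

# Deterministic NPC placement: seed both the client-side choice of
# spawn points and the Traffic Manager's internal randomness.
SEED_VEHICLES = 42
random.seed(SEED_VEHICLES)
tm = client.get_trafficmanager()
tm.set_random_device_seed(SEED_VEHICLES)

vehicle_bps = world.get_blueprint_library().filter('vehicle.*')
spawn_points = world.get_map().get_spawn_points()
random.shuffle(spawn_points)

npcs = []
for transform in spawn_points[:20]:             # number_of_vehicles = 20
    bp = random.choice(vehicle_bps)
    npc = world.try_spawn_actor(bp, transform)  # returns None if the spot is occupied
    if npc is not None:
        npc.set_autopilot(True, tm.get_port())  # hand control to the Traffic Manager
        npcs.append(npc)

# Per-vehicle behavioral overrides, e.g. a rule-breaking agent
rogue = npcs[0]
tm.ignore_lights_percentage(rogue, 100.0)             # ignore all traffic lights
tm.vehicle_percentage_speed_difference(rogue, -20.0)  # drive 20% above the limit
```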
5. Evaluation Protocols and Metrics
CARLA provides standardized evaluation schemas centered on urban navigation performance under increasingly complex conditions:
- Test Scenarios: Four canonical tasks:
- Straight (no obstacles)
- One intersection turn (no obstacles)
- Arbitrary navigation (no obstacles)
- Navigation with dynamic actors (cars & pedestrians)
- Training/Test Regimes: Division between Town 1 (training) and Town 2 (testing), and between weather sets (4 seen during training, 2 held out to test generalization).
- Metrics:
- Success Rate: $S = (\text{episodes reaching goal})/(\text{total episodes})$
- Collision Rate: $C = N_{\text{col}} / D$, where $D$ is the cumulative distance driven in km and $N_{\text{col}}$ the collision count; benchmark tables report the inverse, i.e., average km driven between collisions.
- Lane Invasion Rate: $L = N_{\text{lane}} / D$, where $N_{\text{lane}}$ counts timesteps in which the vehicle footprint overlaps the opposite lane by more than 30% ($0.3$) of its area.
- Sidewalk Invasion and other infractions/km: defined analogously.
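As a concrete illustration, these metrics can be aggregated from per-episode logs; the record fields below are hypothetical, not a CARLA data structure:

```python
def benchmark_metrics(episodes):
    """Aggregate the metrics above from episode records of the form
    {'success': bool, 'km': float, 'collisions': int, 'lane_invasions': int}."""
    total_km = sum(e['km'] for e in episodes)
    collisions = sum(e['collisions'] for e in episodes)
    return {
        'success_rate': sum(e['success'] for e in episodes) / len(episodes),
        'collisions_per_km': collisions / total_km,
        'km_between_collisions': total_km / max(1, collisions),  # as in the tables below
        'lane_invasions_per_km': sum(e['lane_invasions'] for e in episodes) / total_km,
    }
```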
6. Comparative Evaluation of Autonomous Driving Approaches
CARLA benchmarks three main control paradigms under the above protocols:
a. Classic Modular Pipeline (MP)
- Perception: Semantic segmentation via RefineNet (ImageNet-pretrained ResNet) into five key classes, binary intersection detection via AlexNet.
- Planning: Rule-based state machine (road-following, intersection navigation, hazard stops) leveraging segmentation and topology.
- Control: PID-based cruise controller set to 20 km/h.
- Trade-off: Robust within training domain but susceptible to perception failures under unfamiliar textures and weather; liable to mode-switching failures.
b. End-to-End Imitation Learning (IL)
- Method: Conditional imitation learning (Codevilla et al., 2017).
- Architecture: Perceptual CNN (input 200×88×3 → 512-dim embedding), measurement head for speed, merged into a 512-dim joint representation, and four output branches, each predicting steer, throttle, and brake, with the active branch selected by the high-level command (a network sketch follows this list).
- Training Data: ~14 h of traces (80% from automated agent, 20% from human, with noise injection for robustness).
- Optimization: Adam (initial learning rate $2\times10^{-4}$, halved every 50k steps), dropout in both fully connected (50%) and convolutional (20%) layers, aggressive data augmentation; 294k training iterations minimizing MSE on the control outputs.
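A minimal PyTorch sketch of the branched architecture: the 200×88×3 input, 512-dim embeddings, and four command-selected branches follow the text, while the convolutional stack and branch widths are illustrative assumptions rather than the authors' exact configuration:

```python
import torch
import torch.nn as nn

class BranchedCIL(nn.Module):
    def __init__(self, num_commands=4):
        super().__init__()
        # Perceptual CNN: (3, 88, 200) image -> 512-dim embedding
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 9 * 23, 512), nn.ReLU(),  # 9x23 spatial size after convs
        )
        # Measurement head for the scalar speed input
        self.speed = nn.Sequential(nn.Linear(1, 128), nn.ReLU())
        # Merge image and speed features into a 512-dim joint representation
        self.merge = nn.Sequential(nn.Linear(512 + 128, 512), nn.ReLU())
        # One branch per high-level command, each predicting (steer, throttle, brake)
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 3))
            for _ in range(num_commands))

    def forward(self, image, speed, command):
        # image: (B, 3, 88, 200); speed: (B, 1); command: (B,) long in [0, num_commands)
        joint = self.merge(torch.cat([self.cnn(image), self.speed(speed)], dim=1))
        outputs = torch.stack([b(joint) for b in self.branches], dim=1)  # (B, C, 3)
        # Keep only the branch matching each sample's command
        idx = command.view(-1, 1, 1).expand(-1, 1, 3)
        return outputs.gather(1, idx).squeeze(1)                         # (B, 3)
```

Training would minimize MSE between these outputs and the recorded controls, as described above.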
c. End-to-End Reinforcement Learning (RL)
- Algorithm: A3C, 10 parallel actors, totaling 10M simulation steps.
- Network: Two stacked 84×84 grayscale CNN frames + FC measurement vector, with separate policy/value heads.
- Reward Function: a weighted sum of per-step changes,
$$r_t = 1000\,(d_{t-1} - d_t) + 0.05\,(v_t - v_{t-1}) - 0.00002\,(c_t - c_{t-1}) - 2\,(s_t - s_{t-1}) - 2\,(o_t - o_{t-1}),$$
where $d$ is distance-to-goal, $v$ speed, $c$ cumulative collision damage, $s$ sidewalk overlap, and $o$ opposite-lane overlap.
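Expressed as code, the reward is a simple function of consecutive measurement snapshots (the dict-based measurement format here is hypothetical):

```python
def step_reward(prev, cur):
    """Weighted differences of distance-to-goal (d), speed (v), collision
    damage (c), sidewalk overlap (s), and opposite-lane overlap (o)."""
    return (1000.0 * (prev['d'] - cur['d'])     # progress toward the goal
            + 0.05 * (cur['v'] - prev['v'])     # speed gained
            - 0.00002 * (cur['c'] - prev['c'])  # new collision damage
            - 2.0 * (cur['s'] - prev['s'])      # new sidewalk overlap
            - 2.0 * (cur['o'] - prev['o']))     # new opposite-lane overlap
```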
7. Results and Platform Extensibility
Experimental Results
Success rates (%) for each control approach, averaged over 25 episodes; each cell lists MP / IL / RL:

| Task | Training (Town 1, train weather) | New town (Town 2) | New weather (Town 1) | New town + new weather |
|---|---|---|---|---|
| Straight (200 m) | 98 / 95 / 89 | 92 / 97 / 74 | 100 / 98 / 86 | 50 / 80 / 68 |
| One turn (400 m) | 82 / 89 / 34 | 61 / 59 / 12 | 95 / 90 / 16 | 50 / 48 / 20 |
| Navigation (770 m) | 80 / 86 / 14 | 24 / 40 / 3 | 94 / 84 / 2 | 47 / 44 / 6 |
| Navigation + dynamic obstacles | 77 / 83 / 7 | 24 / 38 / 2 | 89 / 82 / 2 | 44 / 42 / 4 |
Infractions for Navigation + dynamic obstacles in Town 1 under training weather, reported as average km driven between two events (higher is better):
| Infraction | MP | IL | RL |
|---|---|---|---|
| Opposite lane | 10.2 | 33.4 | 0.18 |
| Sidewalk | 18.3 | 12.9 | 0.75 |
| Collision–static | 10.0 | 5.4 | 0.42 |
| Collision–car | 16.4 | 3.3 | 0.58 |
| Collision–pedestrian | 18.9 | 6.4 | 17.8 |
Key insights: MP and IL perform comparably within the training domain; RL substantially underperforms throughout; generalization to an unseen urban layout (Town 2) is markedly harder than to unseen weather; and each architecture exhibits a distinct performance-brittleness trade-off.
Extensibility and Community Usage
- Installation/Execution: Open-source code (https://github.com/carla-simulator/carla), supporting local builds (UE4.24+), a pip-installable Python API, headless cloud execution via Docker/Xvfb, and sample scripts (e.g., `manual_control.py`).
- Community Extensions: an actively developed ROS bridge; a LiDAR plugin ("sensor.lidar.ray_cast", a 32-beam emulator; see the sketch below); an OpenDRIVE importer for converting real-world road networks; and a repository of high-level scenario definitions (e.g., cut-in, jaywalking).
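As one example of the extended sensor suite, a sketch attaching the community ray-cast LiDAR to the ego vehicle (the blueprint ID is as listed above; attribute values are illustrative, and the unit of `range` changed from centimeters to meters across CARLA versions):

```python
lidar_bp = world.get_blueprint_library().find('sensor.lidar.ray_cast')
lidar_bp.set_attribute('channels', '32')            # 32-beam emulation
lidar_bp.set_attribute('rotation_frequency', '10')  # Hz
lidar_bp.set_attribute('points_per_second', '56000')
lidar_bp.set_attribute('range', '100.0')            # meters in recent versions

lidar = world.spawn_actor(
    lidar_bp, carla.Transform(carla.Location(z=2.5)), attach_to=vehicle)
lidar.listen(lambda scan: scan.save_to_disk('_out/%06d.ply' % scan.frame))
```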
CARLA’s open architecture, comprehensive sensor suite, town and weather variability, integrated traffic and scenario management, and reproducible evaluation protocols establish it as a foundational research simulator for autonomous urban driving. Its capacity to standardize experiments across contrasting system architectures enables rigorous benchmarking and iterative improvement both within and across research groups (Dosovitskiy et al., 2017).