CARLA-Air Simulation Platform

Updated 1 April 2026

CARLA-Air is an open-source simulation infrastructure that unifies CARLA urban driving and AirSim multirotor aerial robotics within a single Unreal Engine 4 process.
It features a single physics-renderer loop that synchronizes up to 18 sensor modalities, ensuring spatial and temporal consistency in dynamic, photorealistic environments.
The platform supports diverse research applications with native Python APIs, planned ROS 2 interfaces, and an extensible asset pipeline for air-ground cooperative experiments.

CARLA-Air is an open-source infrastructure that enables the unified, high-fidelity simulation of both urban driving and multirotor aerial robotics within a single Unreal Engine 4 (UE4) process. It merges the CARLA urban driving simulator with the AirSim multirotor flight stack, preserving both platforms’ native Python APIs alongside planned ROS 2 interfaces. By eschewing bridge-based co-simulation and instead implementing a joint simulation and sensor pipeline inside one physics-renderer loop, CARLA-Air provides spatially and temporally coherent environments where ground vehicles, pedestrians, and unmanned aerial vehicles (UAVs) interact under consistent physics and photorealistic rendering. The system supports up to 18 synchronized sensor modalities per tick and exposes extensible workflows for air-ground cooperative embodied intelligence research, including dataset generation, vision-language action, multi-modal perception, and reinforcement learning. CARLA-Air is distributed with prebuilt binaries and full source at https://github.com/louiszengCN/CarlaAir (Zeng et al., 30 Mar 2026).

1. Unified System Architecture

CARLA-Air achieves a single-process integration of CARLA and AirSim by introducing the CARLAAirGameMode class within a single UE4 world. This configuration inherits all CARLA ground-simulation subsystems (traffic, weather) and incorporates AirSim’s FlightPawn as a regular UE4 actor during BeginPlay, avoiding game-mode conflicts. Two independent RPC servers (CARLA on TCP 2000, AirSim on TCP 41451) operate concurrently in the same process, preserving native APIs for code reuse.

High-Level Block Diagram

[Figure: high-level block diagram of the single-process architecture; not reproduced here.]

CARLA-Air leverages UE4’s enforcement of a single physics-renderer loop per world: both ground (vehicles/pedestrians) and aerial (multirotor UAV) dynamics advance in lock-step on each simulation tick. The default rendering tick is 20 Hz, while drone physics is computed internally at ~1 kHz.

2. Physics Models and Synchronization

CARLA-Air inherits AirSim’s multirotor model governed by the Newton–Euler 6-DOF equations:

$m\cdot\ddot{x} = F_\mathrm{thrust} + F_\mathrm{gravity} + F_\mathrm{aero}$
$I\cdot\dot{\omega} = \tau_\mathrm{thrust} + \tau_\mathrm{aero}$

Component force and torque models:

$F_\mathrm{thrust} = \sum_i k_F \omega_i^2 \cdot e_i$ (thrust per rotor)
$F_\mathrm{gravity} = mg(-\hat{z})$
$F_\mathrm{aero} = -k_D v\|v\|$ (quadratic drag)
$\tau_\mathrm{thrust} = [0, 0, \sum_i (-1)^i k_M \omega_i^2]^T$ (net yaw torque)
$\tau_\mathrm{aero}$ incorporates aerodynamic damping.
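
The component models above can be evaluated numerically. The following is a minimal sketch, not AirSim's implementation: it assumes all rotors thrust along the same body axis (so only the thrust magnitude is returned), that rotor indices start at 0 in the $(-1)^i$ term, and that the world $z$ axis points up. The function names and coefficient values are illustrative.

```python
import math

def rotor_thrust_and_yaw(omegas, k_f, k_m):
    """Total thrust magnitude and net yaw torque for rotor speeds (rad/s)."""
    # F_thrust magnitude: sum_i k_F * omega_i^2
    thrust = sum(k_f * w ** 2 for w in omegas)
    # Yaw torque: sum_i (-1)^i * k_M * omega_i^2 (index start is an assumption)
    yaw_torque = sum((-1) ** i * k_m * w ** 2 for i, w in enumerate(omegas))
    return thrust, yaw_torque

def quadratic_drag(velocity, k_d):
    """F_aero = -k_D * v * ||v|| for a 3-vector velocity."""
    speed = math.sqrt(sum(c * c for c in velocity))
    return tuple(-k_d * c * speed for c in velocity)

def gravity_force(mass, g=9.81):
    """F_gravity = m * g * (-z_hat), assuming z points up."""
    return (0.0, 0.0, -mass * g)

# Four rotors spinning at the same speed produce thrust but no net yaw torque.
thrust, yaw = rotor_thrust_and_yaw([100.0] * 4, k_f=1e-5, k_m=1e-7)
```

With equal rotor speeds the alternating signs cancel, which is why yaw is commanded by speeding up one rotation direction relative to the other.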

The FlightPawn actor's physics thread runs at ~1 kHz and updates the pawn’s pose for each UE4 tick $\Delta t$ (e.g., 0.05 s). Ground and aerial physics share access to the world state, so all actor dynamics co-evolve. All sensors (ground and aerial) sample synchronously on each UE4 tick, providing spatial-temporal consistency with $|t_k^\mathrm{gnd} - t_k^\mathrm{air}| = 0$ for all frame indices $k$.
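
To make the two rates concrete: a 0.05 s tick at a 1 kHz physics rate corresponds to 50 physics sub-steps per rendered frame. The sketch below illustrates that sub-stepping for vertical motion only, using semi-implicit Euler integration. It is an illustration under stated assumptions ($z$ up, scalar thrust, hypothetical function name), not the integrator AirSim actually uses.

```python
TICK_DT = 0.05      # UE4 tick length from the text (20 Hz)
PHYSICS_DT = 0.001  # approximate drone physics step (~1 kHz)

def step_altitude(z, vz, thrust, mass, k_d, g=9.81,
                  tick_dt=TICK_DT, physics_dt=PHYSICS_DT):
    """Advance altitude z and vertical speed vz across one UE4 tick."""
    substeps = round(tick_dt / physics_dt)
    for _ in range(substeps):
        drag = -k_d * vz * abs(vz)              # quadratic drag opposes motion
        az = (thrust - mass * g + drag) / mass  # m * z'' = sum of forces
        vz += az * physics_dt                   # update velocity first
        z += vz * physics_dt                    # then position
    return z, vz, substeps

# Hover: thrust equal to weight leaves altitude and speed unchanged.
z, vz, n = step_altitude(12.0, 0.0, thrust=9.81, mass=1.0, k_d=0.1)
```

Only the pose at the end of the sub-step loop is visible to the renderer and sensors, which is what keeps ground and aerial observations on the same tick.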

3. Rendering and Multimodal Sensor Suite

CARLA-Air utilizes a unified photorealistic rendering pipeline based on CARLA’s UE4 material and lighting stack. All ground and aerial sensors, irrespective of perspective or physical embodiment, operate under a consistent renderer, inheriting shared weather, lighting, and post-process effects.

Supported sensor modalities (up to 18 concurrent streams) include:

RGB camera (customizable FOV, resolution, lens parameters)
Depth camera (linear/logarithmic modes)
Semantic segmentation (per-pixel class)
Instance segmentation
Surface normals
LiDAR (spinning, adjustable beam count, configurable range and noise)
Radar (range-Doppler)
IMU (accelerometer, gyroscope with tunable noise/bias)
Barometer
GNSS
Collision detector
Others by extension

All sensors sample on each UE4 tick (default 20 Hz), with Gaussian noise and bias added to raw outputs, for example to the IMU's accelerometer and gyroscope readings.
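
A common way to model such corruption is an additive one: measured value = true value + bias + zero-mean Gaussian noise. The sketch below assumes that additive form and uses made-up bias and noise values; the exact model and parameters used by CARLA-Air are not given here, and `corrupt` is a hypothetical helper.

```python
import random

def corrupt(true_value, bias, sigma, rng):
    """Apply a per-axis constant bias and zero-mean Gaussian noise to a 3-vector."""
    return tuple(x + b + rng.gauss(0.0, sigma)
                 for x, b in zip(true_value, bias))

rng = random.Random(0)  # seeded so a run is repeatable

# Illustrative accelerometer reading for a stationary sensor (values assumed).
accel = corrupt((0.0, 0.0, 9.81), bias=(0.02, -0.01, 0.0), sigma=0.05, rng=rng)
print(accel)
```

Setting `sigma` to zero recovers a bias-only sensor, which is a quick way to check the noise path in isolation.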

Sampling, data output, and all sensor readings are synchronized across all platforms (max deviation ≤ 1 tick).
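
The idea of tick-driven acquisition can be sketched with mock sensors: one loop advances the tick, and every sensor is sampled with the same tick index and timestamp. This is a toy model with hypothetical class and function names, not CARLA-Air's sensor code.

```python
from dataclasses import dataclass, field

@dataclass
class MockSensor:
    name: str
    frames: list = field(default_factory=list)

    def sample(self, tick, timestamp):
        # A real sensor would render or measure here; the mock only records
        # which tick and timestamp it was sampled at.
        self.frames.append((tick, timestamp))

def run_ticks(sensors, num_ticks, dt=0.05):
    """Sample every sensor once per tick, all with the same timestamp."""
    for tick in range(num_ticks):
        timestamp = tick * dt
        for sensor in sensors:
            sensor.sample(tick, timestamp)

def max_timestamp_deviation(a, b):
    """Largest |t_a - t_b| over paired frames of two sensors."""
    return max(abs(ta - tb) for (_, ta), (_, tb) in zip(a.frames, b.frames))

ground_cam = MockSensor("ground_rgb")
aerial_cam = MockSensor("aerial_depth")
run_ticks([ground_cam, aerial_cam], num_ticks=100)
```

Because both sensors read the timestamp computed once per tick, the deviation between paired frames is zero by construction.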

4. Programming Interfaces and Extensibility

CARLA-Air preserves the native function signatures and APIs of both CARLA and AirSim. Python-based workflows require no code modification; RPC endpoints remain TCP 2000 (CARLA) and TCP 41451 (AirSim). The planned ROS 2 bridge will publish under standard /carla/* and /airsim/* namespaces.

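
A minimal sketch of driving both APIs from one script follows. It assumes the standard `carla` and `airsim` Python packages are installed and a CARLA-Air instance is running locally. The ports are the ones stated above; the vehicle blueprint filter, spawn point, and 12 m climb are arbitrary illustrative choices. Whether the two APIs report positions in a shared coordinate frame is not specified here, so the printed values are not compared.

```python
CARLA_PORT = 2000    # CARLA RPC port stated in the text
AIRSIM_PORT = 41451  # AirSim RPC port stated in the text

def connect(host="127.0.0.1"):
    """Open one client per native API against the same simulator process."""
    import carla   # imported lazily so the module loads without the simulator
    import airsim
    carla_client = carla.Client(host, CARLA_PORT)
    carla_client.set_timeout(10.0)
    world = carla_client.get_world()
    drone = airsim.MultirotorClient(ip=host, port=AIRSIM_PORT)
    drone.confirmConnection()
    return world, drone

def demo():
    world, drone = connect()
    # Ground side: spawn a vehicle and hand it to the autopilot.
    blueprint = world.get_blueprint_library().filter("vehicle.*")[0]
    spawn_point = world.get_map().get_spawn_points()[0]
    vehicle = world.spawn_actor(blueprint, spawn_point)
    vehicle.set_autopilot(True)
    # Aerial side: take off and climb (AirSim uses NED, so negative z is up).
    drone.enableApiControl(True)
    drone.armDisarm(True)
    drone.takeoffAsync().join()
    drone.moveToZAsync(-12.0, 2.0).join()
    print(vehicle.get_location())
    print(drone.getMultirotorState().kinematics_estimated.position)

# Call demo() only when a CARLA-Air instance is running.
```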

The asset import pipeline supports arbitrary custom UE4 assets (FBX/OBJ). Users can introduce new actors by:

Importing meshes into Content/CustomAssets,
Defining a Blueprint subclass (Actor or Pawn),
Adding physics root, collision, and inertia,
Attaching CARLA-Air sensor components (CARLASensor class),
Packaging the plugin, enabling access via world.get_blueprint_library().filter("custom.*").

All custom assets share physics, rendering, and sensing ticks with native actors.
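
Once a custom asset has been packaged, it can be looked up from Python using the blueprint filter mentioned above. The sketch below assumes custom blueprints are registered under the `custom.` prefix as described. The helper names are hypothetical; `world` is a `carla.World` and `transform` a `carla.Transform`.

```python
def list_custom_blueprints(world, pattern="custom.*"):
    """Return the sorted IDs of blueprints matching the custom-asset pattern."""
    library = world.get_blueprint_library()
    return sorted(bp.id for bp in library.filter(pattern))

def spawn_first_custom(world, transform, pattern="custom.*"):
    """Spawn the first matching custom blueprint, or return None."""
    matches = world.get_blueprint_library().filter(pattern)
    if len(matches) == 0:
        return None
    # try_spawn_actor returns None instead of raising if the spot is blocked.
    return world.try_spawn_actor(matches[0], transform)
```

Listing the IDs first is a cheap way to confirm that packaging succeeded before attempting to spawn anything.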

5. Performance Metrics and Temporal Consistency

Performance characterization (Town10HD, Epic quality, up to 8 × 720p sensor streams):

Profile	FPS	VRAM (MiB)	CPU (%)
Ground only (8×720p)	26.3 ± 1.4	3831 ± 11	38 ± 4
Aerial only (8×720p)	44.7 ± 2.1	2941 ± 8	29 ± 3
Moderate joint (+1 UAV)	19.8 ± 1.1	3870 ± 13	54 ± 5

Memory stability over 3-hour, 357-reset testing: VRAM drift < 10 MiB; regression slope 0.49 MiB/reset. Zero crashes or API errors reported.
Communication latency (loopback): CARLA snapshot 320 μs; AirSim state query 410 μs; AirSim 720p image capture 3.2 ms (bridge-based co-simulation reference: 3–5 ms).
Strict spatial-temporal alignment: All ground and aerial sensors share the UE4 tick index; timestamp misalignment is $|t_k^\mathrm{gnd} - t_k^\mathrm{air}| = 0$ (bridge-based approaches exhibit nonzero misalignment and multi-ms jitter).
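
Two of the quantities above are simple to compute from raw logs. The sketch below shows an ordinary least-squares slope (the kind of fit behind a MiB/reset figure) and the conversion from frame rate to per-frame time. The sample series is synthetic and purely illustrative, not measured data, and the function names are hypothetical.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def frame_time_ms(fps):
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps

# Synthetic example: usage rising by exactly 2 units per reset.
resets = [0, 1, 2, 3]
usage = [1.0, 3.0, 5.0, 7.0]
slope = least_squares_slope(resets, usage)
```

For instance, the joint profile's 19.8 FPS corresponds to roughly 50.5 ms per frame, close to the 50 ms budget of a 20 Hz tick.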

6. Representative Workloads and Research Directions

CARLA-Air is tested and released with the following “out-of-the-box” scenarios, supporting four primary research verticals:

Cooperative Precision Landing: UAV lands on a moving CARLA vehicle with a control loop at ~20 Hz, yielding touchdown error < 0.5 m in ~20 s from 12 m altitude.
Vision-Language Navigation/Action Data Generation: Vehicles and drones equipped with RGB+depth+semantic cameras traverse autopilot routes, enabling paired dataset collection and language-instruction synthesis based on the drone’s overhead view.
Synchronized Multi-Modal Dataset Collection: Example experiment with 8 ground and 4 aerial sensors over 1,000 ticks achieves ~17 Hz multi-stream collection, with max frame alignment deviation ≤ 1 tick.
Cross-View Perception in Varying Weather: Drone hovers above a vehicle, synchronously capturing aerial depth and ground segmentation through 14 weather presets, demonstrating accurate pixel-intensity tracking and zero temporal misalignment.
Reinforcement Learning Environments: A drone is trained to maintain a spatial offset relative to a moving car in dense traffic, using state observations and a reward function.

The system maintained stable resets (357 cycles, 0 crashes).
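
One simple way to score offset tracking in the reinforcement learning scenario, offered as a hypothetical illustration rather than the reward used by CARLA-Air, is the negative Euclidean distance between the drone's actual offset from the car and the desired offset. The function name and the example positions are assumptions.

```python
import math

def offset_tracking_reward(drone_pos, car_pos, target_offset):
    """Negative distance between the actual and desired drone-to-car offset.

    The reward is 0 when the drone sits exactly at car_pos + target_offset
    and becomes more negative as the tracking error grows.
    """
    error = [d - c - o for d, c, o in zip(drone_pos, car_pos, target_offset)]
    return -math.sqrt(sum(e * e for e in error))

# Drone is 3 m and 4 m off in x and y from the desired point 10 m above the car.
r = offset_tracking_reward((3.0, 4.0, 10.0), (0.0, 0.0, 0.0), (0.0, 0.0, 10.0))
```

A practical reward would typically add terms for collisions and control effort; those are omitted to keep the sketch focused on the offset objective.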

7. Asset Pipeline and Customization

The extensible asset pipeline supports arbitrary custom robot platform integration into the unified simulation. Standard procedure involves adding asset folders in Unreal Editor, importing meshes, defining Blueprints, assigning physics and sensors, and packaging for Python API access. All assets, whether imported or native, participate in the shared physics, sensing, and rendering loops without additional synchronization requirements.

Summary: CARLA-Air provides a single-process, high-fidelity, photorealistic simulation environment where cars, pedestrians, and drones coexist in a unified UE4 world with perfectly synchronized physics and sensor modalities. It preserves code compatibility with standard CARLA and AirSim APIs (with ongoing ROS 2 integration) and supports a range of air-ground embodied intelligence investigations, from cooperative landing to vision-language navigation, multi-modal perception, and reinforcement learning. The asset pipeline and custom actor support enable rapid adaptation for diverse embodied research applications (Zeng et al., 30 Mar 2026).


References

Zeng et al. (2026). CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence.
