CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
Abstract: The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain-segregated: driving simulators lack aerial dynamics, while multirotor simulators lack realistic ground scenes. Bridge-based co-simulation introduces synchronization overhead and cannot guarantee strict spatial-temporal consistency. We present CARLA-Air, an open-source infrastructure that unifies high-fidelity urban driving and physics-accurate multirotor flight within a single Unreal Engine process. The platform preserves both CARLA and AirSim native Python APIs and ROS 2 interfaces, enabling zero-modification code reuse. Within a shared physics tick and rendering pipeline, CARLA-Air delivers photorealistic environments with rule-compliant traffic, socially-aware pedestrians, and aerodynamically consistent UAV dynamics, synchronously capturing up to 18 sensor modalities across all platforms at each tick. The platform supports representative air-ground embodied intelligence workloads spanning cooperation, embodied navigation and vision-language action, multi-modal perception and dataset construction, and reinforcement-learning-based policy training. An extensible asset pipeline allows integration of custom robot platforms into the shared world. By inheriting AirSim's aerial capabilities -- whose upstream development has been archived -- CARLA-Air ensures this widely adopted flight stack continues to evolve within a modern infrastructure. Released with prebuilt binaries and full source: https://github.com/louiszengCN/CarlaAir
Explain it Like I'm 14
What this paper is about
The paper introduces CARLA-Air, a free (open-source) computer simulator that lets flying robots (drones) and ground robots/cars live and act together in the same realistic 3D city. It combines the strengths of two popular simulators:
- CARLA (great for cars, roads, traffic, and pedestrians)
- AirSim (great for accurate drone flight and drone sensors)
Instead of running them as two separate programs that "talk" to each other, CARLA-Air puts them inside one shared world that runs on a single engine. That makes everything line up in space and time, which is crucial for robots that learn from synchronized camera, LiDAR, and other sensors.
What questions the authors asked
The authors set out to answer simple but important questions:
- Can we build one simulator where cars, pedestrians, and drones all move realistically in the same world, on the same clock?
- Can we keep the original programming interfaces (APIs) from CARLA and AirSim so researchers don't have to rewrite their code?
- Can we capture many kinds of sensors (like cameras and laser scans) for both air and ground robots at exactly the same moments?
- Can we avoid the lag and confusion that happen when two separate simulators try to stay in sync?
- Will this run fast and stable enough to train and test robot behaviors for hours?
How they built it (in everyday terms)
Think of two clubs trying to use the same gym at the same time. Most people might try to split time or use walkie-talkies to coordinate. That's the "two separate simulators with a bridge" approach: it's workable but clunky and often out of sync.
CARLA-Air instead makes one club the host (CARLA runs the city, roads, traffic, and pedestrians) and invites the other club's activity (AirSim's drone flight) into the same gym as a normal participant.
Here's what that means in practice:
- One shared world and clock: Both the city and the drone run inside a single Unreal Engine process, so every camera frame and sensor reading happens at the same tick (moment in time). This keeps all data perfectly lined up.
- Keep existing code working: The original CARLA and AirSim Python/ROS 2 APIs still work. If you already have code for either, you can run it without changes.
- Solve the "one boss only" rule: Unreal Engine allows only one active "game mode" (the boss that runs the world) per world. CARLA-Air makes CARLA the boss, then spawns the drone flight system as a regular world actor. Result: both systems work together without fighting over control.
- Make coordinates match: Cars and drones use different conventions for directions and units (e.g., centimeters vs. meters; "up" vs. "down"). CARLA-Air adds a simple conversion (flip the up/down axis and convert cm to m) so positions and orientations agree.
- Many sensor types, synchronized: Up to 18 kinds of sensors (RGB camera, depth, semantic segmentation, LiDAR, radar, IMU, GPS-like GNSS, barometer, and more) can be recorded at the same tick for ground and air platforms.
- Easy to add your own robots and maps: There's an asset import pipeline so you can bring custom drones, vehicles, or environments into the shared world.
- Careful testing: They measured speed (frames per second), memory use over time (to catch leaks), and how quickly commands and data round-trip between your script and the simulator (latency).
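The position part of the coordinate conversion described in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the frame conventions quoted in the glossary (UE4: X forward, Y right, Z up, in centimeters; AirSim NED: X north, Y east, Z down, in meters), assumes the X and Y axes of the two frames are already aligned, and leaves orientation out. The function names `ue4_to_ned` and `ned_to_ue4` are hypothetical and are not part of the CARLA or AirSim APIs.

```python
from typing import Tuple

CM_PER_M = 100.0  # UE4 works in centimeters, the NED frame in meters

Vec3 = Tuple[float, float, float]


def ue4_to_ned(x_cm: float, y_cm: float, z_cm: float) -> Vec3:
    """Convert a UE4 position (cm, Z up) to an NED position (m, Z down).

    Hypothetical helper: scales cm to m and negates the vertical axis.
    """
    return (x_cm / CM_PER_M, y_cm / CM_PER_M, -z_cm / CM_PER_M)


def ned_to_ue4(x_m: float, y_m: float, z_m: float) -> Vec3:
    """Inverse of ue4_to_ned: scale m to cm and negate the vertical axis."""
    return (x_m * CM_PER_M, y_m * CM_PER_M, -z_m * CM_PER_M)


# A point 15 m ahead, 2 m to the left, and 30 m above the UE4 origin.
print(ue4_to_ned(1500.0, -200.0, 3000.0))  # (15.0, -2.0, -30.0)
```

Note that a height of 30 m becomes a negative Z value in NED, because "down" is the positive direction there.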
What they found and why it matters
- Smooth, unified air-ground simulation: With typical city scenes, a drone, traffic, and multiple sensors, CARLA-Air runs at around 20 frames per second, fast enough for most training and testing loops.
- Stable for long runs: Over 3 hours of repeated resets (like in reinforcement learning), performance stayed steady with no crashes and no meaningful memory growth.
- Low communication delay: Simple queries return in well under a millisecond, and even image requests are fast enough to keep up with the simulation clock, much quicker than sending data between two separate programs.
- Better timing consistency: Because everything lives in one process, sensor data across drones and cars is tightly synchronized. That's important when training AI models that depend on correctly aligned camera/LiDAR/GPS data.
- No code rewrites: Researchers can reuse existing CARLA or AirSim code, lowering the barrier to trying air-ground tasks.
- Keeps AirSim alive: Since AirSim's upstream development has been archived, CARLA-Air gives that flight stack a modern, actively maintained home.
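To make the latency finding concrete, here is a minimal sketch of how a round-trip measurement of this kind could be taken. It follows the protocol quoted in the glossary (5,000 timed calls after 500 warm-up calls, reported as a median with an interquartile range), but the helper names `summarize` and `measure_round_trip_ms` are hypothetical, and the `lambda: None` call is only a stand-in for a real client query.

```python
import statistics
import time
from typing import Callable, Sequence, Tuple


def summarize(samples_ms: Sequence[float]) -> Tuple[float, float]:
    """Return (median, interquartile range) of latency samples in milliseconds."""
    q1, _, q3 = statistics.quantiles(samples_ms, n=4)  # quartile cut points
    return statistics.median(samples_ms), q3 - q1


def measure_round_trip_ms(
    call: Callable[[], object], n_calls: int = 5000, n_warmup: int = 500
) -> Tuple[float, float]:
    """Time `call` repeatedly after a warm-up phase and summarize the samples."""
    for _ in range(n_warmup):  # warm-up calls are executed but not timed
        call()
    samples = []
    for _ in range(n_calls):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1e3)  # seconds to ms
    return summarize(samples)


# Stand-in for a real simulator query; swap in an actual client call to use this.
median_ms, iqr_ms = measure_round_trip_ms(lambda: None, n_calls=1000, n_warmup=100)
print(f"median={median_ms:.4f} ms, IQR={iqr_ms:.4f} ms")
```

Reporting the median and IQR rather than the mean keeps a few slow outlier calls from dominating the result.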
Why this is important
Having drones and ground robots share the same realistic city simulation opens doors to many useful projects:
- Air-ground teamwork: For example, a drone scouting traffic while a ground robot responds, or cooperative search-and-rescue.
- Better navigation and decision-making: Combine the drone's bird's-eye view with ground-level details for smarter planning.
- High-quality datasets: Collect perfectly matched sensor data from both views to improve 3D mapping, cross-view recognition, and scene understanding.
- Faster robot learning: Train policies with reinforcement learning in a safe, repeatable, and physically consistent world.
- Real-world relevance: Helps develop tech for the "low-altitude economy," like urban air mobility, package delivery, and infrastructure inspection.
The bottom line
CARLA-Air fills a gap: it's a single, easy-to-use, open-source simulator where drones, cars, and people share one realistic world with synchronized sensors and preserved APIs. That makes it a practical foundation for next-generation research in air-ground robotics, safer city operations, and embodied AI, and it's available now for the community to build on.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a concise, actionable list of what remains missing, uncertain, or unexplored in the paper.
- Quantitative validation of spatial-temporal consistency: no measurement of cross-sensor timestamp skew, pose consistency, or per-tick jitter across aerial and ground sensors under load (e.g., many sensors/agents, high resolutions).
- Physics fidelity of aerial dynamics: no benchmarking of multirotor models (e.g., SimpleFlight) against real flight logs (attitude, trajectory tracking, wind disturbances, ground effect, downwash).
- Weather and wind modeling: absence of experiments validating wind fields, turbulence, gusts, and their effects on UAV dynamics and sensor outputs; unclear how to configure or simulate realistic aerodynamics-weather coupling.
- Sensor realism and calibration: no validation of camera rolling shutter, motion blur, lens distortion, LiDAR scan timing, radar propagation artifacts, IMU noise/bias drift, GNSS multipath; lacks procedures to calibrate and verify sensor models.
- Cross-modal registration accuracy: missing quantitative evaluation of extrinsic alignment between aerial and ground sensor suites (e.g., reprojection error, LiDAR-camera consistency) and tools for automatic cross-view calibration.
- Scaling to multi-drone, high-density scenes: only one-drone experiments reported; no analysis of performance, scheduling, or stability with many drones (each at ~1 kHz physics), large traffic populations, and hundreds of sensors.
- Pedestrian/traffic behavior in joint scenarios: authors note high-density behavior is an "active engineering target"; no evaluation of how pedestrians/vehicles react in the presence of drones or how performance degrades with dense actors.
- Determinism and reproducibility: no evidence that synchronous mode yields bitwise- or numerically-deterministic runs across seeds/hardware; repeatability under multi-threaded aerial physics remains untested.
- Long-duration stability beyond 3 hours: endurance test is limited (3 hours, 357 cycles); unknown behavior over multi-day training runs (memory fragmentation, VRAM drift, resource leaks, RPC stability).
- Upper bounds on sensing throughput: while "up to 18 sensor modalities" is claimed, there is no characterization of maximum sustainable sensor counts per agent/world, high-FPS (e.g., 120 Hz) cameras, or 4K/8K resolutions.
- Dataset construction specifics: missing details on ground-truth formats (3D boxes, instance IDs, dense semantics), synchronization/timestamps, calibration file export, and guarantees for cross-view correspondence correctness at each tick.
- Sim2Real transfer evidence: no case studies or metrics demonstrating that policies/detectors trained in CARLA-Air transfer to real UAV/UGV systems; lacks domain randomization knobs and recommended ranges for reducing reality gap.
- Physical interaction across domains: not evaluated whether drone downwash affects nearby actors/particles/vegetation, nor whether collisions and contact dynamics between aerial and ground agents are realistic and numerically stable.
- Networking beyond loopback: latency/throughput measured only on localhost; no evaluation for remote clients, multi-user sessions, or LAN/WAN scenarios (packet loss, jitter) common in distributed training and teleoperation.
- Headless/cloud execution: no results for offscreen rendering, containerized/cloud deployments, or multi-GPU servers; unclear performance and stability without a display, and under virtualization.
- Cross-platform support: performance and compatibility are only reported for Ubuntu 20.04/UE4; Windows/macOS, diverse GPU drivers, and UE5 migration viability are not assessed.
- Version compatibility matrix: ROS 2 and Python API compatibility is claimed but not enumerated; lacks a tested matrix of CARLA/AirSim/ROS 2/Python versions and guarantees across future upstream updates.
- Concurrency and scheduling: aerial physics at ~1 kHz is described as "on a dedicated thread," but no analysis of contention with UE render/physics threads, priority inversion risks, or how shared-tick semantics interact with high-rate sensors.
- Real-time control budgets: no end-to-end latency accounting from sensor capture to control actuation at typical control rates (e.g., 100-400 Hz) to confirm feasibility for tight PID/MPC/VO pipelines.
- Asset pipeline rigor: procedures to specify mass/inertia, propulsion constants, and controller gains for imported UAVs/UGVs are not detailed; lacks validation tools to catch inconsistent collision meshes or ill-conditioned dynamics.
- Benchmark suite definition: the table claims a test suite, but the paper does not detail standardized scenarios, tasks, and metrics for air-ground cooperation, navigation, or perception, hindering reproducible comparisons.
- GNSS/georeferencing realism: unknown support for geo-anchored environments, realistic GNSS errors (bias, multipath, urban canyons), and alignment with real-world coordinates/maps for outdoor robotics.
- Sensor-weather coupling: untested whether rain/fog/snow affect LiDAR/radar/camera realistically (attenuation, backscatter, glare) and how parameters map to physically plausible conditions.
- Resource sharing with on-box learning: although VRAM headroom is reported, no experiments co-running policy training (GPU-heavy) with the simulator to quantify interference, throughput, and stability.
- Extending beyond multirotors: no support or roadmap for fixed-wing/VTOL/heli dynamics and sensors; unclear how easily aerial models can be generalized.
- Failure handling and recovery: lack of evaluation of robustness to client crashes, network disconnects, or sensor/actor misconfigurations, and how the simulator recovers without manual intervention.
- Evaluation fairness vs. co-simulation: Figure 1 reports IPC savings, but lacks a controlled, end-to-end comparison of task performance/accuracy (e.g., perception alignment, RL learning curves) between single-process and bridge-based setups.
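The first gap in the list above, cross-sensor timestamp skew, is something a user could check on their own recordings. The sketch below shows one possible way, assuming sensor timestamps have been logged per tick as integer microseconds in a nested dictionary. That log layout, the sensor names, and the helper names `per_tick_skew` and `worst_skew` are all hypothetical; the paper does not describe such a tool.

```python
from typing import Dict

# Hypothetical log layout: tick index -> {sensor name: timestamp in microseconds}
TickLog = Dict[int, Dict[str, int]]


def per_tick_skew(log: TickLog) -> Dict[int, int]:
    """For each tick, return the spread between earliest and latest sensor stamp."""
    return {
        tick: max(stamps.values()) - min(stamps.values())
        for tick, stamps in log.items()
        if stamps  # skip ticks with no recorded sensors
    }


def worst_skew(log: TickLog) -> int:
    """Largest per-tick skew in the log, or 0 if the log is empty."""
    return max(per_tick_skew(log).values(), default=0)


# Made-up example data: the third tick shows a 250 microsecond disagreement.
example_log: TickLog = {
    0: {"drone_rgb": 0, "car_lidar": 0},
    1: {"drone_rgb": 50_000, "car_lidar": 50_000},
    2: {"drone_rgb": 100_000, "car_lidar": 100_250},
}
print(per_tick_skew(example_log))  # {0: 0, 1: 0, 2: 250}
print(worst_skew(example_log))     # 250
```

In a perfectly shared-tick setup every entry would be zero; any non-zero value under load would be the kind of evidence this gap asks for.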
Practical Applications
Immediate Applications
The following use cases can be deployed now, leveraging CARLA-Air's single-process, synchronized air-ground simulation, preserved CARLA/AirSim Python APIs and ROS 2 interfaces, multi-modal sensing (up to 18 modalities), and extensible asset pipeline.
- Sector: Robotics/Autonomy (industry, academia)
  - Application: Rapid prototyping and evaluation of air-ground cooperative algorithms (e.g., cooperative surveillance, escort, search-and-rescue)
  - Workflow/Tools: Reuse existing CARLA or AirSim codebases without modification; run closed-loop tests in synchronous mode; log strictly time-aligned aerial/ground sensor streams; parameter-sweep PIDs and planners; iterate via ROS 2
  - Assumptions/Dependencies: Mid-range GPU/CPU to sustain ~20 FPS in joint workloads; sim-to-real gap requires domain randomization and field validation; pedestrian behavior under very high densities may need tuning
- Sector: Software/AI (perception, mapping)
  - Application: Synthetic multi-modal, cross-view dataset generation (paired aerial-ground RGB/depth/segmentation/LiDAR/radar/IMU/GNSS) for cross-view matching, 3D reconstruction, scene understanding
  - Workflow/Tools: Configure synchronized sensor rigs on drones and vehicles; export auto-labeled data; vary weather/time-of-day; import custom maps/assets for target domains; create a "dataset factory" pipeline
  - Assumptions/Dependencies: Asset licensing and realism of imported environments; manage domain gap (use style and photometric augmentations); storage/compression pipeline for large multi-sensor streams
- Sector: AI/ML (reinforcement learning, multi-agent)
  - Application: Cooperative RL training for aerial-ground policies in a single, shared-tick environment
  - Workflow/Tools: Wrap CARLA-Air in Gym-like interfaces; use domain randomization for robustness; co-train policies for pursuit-evasion, target handoff, shared exploration; integrate with distributed RL frameworks
  - Assumptions/Dependencies: Compute budget to maintain stable tick rates; careful seed control for reproducibility; curriculum design to mitigate exploration burden
- Sector: Vision-Language-Action (academia, applied AI)
  - Application: Vision-language navigation and action with complementary aerial overview + ground detail
  - Workflow/Tools: Record synchronized video-state-language tuples; evaluate VLM/VLA agents on language-conditioned navigation, search, and manipulation proxies; build benchmarks with consistent ground truths
  - Assumptions/Dependencies: External LLM/VLM stack integration; annotation pipelines for high-level instructions; evaluate generalization beyond synthetic visuals
- Sector: Logistics/UAM (industry)
  - Application: Drone delivery and landing-site prototyping amid realistic urban traffic and pedestrians
  - Workflow/Tools: Model delivery routes, curbside interactions, and emergency landing strategies; stress-test perception and landing policies under varied weather and density; iterate with ROS 2 and Python APIs
  - Assumptions/Dependencies: No built-in airspace/UTM module, so operational rules and geofencing must be scripted; wind/turbulence beyond defaults may require plugins; results need real-world correlation
- Sector: Software Engineering (DevOps/QA for autonomy stacks)
  - Application: CI/CD regression testing of autonomy software (ROS 2 nodes, planning, perception)
  - Workflow/Tools: Synchronous mode for determinism; headless runs in fixed scenarios; replayable seeds; assert on KPIs (collisions, waypoint latency, success rate); prebuilt binaries for standardized test runners
  - Assumptions/Dependencies: Test harness integration; machine reproducibility (driver/OS versions); coverage of edge cases still curated by engineers
- Sector: Education/Training (universities, bootcamps)
  - Application: Dual-domain robotics labs (flight + driving) in one environment
  - Workflow/Tools: Course modules on sensor fusion, localization, multirotor control, and multi-agent planning; students reuse CARLA/AirSim examples without code changes; evaluate policies with synchronized sensors
  - Assumptions/Dependencies: Lab machines with discrete GPUs; onboarding to Unreal-based simulator; instructor-provided scenarios and rubrics
- Sector: Geomatics/SLAM (academia, startups)
  - Application: Benchmarking cross-view localization, SLAM, and 3D mapping with guaranteed pose ground truth
  - Workflow/Tools: Collect paired aerial/ground sequences with LiDAR + vision; export pose graphs and map priors; test cross-view place recognition and aerial-assisted localization
  - Assumptions/Dependencies: Synthetic geometry/materials vs. target domain; careful sensor noise modeling to match real hardware
- Sector: Energy/Infrastructure Inspection (industry)
  - Application: Simulated inspection missions using drone + UGV collaboration (e.g., substation, bridge)
  - Workflow/Tools: Import CAD/meshed assets; set waypoint plans; evaluate viewpoint planning and defect detection models with controllable lighting and occlusion
  - Assumptions/Dependencies: Quality of imported asset geometry and materials; specialized sensors (e.g., thermal) may need emulation extensions
- Sector: HCI/Teleoperation (industry, UX research)
  - Application: Operator-in-the-loop evaluations of multi-vehicle supervision (switching aerial/ground views)
  - Workflow/Tools: Integrate custom UIs; record operator performance; evaluate camera placement, alerting, and autonomy handoff policies
  - Assumptions/Dependencies: External UI stack; latency budget shaped by rendering + image transfer settings; human-subject protocols for studies
- Sector: Public Safety (municipal agencies, vendors)
  - Application: Procedural rehearsal for joint drone-ground emergency response (routing, deconfliction, crowd-aware navigation)
  - Workflow/Tools: Script incidents, crowds, and traffic; measure response time and safety metrics; iterate tactics before field drills
  - Assumptions/Dependencies: Ethical constraints; policy approximations for crowd behavior; scenario variability vs. doctrine alignment
- Sector: Platform Migration (all sectors with AirSim/CARLA legacy)
  - Application: Zero-modification migration of existing AirSim or CARLA projects into a unified air-ground setup
  - Workflow/Tools: Keep native Python/ROS 2 APIs; validate synchronization and sensor timing; extend scenes incrementally with asset pipeline
  - Assumptions/Dependencies: Pin toolchain/driver versions; verify coordinate-frame conversions for mixed stacks
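The "Gym-like interface" workflow mentioned in the list above can be sketched without any simulator installed. In the illustration below, `StubSimulator` is a hypothetical stand-in for a world that advances one shared tick at a time, and `AirGroundEnv` is a hypothetical wrapper that follows the common `reset`/`step` convention; neither is part of CARLA-Air, CARLA, or AirSim, and the toy task (a drone closing the gap to a car along one axis) is invented purely for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class StubSimulator:
    """Hypothetical stand-in for a simulator advanced one shared tick at a time."""
    dt: float = 0.05      # seconds per tick (20 ticks per second)
    drone_x: float = 0.0  # drone position along one axis, in meters
    car_x: float = 10.0   # car position along the same axis, in meters

    def reset(self) -> None:
        self.drone_x, self.car_x = 0.0, 10.0

    def tick(self, drone_vx: float, car_vx: float) -> None:
        # Both agents advance in the same tick, mimicking a shared clock.
        self.drone_x += drone_vx * self.dt
        self.car_x += car_vx * self.dt


class AirGroundEnv:
    """Gym-style wrapper: reset() and step() around the stub simulator."""

    def __init__(self, sim: StubSimulator, max_steps: int = 200) -> None:
        self.sim = sim
        self.max_steps = max_steps
        self.steps = 0

    def _observe(self) -> Dict[str, float]:
        gap = self.sim.car_x - self.sim.drone_x
        return {"drone_x": self.sim.drone_x, "car_x": self.sim.car_x, "gap": gap}

    def reset(self) -> Tuple[Dict[str, float], dict]:
        self.sim.reset()
        self.steps = 0
        return self._observe(), {}

    def step(self, action: Tuple[float, float]):
        """action = (drone velocity, car velocity) in m/s."""
        self.sim.tick(*action)
        self.steps += 1
        obs = self._observe()
        reward = -abs(obs["gap"])            # closer is better
        terminated = abs(obs["gap"]) < 0.5   # drone has reached the car
        truncated = self.steps >= self.max_steps
        return obs, reward, terminated, truncated, {}


env = AirGroundEnv(StubSimulator())
obs, _ = env.reset()
done = False
while not done:
    obs, reward, terminated, truncated, _ = env.step((5.0, 0.0))
    done = terminated or truncated
print(env.steps, obs["gap"])
```

Keeping the simulator behind a single `tick` call is the design choice that matters here: the policy sees one observation per shared tick, so aerial and ground state can never be sampled at different times.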
Long-Term Applications
These use cases are enabled by CARLA-Air's architecture but require additional research, scaling, validation, or ecosystem integrations.
- Sector: Urban Air Mobility / UTM (industry, regulators, policy)
  - Application: City-scale digital twins for low-altitude + ground robotics, with UTM integration and policy sandboxing
  - Potential Tools/Products: "Urban Low-Altitude Ops Studio" combining CARLA-Air with traffic simulators and UTM services; batch what-if studies for corridor design, geofencing, and contingency handling
  - Assumptions/Dependencies: Integration with UTM APIs and airspace rules engines; scalable multi-machine orchestration; validated behavior models for large crowds/traffic; regulator-accepted fidelity studies
- Sector: Certification/Safety (policy, industry)
  - Application: Certification-grade virtual validation of cooperative air-ground systems
  - Potential Tools/Products: Scenario coverage libraries, hazard injection toolkits, formal requirement checkers, and traceability dashboards
  - Assumptions/Dependencies: Statistically significant correlation with on-road/flight data; scenario taxonomies (e.g., PEGASUS-like) covering air-ground edge cases; V&V standards adoption by regulators
- Sector: Logistics/Retail (industry)
  - Application: End-to-end orchestration of last-mile air-ground delivery (drones + sidewalk robots) with dynamic dispatch and curb management
  - Potential Tools/Products: "Air-Ground Delivery Orchestrator" for fleet simulation, scheduling, and micro-fulfillment layout optimization
  - Assumptions/Dependencies: High-fidelity curb and pedestrian models; integration with mapping, inventory, and customer systems; real-world pilots for calibration
- Sector: AI/Foundational Models (academia, tech)
  - Application: Pretraining datasets for cross-view, multi-modal foundation models (aerial-ground video, depth, semantics, language)
  - Potential Tools/Products: Pipelines that generate large-scale, time-synchronized corpora with automatic annotations and scripted language descriptions of events
  - Assumptions/Dependencies: Photorealism and physics fidelity sufficient for transfer; scalable rendering farms; principled sim-to-real adaptation strategies
- Sector: Emergency Management (public sector)
  - Application: Live-data-assisted digital twins for disaster response (SAR, wildfire, flood) coordinating air and ground assets
  - Potential Tools/Products: "SAR Ops Simulator" that ingests real-time GIS/weather feeds and predicts asset allocation strategies
  - Assumptions/Dependencies: Environmental dynamics (smoke, wind, water) modeling; data assimilation pipelines; cross-agency interoperability
- Sector: Telecom (industry)
  - Application: Joint evaluation of 5G/6G connectivity for aerial-ground fleets (coverage, handoff, QoS under mobility)
  - Potential Tools/Products: Coupled CARLA-Air + network simulators (e.g., ns-3) to study network-aware planning and adaptive bitrate for perception streams
  - Assumptions/Dependencies: Tight co-simulation with network stack; timing fidelity at scale; realistic RF propagation models in urban canyons
- Sector: Healthcare (hospitals, logistics)
  - Application: Integrated hospital-campus autonomy (specimen/drug delivery by drones with AGV handoff)
  - Potential Tools/Products: Workflow simulators for routing, scheduling, and sterile-chain compliance; capacity planning under peak demand
  - Assumptions/Dependencies: Hospital IT integration; regulatory approvals; modeling of indoor-outdoor transitions and secure landing/pickup zones
- Sector: Insurance/Finance (risk, underwriting)
  - Application: Simulation-driven risk scoring for air-ground autonomous operations
  - Potential Tools/Products: Scenario portfolios estimating incident probabilities under different policies, environments, and fleet mixes
  - Assumptions/Dependencies: Calibrated incident models; access to claims/incident data for validation; acceptance by underwriting stakeholders
- Sector: Sustainability/Energy (utilities, EPC)
  - Application: Planning and optimizing inspection and maintenance of distributed assets (power lines, PV farms, pipelines) with joint air-ground teams
  - Potential Tools/Products: Route planners that minimize time and carbon footprint while maintaining coverage and risk constraints
  - Assumptions/Dependencies: Detailed asset libraries and terrain; integration with enterprise asset management; emissions models
- Sector: Training/Certification (consumer, enterprise)
  - Application: Operator training suites for supervising heterogeneous fleets in dense urban settings
  - Potential Tools/Products: Scenario-based training with performance analytics and certifications; progressive difficulty and failure injection
  - Assumptions/Dependencies: Training content development; human factors validation; hardware-in-the-loop or device integration for realism
- Sector: Benchmark Ecosystem (academia, community)
  - Application: Standardized, community-maintained benchmarks for air-ground embodied intelligence
  - Potential Tools/Products: Public leaderboards for cooperative navigation, perception, and VLA tasks with synchronized metrics and datasets
  - Assumptions/Dependencies: Governance and maintenance; agreed task definitions and metrics; compute resource sponsorship
Cross-Cutting Assumptions and Dependencies
- Hardware/OS: Linux workstations with discrete GPUs (e.g., RTX-series) for smooth joint workloads; driver and Unreal Engine compatibility pinned.
- Fidelity and Transfer: Synthetic-to-real transfer remains a limiting factor; employ domain randomization, sensor-noise modeling, and targeted real-world validation.
- Scaling: Current single-process design eliminates IPC overhead but may require architectural extensions for city-scale, multi-machine simulations.
- Ecosystem Integrations: UTM, RF/network, aeroacoustics, and advanced weather/hazard dynamics require coupling with specialized simulators.
- Maintenance: CARLA-Air actively extends AirSim within a modern infrastructure; sustained community and maintainer support will underpin long-term viability.
Glossary
- Air-ground cooperation: Coordinated operation between aerial and ground robots to achieve joint tasks. "Air-ground cooperation---heterogeneous aerial and ground agents coordinate within a shared environment for tasks such as cooperative surveillance, escort, and search-and-rescue."
- air-ground embodied intelligence: Research area focused on agents that perceive and act jointly across air and ground domains in a shared physical world. "CARLA-Air, a unified simulation infrastructure for air-ground embodied intelligence."
- asset pipeline: The set of tools and processes to import and integrate custom robots, vehicles, and maps into the simulator. "An extensible asset pipeline further allows researchers to integrate custom robot platforms, UAV configurations, and environment maps into the shared simulation world."
- autopilot: Automated control mode for vehicles that drives them without manual input. "8 autopilot vehicles + 1 drone; 1 aerial RGB @ "
- barometry: Measurement of atmospheric pressure for altitude estimation in robotics. "RGB, depth, semantic segmentation, LiDAR, radar, IMU, GNSS, and barometry"
- BeginPlay: An Unreal Engine lifecycle event triggered when gameplay starts, commonly used to initialize actors. "composed in BeginPlay"
- bridge-based co-simulation: An approach that links separate simulators via communication bridges to run together. "Bridge-based co-simulation can connect heterogeneous backends, yet introduces synchronization overhead"
- closed-loop interaction: A control setting where agents act, receive feedback, and update actions continuously within the environment. "agents learn cooperative or individual policies through closed-loop interaction in physically consistent air-ground environments."
- cross-process serialization: Packaging and transferring data across process boundaries, often adding latency. "Bridge-based co-simulation~\cite{transimhub} exhibits near-linear growth with sensor count due to cross-process serialization"
- decoupled execution: Running subsystems without a shared synchronization tick, allowing independent timing. "Sync Mode: Msg. = message passing, Decpl. = decoupled execution, Shared = shared-tick within one process."
- flight pawn: An Unreal Engine controllable entity representing the aircraft for physics and control. "flight pawn"
- GNSS: Global Navigation Satellite System, providing global positioning and timing (e.g., GPS, GLONASS). "RGB, depth, semantic segmentation, LiDAR, radar, IMU, GNSS, and barometry"
- GPU-accelerated reinforcement learning: RL training that exploits GPU parallelism for faster simulation or learning. "Isaac Lab~\cite{isaaclab} and Isaac Gym~\cite{isaacgym} emphasize massively parallel GPU-accelerated reinforcement learning"
- GPU memory bandwidth saturation: Performance limit reached when GPU memory transfer capacity is the bottleneck. "Sensor rendering dominates at high resolution due to GPU memory bandwidth saturation"
- harmonic mean: A mean for rate quantities that reduces the impact of large values. "All reported frame rates are the harmonic mean of "
- handedness (of coordinate frames): The orientation convention (left- or right-handed) used to define axes in 3D space. "negating accounts for the Z-axis reversal and the associated change of frame handedness."
- IMU: Inertial Measurement Unit, providing acceleration and angular velocity for state estimation. "RGB, depth, semantic segmentation, LiDAR, radar, IMU, GNSS, and barometry"
- interquartile range (IQR): A robust measure of statistical dispersion between the 25th and 75th percentiles. "Round-trip API call latency on the loopback interface (median IQR; 5,000 calls after 500 warm-up; RTX A4000; idle scene)."
- left-handed system: A coordinate convention where axes follow left-hand orientation (as used by UE4). "CARLA inherits UE4's left-handed system with X forward, Y right, and Z up, in centimeters."
- LiDAR: Light Detection and Ranging sensor that measures distances using laser pulses. "RGB, depth, semantic segmentation, LiDAR, radar, IMU, GNSS, and barometry"
- loopback interface: A network interface that routes traffic back to the same machine (e.g., 127.0.0.1). "Both simulation APIs operate within the same process on the loopback interface, eliminating inter-process serialization overhead."
- message-passing middleware: Software layer for exchanging messages between processes or systems. "message-passing middleware across independent processes."
- multirotor: A UAV with multiple rotors for lift and control (e.g., quadrotor). "physics-accurate multirotor flight"
- North-East-Down (NED) frame: A right-handed geographic coordinate frame with axes pointing North, East, and downward. "AirSim adopts a right-handed North-East-Down (NED) frame with X north, Y east, and Z down, in meters."
- photogrammetry: Technique to reconstruct scenes from images, often used to build realistic environments. "FlightGoggles~\cite{flightgoggles} provides photogrammetry-based environments"
- photorealistic: Rendering that closely resembles real-world appearance. "CARLA~\cite{carla}, built on Unreal Engine~4~\cite{ue4}, has become the de facto standard for urban autonomous driving research, offering photorealistic environments"
- physics tick: The discrete timestep at which physics simulation updates occur. "Within a shared physics tick and rendering pipeline, CARLA-Air delivers photorealistic urban and natural environments"
- PID gains: Proportional-Integral-Derivative controller parameters that tune system response. "All aerial experiments use the built-in SimpleFlight controller with default PID gains."
- pose transform: The mathematical mapping between positions and orientations across coordinate frames. "Eqs.~(\ref{eq:pos_transform}) and~(\ref{eq:rot_transform}) together fully specify the pose transform"
- unit quaternion: A normalized quaternion representing 3D orientation without singularities. "Let denote a unit quaternion in the UE4 frame."
- render-target caching: Reuse of GPU render buffers to avoid reallocation costs across frames. "the negligible early-to-late drift ... is attributable to residual render-target caching rather than lifecycle leakage."
- rendering pipeline: The sequence of GPU stages that produce images from 3D scene data. "Shared UE4 Rendering Pipeline"
- reinforcement-learning-based policy training: Learning control policies via reward signals through interaction with the environment. "Reinforcement-learning-based policy training---agents learn cooperative or individual policies through closed-loop interaction in physically consistent air-ground environments."
- right-handed frame: A coordinate convention where axes follow right-hand orientation. "AirSim adopts a right-handed North-East-Down (NED) frame with X north, Y east, and Z down, in meters."
- ROS 2: Robot Operating System 2, a middleware framework for robotic communication and control. "ROS 2 interfaces"
- RPC server: Remote Procedure Call server that handles requests from clients to invoke functions across processes or over a network. "Two independent RPC servers run concurrently within the single process---one per simulator---allowing the native Python clients of each platform to connect without modification."
- semantic segmentation: Vision task assigning a class label to each pixel in an image. "RGB, depth, semantic segmentation, LiDAR, radar, IMU, GNSS, and barometry"
- sensor modalities: Distinct types of sensor data streams (e.g., RGB, depth, LiDAR). "up to 18 sensor modalities---including RGB, depth, semantic segmentation, LiDAR, radar, IMU, GNSS, and barometry---across all aerial and ground platforms at each simulation tick."
- shared-tick: A synchronization mode where systems advance using the same discrete time step. "Shared = shared-tick within one process."
- single-process architecture: Design where all components run within one OS process, avoiding IPC overheads. "CARLA-Air remains effectively constant (\,ms) owing to its single-process architecture."
- spatial-temporal consistency: Strict alignment in space and time across sensors and subsystems. "cannot guarantee the strict spatial-temporal consistency required by modern perception and learning pipelines."
- synchronous mode: Simulation mode where time advances in discrete, externally triggered steps for determinism. "Under synchronous-mode operation, per-tick wall time is bounded by the slowest of three concurrent contributors"
- synchronization overhead: Extra time or resources required to keep different systems aligned in time. "Bridge-based co-simulation can connect heterogeneous backends, yet introduces synchronization overhead"
- UAV: Unmanned Aerial Vehicle, commonly referred to as a drone. "aerodynamically consistent UAV dynamics"
- UE4: Unreal Engine 4, a game engine used as the simulation backend. "UE4 enforces a strict invariant: each world may have exactly one active game mode."
- UE4 Game Mode Slot: The single per-world slot in UE4 that hosts the active game mode class. "UE4 Game Mode Slot"
- vision-language action: Control of agents using combined visual and natural language inputs to specify tasks. "Embodied navigation and vision-language action---agents navigate and act grounded in visual and linguistic input"
- VRAM: Video RAM on the GPU used for textures, buffers, and render targets. "VRAM is sampled every 60 s."
- wire protocols: Defined formats and rules for encoding data transmitted between systems. "The two RPC servers use distinct wire protocols and port assignments."
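As a short illustration of the "harmonic mean" glossary entry: when each frame-rate sample covers the same number of frames, the harmonic mean equals total frames divided by total time, while the arithmetic mean overstates the rate. The sketch below checks this on made-up numbers; the sample values and the equal-frames assumption are illustrative and are not taken from the paper.

```python
import statistics

# Hypothetical frame-rate samples (frames per second), not values from the paper.
fps_samples = [10.0, 30.0]
frames_per_sample = 300  # assumption: every sample covers the same frame count

harmonic = statistics.harmonic_mean(fps_samples)
arithmetic = statistics.fmean(fps_samples)

# Ground truth: total frames rendered divided by total wall-clock time.
total_time_s = sum(frames_per_sample / fps for fps in fps_samples)
true_fps = len(fps_samples) * frames_per_sample / total_time_s

print(f"harmonic={harmonic:.2f}  arithmetic={arithmetic:.2f}  true={true_fps:.2f}")
```

The slow interval takes three times as long as the fast one, so it should weigh more heavily in the average, which is what the harmonic mean does.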