CARLA-Based Dataset for Autonomous Research

Updated 28 April 2026

A CARLA-based dataset is a multimodal, temporally aligned collection of sensor data generated in the CARLA simulator and designed for autonomous systems research.
Generation pipelines rely on synchronous simulation stepping and modular frameworks such as Car-STAGE to produce diverse sensor data efficiently.
Such datasets offer standardized formats and detailed annotations for benchmarking tasks such as detection, segmentation, planning, and adversarial robustness.

A CARLA-based dataset is a multimodal, temporally aligned collection of sensor data generated within the CARLA open-source automotive simulator. These datasets are engineered for research in autonomous driving, robotics, computer vision, adversarial robustness, simulation-to-real (sim2real) transfer, and related disciplines. Typical data modalities encompass RGB and depth imagery, LiDAR and Radar point clouds, GNSS/IMU logs, ground-truth object labels (semantic, instance, or detection), and emerging sensor types such as event-based vision streams. The scale, structure, annotation protocols, and extensibility of CARLA-based datasets are determined by the synthesis pipeline, simulation parameters, and research objectives of the dataset developers.

1. Architectural Principles and Data Generation Pipelines

CARLA-based datasets are generated by interfacing with the CARLA Python API, which exposes fine-grained control over scene construction, sensor rig configuration, environment parameters (weather, lighting), agent spawning, and ground-truth data extraction. Recent frameworks automate dataset creation via GUI-driven tools, configuration file templates, or scenario scripting.

For example, the Car-STAGE framework centralizes configuration through a graphical interface, translating user-defined criteria (selected maps $M$, sensor set $S$, environmental conditions $E$, lighting $L$, actor counts, frame rate, episode length) into automated, synchronous data capture (Almutairi et al., 5 Mar 2025). The pipeline is modular and multithreaded: a main thread orchestrates the CARLA server and simulation ticks, a queue worker thread tags and batches incoming frames, and multiple executor threads process and write each sensor's raw data to pre-allocated memory-mapped files. This design ensures frame-level sensor synchronization and efficient pipeline utilization.

An example workflow proceeds as:

User configures scenario via GUI;
Car-STAGE launches CARLA, spawns traffic and sensors, and enters a synchronous acquisition loop at the specified FPS;
Each sensor's thread writes raw binary to a direct-access mmap region;
After all episodes, worker pools convert binary blobs to canonical machine learning formats (PNG, PCD, CSV, etc.).

Example pseudocode for the launch and acquisition steps is included in (Almutairi et al., 5 Mar 2025, Section 1.c).
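The threading layout described above (main tick loop, queue worker, per-sensor executors writing into pre-allocated memory-mapped files) can be illustrated with a minimal standard-library sketch. It does not call CARLA and is not the Car-STAGE implementation: the sensor names, the fixed 16-byte record size, and the synthetic payloads are all assumptions made for illustration.

```python
import mmap
import queue
import struct
import tempfile
import threading

FRAME_BYTES = 16                      # hypothetical fixed record size per frame
NUM_FRAMES = 5                        # frames per episode (episode length x FPS)
SENSORS = ["camera_front", "lidar"]   # hypothetical sensor names


def make_store(num_frames):
    """Pre-allocate a file of fixed-size frame slots and memory-map it."""
    f = tempfile.TemporaryFile()
    f.truncate(num_frames * FRAME_BYTES)
    return f, mmap.mmap(f.fileno(), num_frames * FRAME_BYTES)


def executor(inbox, store):
    """Write each (frame_id, payload) into the slot reserved for that frame."""
    while True:
        item = inbox.get()
        if item is None:              # sentinel: episode finished
            break
        frame_id, payload = item
        offset = frame_id * FRAME_BYTES
        store[offset:offset + FRAME_BYTES] = payload


def queue_worker(raw, inboxes):
    """Route tagged frames from the shared queue to the matching executor."""
    while True:
        item = raw.get()
        if item is None:              # forward the sentinel to every executor
            for inbox in inboxes.values():
                inbox.put(None)
            break
        sensor, frame_id, payload = item
        inboxes[sensor].put((frame_id, payload))


def run_episode():
    raw = queue.Queue()
    files, stores, inboxes, threads = {}, {}, {}, []
    for s in SENSORS:
        files[s], stores[s] = make_store(NUM_FRAMES)
        inboxes[s] = queue.Queue()
        threads.append(threading.Thread(target=executor,
                                        args=(inboxes[s], stores[s])))
    threads.append(threading.Thread(target=queue_worker, args=(raw, inboxes)))
    for t in threads:
        t.start()
    # Main thread: one iteration per simulation tick. In a real pipeline the
    # simulator would be advanced here and sensor callbacks would enqueue data.
    for frame_id in range(NUM_FRAMES):
        for idx, s in enumerate(SENSORS):
            # 16 bytes: frame id, sensor index, 8 bytes of zero padding
            payload = struct.pack("<II8x", frame_id, idx)
            raw.put((s, frame_id, payload))
    raw.put(None)
    for t in threads:
        t.join()
    return stores, files
```

Because every frame owns a fixed slot at `frame_id * FRAME_BYTES`, executors can write out of order without locking, and the later conversion pass can locate any frame by index alone.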

Other frameworks instantiate complex scenario trees for long-tail maneuvers (Gorgulu et al., 26 Feb 2026), closed-loop perception/planning feedback (Qiao et al., 12 Nov 2025), or adversarial manipulations such as physically injected patch attacks on billboards (Nesti et al., 2022).

2. Sensor Suites, Environmental Diversity, and Data Modalities

CARLA-based datasets draw on the simulator's sensor abstraction, which includes (but is not limited to) the following modalities:

Vision: Pinhole RGB, depth, semantic/instance segmentation, optical flow, event-based DVS;
Range: LiDAR (RayCast, semantic, coherent, Doppler-enabled), MIMO FMCW radar (Range-Doppler, Range-Azimuth-Elevation cubes);
Localization: GNSS, IMU (accelerometer, gyroscope);
Hybrid: Bird's Eye View projections (e.g., nuCarla BEV), panoptic segmentation.

Sensor parameters (intrinsics/extrinsics, FOV, baseline, mounting geometry) are logged per session. For example, nuCarla matches the nuScenes suite (six 1600×900 RGB cameras at canonical locations, calibrated intrinsics/extrinsics, ego pose logs) (Qiao et al., 12 Nov 2025); TaCarla mirrors this with LiDAR and radar integration (Gorgulu et al., 26 Feb 2026); and SCaRL provides six rigidly attached suites of RGB, semantic, depth, LiDAR, and radar sensors, synchronized in ego-vehicle and world frames (Ramesh et al., 2024).
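As an illustration of how logged camera parameters relate to each other, the sketch below derives a pinhole intrinsic matrix from image size and horizontal FOV using standard pinhole geometry. It assumes square pixels and a principal point at the image center. The 90-degree FOV is an illustrative value, not a documented nuCarla setting.

```python
import math


def pinhole_intrinsics(width, height, fov_deg):
    """3x3 intrinsic matrix for an ideal pinhole camera.

    fov_deg is the horizontal field of view; pixels are assumed square and
    the principal point is assumed to sit at the image center.
    """
    focal = width / (2.0 * math.tan(math.radians(fov_deg) / 2.0))
    return [
        [focal, 0.0, width / 2.0],
        [0.0, focal, height / 2.0],
        [0.0, 0.0, 1.0],
    ]


# 1600x900 image with an assumed 90-degree horizontal FOV
K = pinhole_intrinsics(1600, 900, 90.0)
```

With a 90-degree FOV the focal length in pixels equals half the image width, which makes the result easy to sanity-check by hand.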

Environmental diversity is engineered via randomization or structured factorial design:

Weather: presets (Clear, Cloudy, Rain, Fog, Snow) and controlled interpolation (domain-shift sweeps in SEVD (Aliminati et al., 2024)).
Lighting/Daytime: random seeds or sliders spanning noon, sunset, dusk, night.
Traffic/Actors: deterministic or random actor spawning, diverse vehicles, nuanced behaviors (car-following, gap-acceptance (Zhou et al., 17 Jan 2026)).
Scenario Complexity: benchmarked challenge scenarios, e.g., long-tail events (emergency yielding, construction, cut-ins (Gorgulu et al., 26 Feb 2026)), roundabout merging/yielding (factorial LOS x weather in CARLA-Round (Zhou et al., 17 Jan 2026)), closed-loop feedback simulations.
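A structured factorial design of this kind can be enumerated as the Cartesian product of the factor levels. The sketch below is a minimal illustration. The traffic-density levels and the per-scenario seed scheme are hypothetical and not taken from any of the cited datasets.

```python
import itertools

WEATHER = ["Clear", "Cloudy", "Rain", "Fog"]    # subset of the presets named above
DAYTIME = ["noon", "sunset", "dusk", "night"]
TRAFFIC = ["low", "medium", "high"]             # hypothetical density levels


def factorial_scenarios(base_seed=0):
    """Enumerate every weather x daytime x traffic combination exactly once."""
    scenarios = []
    combos = itertools.product(WEATHER, DAYTIME, TRAFFIC)
    for i, (weather, daytime, traffic) in enumerate(combos):
        scenarios.append({
            "weather": weather,
            "daytime": daytime,
            "traffic": traffic,
            "seed": base_seed + i,   # distinct, reproducible seed per cell
        })
    return scenarios


scenarios = factorial_scenarios()
```

A full factorial grid grows multiplicatively (here 4 × 4 × 3 = 48 cells), so randomized sampling of the same factor space is the usual alternative when the grid becomes too large to simulate exhaustively.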

3. Data Representation, Formats, and Annotation Schemas

Standardized outputs and annotation conventions enable reproducibility and cross-benchmark comparison. Sensor data is typically exported in:

Images: PNG (8/16-bit for vision/range), JPEG (for downstream compressed storage), at resolutions up to 2160×1440 (e.g., SkyScenes aerial scenes (Khose et al., 2023)).
Point Clouds: PCD (ASCII or binary), .bin (float32 per point: $[x, y, z, \mathrm{intensity}]$). LiDAR and radar outputs sometimes include per-point semantic, instance, and velocity information (Ramesh et al., 2024).
Tabular and structured: CSV logs (GNSS, IMU, trajectory files) and JSON annotation files.
Specialized: numpy .npz arrays for event streams or raw radar cubes (used in SEVD and SCaRL).
Directory Structure: Hierarchical, by modality/sensor/date/scene, with canonical naming (camera_front/00000.png, lidar/00000.bin, etc.) (Almutairi et al., 5 Mar 2025).
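The .bin point-cloud layout listed above (four float32 values per point) can be written and read with the standard library alone. This is a minimal sketch. The little-endian byte order and the absence of a file header are assumptions, so check a dataset's own loader before relying on them.

```python
import os
import struct
import tempfile

# One point = x, y, z, intensity as float32 (little-endian assumed)
POINT = struct.Struct("<4f")


def write_bin(path, points):
    """Write an iterable of (x, y, z, intensity) tuples as packed float32."""
    with open(path, "wb") as f:
        for point in points:
            f.write(POINT.pack(*point))


def read_bin(path):
    """Read packed float32 points back into a list of 4-tuples."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % POINT.size != 0:
        raise ValueError("file size is not a multiple of 16 bytes")
    return [POINT.unpack_from(data, offset)
            for offset in range(0, len(data), POINT.size)]


# Round trip using the canonical naming scheme from the list above
points = [(1.0, 2.0, 0.5, 0.25), (-3.0, 4.0, 1.5, 1.0)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "00000.bin")
    write_bin(path, points)
    restored = read_bin(path)
```

Since the format carries no header, the point count is implied by the file size divided by 16 bytes, which is why the reader validates the length before unpacking.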

Annotation schemas are explicit and scenario-specific:

Object Detection: 2D/3D bounding boxes, projected via sensor transforms, with per-actor class, id, orientation, and occlusion/visibility tags. Several standard formats are supported: COCO JSON (2D), KITTI, and nuscenes-devkit-compatible JSON (3D, BEV). Occlusion filtering addresses the "ghost box" problem by cross-referencing segmentation pixels within detection boxes (Chaar et al., 20 Sep 2025).
Segmentation: Dense per-pixel labels, instance ID masks, panoptic codes; 23–36 class taxonomies harmonized with Cityscapes/KITTI (Testolina et al., 2022, Deschaud et al., 2021).
Trajectories: Agent-centric state logs (position, velocity, heading, behavioral flags: yielding, merging, occluded), sampled at fixed rates (e.g., 10 Hz in CARLA-Round (Zhou et al., 17 Jan 2026)).
Event/Optical Flow: Event tuples $(x, y, p, t)$, timestamped with microsecond precision, temporally aligned with ground truth grayscale/flow (Mansour et al., 2024).
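The occlusion-filtering idea mentioned under Object Detection (cross-referencing segmentation pixels within detection boxes) can be sketched as follows. This illustrates the general idea only, not the method of the cited work: the box layout, the `actor_id` field, and the 5% visibility threshold are hypothetical.

```python
def visible_fraction(instance_mask, box, actor_id):
    """Share of in-image box pixels whose instance id equals actor_id.

    instance_mask is a row-major 2D list of instance ids; box is
    (x_min, y_min, x_max, y_max) in pixels with exclusive maxima.
    """
    height = len(instance_mask)
    width = len(instance_mask[0]) if height else 0
    x0, y0, x1, y1 = box
    # Clamp the box to the image so out-of-frame parts are ignored
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(width, x1), min(height, y1)
    total = (x1 - x0) * (y1 - y0)
    if x1 <= x0 or y1 <= y0:
        return 0.0
    hits = sum(
        1
        for y in range(y0, y1)
        for x in range(x0, x1)
        if instance_mask[y][x] == actor_id
    )
    return hits / total


def filter_ghost_boxes(instance_mask, boxes, min_fraction=0.05):
    """Keep only boxes whose actor actually appears in the segmentation."""
    return [
        b for b in boxes
        if visible_fraction(instance_mask, b["bbox"], b["actor_id"]) >= min_fraction
    ]


mask = [
    [0, 0, 0, 0],
    [0, 7, 7, 0],
    [0, 7, 7, 0],
    [0, 0, 0, 0],
]
boxes = [
    {"actor_id": 7, "bbox": (1, 1, 3, 3)},   # visible actor
    {"actor_id": 9, "bbox": (0, 0, 2, 2)},   # fully occluded: no pixels with id 9
]
kept = filter_ghost_boxes(mask, boxes)
```

A box projected from a ground-truth actor that is hidden behind a wall has no matching instance pixels, so it falls below the threshold and is dropped.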

Formats and schemas are usually published alongside code for conversion and data loading (e.g., MMDetection3D adapter for nuCarla (Qiao et al., 12 Nov 2025)).

4. Performance Benchmarks, Baselines, and Analysis

CARLA-based datasets routinely publish baseline results using established architectures for each supported perception or planning task. Evaluation protocols follow the established standards of the relevant vision and robotics subfields.

Detection: mAP (mean Average Precision) at varying IoU thresholds (0.5 for COCO/KITTI, 0.7 for stricter BEV), NDS (nuScenes detection score), per-class, per-scenario breakdowns (Qiao et al., 12 Nov 2025, Gorgulu et al., 26 Feb 2026, Ramesh et al., 2024).
Segmentation: mIoU across all classes, per-class IoU, Panoptic Quality (PQ), class-weighted or instance-weighted accuracy (Khose et al., 2023, Testolina et al., 2022, Deschaud et al., 2021).
Planning/Trajectory Prediction: Average Displacement Error (ADE), Final Displacement Error (FDE), Average and Final Heading Error (AHE/FHE) for multi-step prediction, zero-shot transfer to real-world datasets (Zhou et al., 17 Jan 2026, Gorgulu et al., 26 Feb 2026).
Closed-Loop Control: Driving Score, Route Completion, Penalty metrics (CARLA Leaderboard framework in TaCarla (Gorgulu et al., 26 Feb 2026)).
Adversarial Robustness: AUROC for attack/defense discrimination, accuracy drop from clean to patched scenes, recoverability by defense methods (e.g., Z-Mask, LGS) (Nesti et al., 2022).
Sim-to-Real Transfer: Segmentation/odometry error deltas before and after fine-tuning, mIoU and ATE for synthetic vs. real test data (Deschaud et al., 2021, Deschaud, 2021).
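To make the trajectory metrics concrete, the sketch below computes ADE and FDE for a single predicted trajectory under their usual definitions: mean and final-step Euclidean distance between predicted and ground-truth positions. The 2D point format is an assumption, and individual benchmarks may aggregate over agents or prediction modes differently.

```python
import math


def ade_fde(predicted, ground_truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) positions."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    ade = sum(distances) / len(distances)   # average over all time steps
    fde = distances[-1]                     # error at the final time step
    return ade, fde


# Per-step errors are 0, 1, and 2 m, so ADE = 1.0 and FDE = 2.0
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = ade_fde(pred, truth)
```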

Results yield quantitative insight into factor impacts (e.g., traffic density's monotonic effect on prediction error (Zhou et al., 17 Jan 2026)), model generalization, and the efficacy of transfer or domain adaptation methods.

5. Dataset Scale, Organization, Reproducibility, and Access

CARLA-based datasets vary widely in scale, with typical datasets comprising tens of thousands to millions of frames or time steps:

Large-scale: TaCarla (2.85 million frames, 79 h drive time, full scenario stratification) (Gorgulu et al., 26 Feb 2026); SELMA (20M sensor samples, 30,909 unique waypoints over 216 scene/view combinations) (Testolina et al., 2022); SCaRL (140,000 frames × 6 sensor suites) (Ramesh et al., 2024).
Moderate-scale: nuCarla (40,000 samples, 459,632 objects, fully compatible with nuScenes devkit) (Qiao et al., 12 Nov 2025); Car-STAGE (e.g., 10 runs × 60 s × 30 FPS = 18,000 frames) (Almutairi et al., 5 Mar 2025).
Specialized: Adver-City (24,000 frames, 890,000+ 3D bounding box annotations under adverse conditions) (Karvat et al., 2024); SKoPe3D (25,000 images, 4.9M 3D keypoint annotations) (Pahadia et al., 2023); Paris–CARLA-3D (700M points synthetic, 60M points real) (Deschaud et al., 2021).

Datasets are commonly released with full scenario scripts, YAML configs, code for both reproduction and data loading, and detailed system requirements (e.g., CARLA version, OS, hardware). Many pipelines publish open-source repositories for the full generation/processing/benchmarking cycle (Almutairi et al., 5 Mar 2025, Qiao et al., 12 Nov 2025, Nesti et al., 2022, Ramesh et al., 2024, Mansour et al., 2024). Reproducibility steps include random seed logging, per-frame sensor/resource usage metadata, and recording scenario randomness in the configuration state so that a run can be regenerated from its configuration.
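Seed logging of this kind amounts to deriving every random choice from a recorded seed and storing that seed next to the configuration. The sketch below is a minimal illustration. The record layout, the field names, and the example map name are hypothetical and do not reflect any specific dataset's schema.

```python
import json
import random


def make_run_record(config, seed):
    """Derive all random choices from one seed and log them with the config."""
    rng = random.Random(seed)   # isolated generator: no hidden global state
    spawn_order = rng.sample(range(config["num_spawn_points"]),
                             config["num_vehicles"])
    return {"config": config, "seed": seed, "spawn_order": spawn_order}


# Hypothetical configuration values, for illustration only
config = {"map": "Town03", "num_spawn_points": 20, "num_vehicles": 5}
record = make_run_record(config, seed=42)
serialized = json.dumps(record, sort_keys=True)   # what would be saved to disk

# Regenerating from the stored seed and config reproduces the same choices
replayed = make_run_record(json.loads(serialized)["config"],
                           json.loads(serialized)["seed"])
```

Using a dedicated `random.Random(seed)` instance rather than the module-level generator keeps the run reproducible even if other code draws random numbers in between.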

6. Advances in Synchronized Multimodal, Adversarial, and Photorealistic Data

CARLA-based datasets continue to drive advances in three principal areas:

Multimodal Synchronization: High-frequency, precisely synchronized acquisition of RGB, depth, semantic, LiDAR, radar (including Range-Doppler-Azimuth-Elevation representations) and event-based streams, often under full or partial sensor-fusion calibration (Ramesh et al., 2024, Aliminati et al., 2024). Emerging datasets incorporate coherent LiDAR and MIMO radar, enabling research into sensor-fusion models and non-vision detection.
Adversarial Example Benchmarking: Physical patch-injection frameworks (e.g., CARLA-GeAR (Nesti et al., 2022), adversarial mesh streaming (Liu et al., 2022)) provide standardized evaluation environments for adversarial attacks and defense strategies, integrating mesh-aware, differentiable patch optimization and in situ environmental transforms (lighting, occlusion, motion).
Sim2Real Alignment: Photorealistic enhancement tools such as CARLA2Real utilize GAN-based, G-buffer-conditioned style transfer to generate pairs of synthetic and "enhanced" images, reducing domain gaps at both the appearance and feature-distribution levels. These approaches are validated by mIoU improvements in segmentation tasks and feature cosine-similarity metrics versus real datasets (Cityscapes, KITTI) (Pasios et al., 2024).

7. Canonical Use Cases and Extensions

CARLA-based datasets serve a broad set of research and development purposes:

Autonomous driving perception and planning: End-to-end closed-loop learning (nuCarla, TaCarla), scenario-driven prediction (CARLA-Round), rare-event handling.
Sim2Real Transfer and Curriculum Learning: Pre-train on synthetic data, fine-tune on limited real data, and evaluate generalization on held-out real-world benchmarks (KITTI, rounD, UAVid, etc.).
Robustness to Adverse or Rare Events: Evaluate models under systematically varied and adversarial conditions (weather, patch attacks, density).
Benchmarks for Multi-modal Sensor Fusion: Develop and compare sensor-fusion (camera–LiDAR–radar) architectures for detection, segmentation, and tracking.
New Sensing Modalities: Event camera datasets for optical flow and traffic analysis (Aliminati et al., 2024, Mansour et al., 2024).

A plausible implication is that scalable, richly annotated, open CARLA-based datasets, together with pipelines for factorized scenario control, multithreaded acquisition, and postprocessing, will remain central resources for developing, benchmarking, and transferring autonomous systems research to the real world (Almutairi et al., 5 Mar 2025, Qiao et al., 12 Nov 2025, Nesti et al., 2022, Aliminati et al., 2024, Mansour et al., 2024, Gorgulu et al., 26 Feb 2026, Zhou et al., 17 Jan 2026, Deschaud et al., 2021, Karvat et al., 2024, Ramesh et al., 2024).