RoundaboutHD: Urban Multi-Camera Tracking Dataset
- RoundaboutHD is a high-resolution urban dataset designed for multi-camera vehicle tracking, providing synchronized 4K video and extensive manual annotations.
- It captures nonlinear vehicle trajectories in challenging roundabout environments with frequent occlusions and diverse viewpoints, supporting intelligent transportation research.
- The dataset underpins tasks like object detection, single-camera tracking, image re-identification, and multi-camera tracking with detailed performance metrics.
RoundaboutHD is a high-resolution, real-world urban benchmark for multi-camera vehicle tracking (MCVT) that targets complex, dynamic scenarios not adequately captured by earlier academic datasets. Designed for rigorous evaluation of detection, tracking, and re-identification (ReID) under challenging roundabout conditions, RoundaboutHD provides synchronized video, spatially distributed camera placement, extensive annotations, and open access to metadata critical for research in intelligent transportation systems, smart-city analytics, and computer vision (Lin et al., 11 Jul 2025).
1. Dataset Architecture and Collection Setup
RoundaboutHD comprises footage from four static, non-overlapping cameras positioned around a real-world urban roundabout. Each camera records 40 minutes at 4K resolution (3840 × 2160) and 15 fps, yielding 36,000 frames per camera (144,000 frames total). The spatial arrangement maximizes coverage and trajectory variety; the maximum inter-camera baseline reaches approximately 0.92 km, with intervening retail parks and side streets further increasing route diversity (Lin et al., 11 Jul 2025).
The roundabout’s geometry induces highly nonlinear trajectories such as curves, loops, and U-turns. A large central statue introduces frequent, prolonged occlusions, and vehicle interactions are further complicated by strong daylight shadow variations, rare vehicle types, and occasional heavy goods vehicles. Each camera’s view captures multiple entry and exit arms, imposing complex cross-camera handover requirements and frequent appearance/disappearance events (Lin et al., 11 Jul 2025, Lin et al., 17 Nov 2025).
2. Annotation Protocol and Data Structure
Annotations are fully manual, with rigorous verification for bounding box placement across all cameras. The dataset includes:
- Bounding boxes (pixel coordinates) for all vehicles, with per-frame (frame_id, track_id, x, y, w, h).
- Timestamps (frame number and real time).
- Unique intra- and inter-camera IDs for exact vehicle identity tracking.
- Geo-coordinates (latitude/longitude) computed with the “CameraTransform” model using image-plane centers and assuming a constant vehicle height of 0.5 m.
Each vehicle is further labeled with color, vehicle type (Car, SUV, MPV, HGV), make, and model (example: Ford Transit, Honda CR-V). Camera calibration files provide intrinsic and extrinsic parameters, also following the CameraTransform specification, enabling precise mapping from pixel coordinates to GPS positions (Lin et al., 11 Jul 2025, Lin et al., 17 Nov 2025).
Cross-camera identity association is standardized via a multicam mapping file that links track_ids from single-camera annotations to global vehicle_id entries when a vehicle reappears in multiple cameras. Total unique vehicle identities annotated: 512. The dataset contains 549,909 manually verified bounding boxes, with 1,082 single-camera trajectories distributed as: cam01: 262; cam02: 244; cam03: 279; cam04: 257 (Lin et al., 11 Jul 2025).
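The per-frame annotation layout listed above can be loaded into per-vehicle trajectories with a few lines of Python. This is a minimal sketch assuming plain CSV rows in the order (frame_id, track_id, x, y, w, h); the exact file names and column order should be confirmed against the repository README.

```python
import csv
from collections import defaultdict

def load_tracks(path):
    """Group per-frame boxes into trajectories keyed by track_id.

    Assumes rows of the form: frame_id, track_id, x, y, w, h
    (illustrative layout based on the fields listed above).
    """
    tracks = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame_id, track_id = int(row[0]), int(row[1])
            x, y, w, h = map(float, row[2:6])
            tracks[track_id].append((frame_id, x, y, w, h))
    # sort each trajectory by frame for downstream tracking metrics
    for boxes in tracks.values():
        boxes.sort(key=lambda b: b[0])
    return tracks
```

A cross-camera evaluation would then join these `track_id` keys against the multicam mapping file to recover global `vehicle_id` labels.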
3. Provided Subsets, Tasks, and Label Organization
RoundaboutHD supports four principal computer vision research tasks, each with dedicated subsets:
- Object Detection: All 549,909 bounding boxes, annotated with both generic “vehicle” and subtype information.
- Single-Camera Tracking (SCT): 1,082 annotated trajectories, with per-track identity consistency; 233,189 bounding boxes.
- Image-based Vehicle Re-Identification (ReID): 65,528 cropped vehicle images (each >3,500 pixels in size) organized into train set (41,128 images, 510 IDs), gallery (23,227 images), and query set (1,173 images from 310 IDs).
- Multi-Camera Tracking: The core, fully synchronized 40-minute dataset with explicit cross-camera ID links for direct benchmarking of end-to-end MTMCT systems.
Annotations are distributed in plain-text CSV files, with detection, tracking, ReID, and multicam mapping directories. The repository structure encompasses /videos, /annotations/detection, /annotations/sct, /annotations/reid, /annotations/multicam, and /calibration folders, supported by a README and evaluation scripts. Licensing is MIT-style, with open access and download via Git (Lin et al., 11 Jul 2025).
4. Benchmarking Protocols and Baseline Results
Object Detection
Baseline results use YOLOv12x and YOLOv11x, evaluated at confidence threshold 0.3, NMS IoU 0.4, and input resolution 1280. Results are reported for each camera and averaged in the following table:
| Camera | YOLOv12x mAP (%) | YOLOv11x mAP (%) |
|---|---|---|
| cam01 | 39.2 | 42.1 |
| cam02 | 67.6 | 70.4 |
| cam03 | 73.5 | 74.4 |
| cam04 | 77.5 | 76.6 |
| Mean | 64.5 | 65.9 |
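The baseline thresholds above (confidence 0.3, NMS IoU 0.4) can be illustrated with a minimal pure-Python greedy non-maximum suppression step. This is a didactic sketch, not the YOLO implementation used in the paper; boxes are assumed to be in (x, y, w, h) format matching the annotation layout.

```python
def iou(a, b):
    """IoU of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(dets, conf_thr=0.3, iou_thr=0.4):
    """Greedy NMS over (box, score) pairs, using the same
    thresholds as the reported detection baseline."""
    dets = sorted((d for d in dets if d[1] >= conf_thr),
                  key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept
```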
Single-Camera Tracking
Single-camera tracking is evaluated with tracking-by-detection algorithms (ByteTrack, BotSort, DeepOCSort, OCSort, BoostTrack), reported with IDF1 and MOTA metrics:
| Tracker | IDF1 | MOTA |
|---|---|---|
| ByteTrack | 83.3 | 82.6 |
| BotSort | 82.7 | 82.6 |
| DeepOCSort | 80.4 | 80.4 |
| OCSort | 76.3 | 72.6 |
| BoostTrack | 59.9 | 42.8 |
MOTA is defined per Bernardin & Stiefelhagen (2008):

$$\text{MOTA} = 1 - \frac{\sum_t \left(\text{FN}_t + \text{FP}_t + \text{IDSW}_t\right)}{\sum_t \text{GT}_t}$$

where $\text{FN}_t$, $\text{FP}_t$, and $\text{IDSW}_t$ denote false negatives, false positives, and identity switches at frame $t$, and $\text{GT}_t$ is the number of ground-truth objects at frame $t$.
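The MOTA definition of Bernardin & Stiefelhagen (2008) reduces to a few lines once per-frame error counts are available; a minimal sketch (the hypothetical `per_frame` input packs FN, FP, ID-switch, and ground-truth counts per frame):

```python
def mota(per_frame):
    """MOTA = 1 - (sum of FN + FP + ID switches) / (sum of GT objects).

    per_frame: iterable of (fn, fp, idsw, gt) tuples, one per frame.
    """
    errors = sum(fn + fp + idsw for fn, fp, idsw, _ in per_frame)
    gt = sum(g for _, _, _, g in per_frame)
    return 1.0 - errors / gt
```

For example, two frames of 10 ground-truth objects each, with 1 miss, 1 false positive, and 1 ID switch in total, give MOTA = 1 - 3/20 = 0.85.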
Image-based Vehicle Re-Identification
FastReID baseline results demonstrate a substantial performance improvement from fine-tuning:
| Model | mAP | Rank-1 | Rank-5 | mINP |
|---|---|---|---|---|
| SBS (no PT) | 19.51 | 44.80 | 58.73 | 3.56 |
| SBS (fine-tuned) | 99.19 | 99.66 | 99.66 | 98.57 |
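The mAP and Rank-k columns follow the standard image ReID protocol: for each query, gallery images are ranked by descriptor similarity, Rank-k checks for a correct match in the top k, and mAP averages per-query average precision. A minimal single-query sketch (not FastReID's exact implementation):

```python
def rank_k(ranked_ids, query_id, k):
    """CMC rank-k: 1 if a correct gallery identity appears in the top k."""
    return int(query_id in ranked_ids[:k])

def average_precision(ranked_ids, query_id):
    """Average precision over one query's ranked gallery list."""
    hits, precisions = 0, []
    for i, gid in enumerate(ranked_ids, start=1):
        if gid == query_id:
            hits += 1
            precisions.append(hits / i)  # precision at each correct hit
    return sum(precisions) / hits if hits else 0.0
```

mAP is then the mean of `average_precision` over all 1,173 query images; mINP additionally penalizes the rank of the hardest (last) correct match.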
Multi-Camera Tracking
ELECTRICITY baseline results (after post-filtering of static objects; threshold distance = 12, removal threshold = 80) show a markedly lower IDF1 on RoundaboutHD than on standard MCVT benchmarks, highlighting the pronounced difficulty posed by persistent occlusions, nonlinear motion, and substantial cross-viewpoint variation (Lin et al., 11 Jul 2025).
The SAE-MCVT framework, evaluated on RoundaboutHD, achieves a reported IDF1 of 61.96, IDP of 91.02, and IDR of 46.96 on the full test set, using the identity metrics of Ristani et al. (2016). No MOTA, MOTP, or HOTA values are published for this benchmark in the referenced work (Lin et al., 17 Nov 2025).
5. Representative Challenges and Methodological Innovations
RoundaboutHD explicitly emphasizes the following challenges:
- Nonlinear, circular vehicle trajectories dictated by intersection geometry.
- Intense, prolonged occlusion from the central roundabout statue, as well as inter-vehicle occlusions.
- Multiple entry and exit zones per camera view, necessitating sophisticated handover logic and spatio-temporal association across non-overlapping cameras.
- Strong viewpoint changes complicating appearance-based ReID and tracking.
- High vehicle similarity (e.g., color/model) on different roundabout arms, increasing the risk of ID switches.
To support multi-camera association, the dataset and supporting works utilize self-supervised spatial–temporal camera linking. The process involves zone discovery through clustering of tracklet start/end points, entry–exit pairing via a maximum zone-pairing score, and transition time modeling using Gaussian kernel density estimation:

$$\hat{f}(t) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{t - t_i}{h}\right), \qquad K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2},$$

where $t_1, \dots, t_n$ are the observed inter-camera transition times and $h$ is the kernel bandwidth (Lin et al., 17 Nov 2025).
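The transition-time modeling step amounts to a Gaussian kernel density estimate over observed camera-to-camera travel times. A minimal pure-Python sketch (the bandwidth choice here is illustrative; the paper's selection rule is not reproduced):

```python
import math

def gaussian_kde(samples, h):
    """Return f_hat(t), a Gaussian KDE over observed transition times.

    samples: observed inter-camera travel times t_1..t_n (seconds)
    h: kernel bandwidth (an assumption; choose e.g. by Silverman's rule)
    """
    n = len(samples)
    norm = n * h * math.sqrt(2 * math.pi)
    def f_hat(t):
        return sum(math.exp(-((t - ti) / h) ** 2 / 2)
                   for ti in samples) / norm
    return f_hat
```

At association time, a candidate cross-camera match can be scored by evaluating `f_hat` at the observed time gap between a tracklet exiting one camera and one entering another.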
6. Access, Licensing, and Use Cases
RoundaboutHD is distributed as open-source under an MIT-style license. The dataset, annotations, calibration parameters, and evaluation scripts are provided within a public GitHub repository (https://github.com/siri-rouser/RoundaboutHD.git) (Lin et al., 11 Jul 2025). Preprocessing for common benchmarks follows standardization steps such as fine-tuning YOLOv11n detectors, ResNet-50 feature extraction, and constant-height geo-mapping; no additional data augmentation is performed except for occasional frame subsampling used to adjust computational workload (Lin et al., 17 Nov 2025).
Documented applications include:
- Anomaly detection (e.g., illegal U-turns, wrong-way travel).
- Traffic density and flow estimation (vehicle counts, spatiotemporal speed profiles).
- Law enforcement support via suspect vehicle tracking across a network of non-overlapping urban cameras.
- Vehicle fleet and environmental statistics leveraging make/model/color attributes.
By providing dense, attribute-rich, and geometrically calibrated data over diverse urban traffic scenarios, RoundaboutHD addresses the gap between controlled academic benchmarks and operationally realistic, city-scale deployments (Lin et al., 11 Jul 2025, Lin et al., 17 Nov 2025).