USVTrack Dataset

Updated 1 July 2025
  • USVTrack is a specialized benchmark dataset for multi-object tracking in inland waterways, featuring synchronized 4D radar and camera data for autonomous USV navigation research.
  • The dataset includes synchronized 4D radar point clouds, camera images, GPS, IMU, and detailed object annotations collected across diverse waterways and challenging environmental conditions.
  • It enables research in robust multi-object tracking algorithms for USVs by providing a benchmark for radar-camera fusion and demonstrating improved tracking accuracy using its RCM method in challenging aquatic scenes.

USVTrack is a specialized benchmark dataset for multi-object tracking in inland waterways, uniquely focused on the fusion of 4D radar and camera modalities for robust autonomous navigation. Developed with the aim of supporting advanced perception systems in challenging waterborne environments, it offers extensive, multi-sensor time-synchronized data reflecting the operational conditions faced by unmanned surface vehicles (USVs) engaged in waterborne transportation, monitoring, and rescue applications.

1. Dataset Architecture and Data Modalities

USVTrack centers on a USV platform equipped with the following synchronized sensors:

  • 4D Radar: Utilizes the Oculii EAGLE 77 GHz sensor, capturing dense point clouds with range, azimuth, elevation, and Doppler velocity information at 15 Hz. The radar achieves high angular resolution (<1° in both azimuth and elevation).
  • Monocular Camera: Delivers RGB imagery at 1920×1080 resolution and 30 Hz.
  • GPS: Provides geospatial localization stamped per frame.
  • IMU: Outputs synchronized inertial (acceleration, angular velocity) data per frame.

For each timestamp $t$, the dataset aligns:

  • Camera image: $\mathcal{I}_t \in \mathbb{R}^{1080 \times 1920 \times 3}$
  • Radar point cloud: $\mathcal{R}_t = \{ (x_i, y_i, z_i, v_i, p_i) \}_{i=1}^{N_t}$, where $v_i$ is Doppler velocity and $p_i$ is reflection power.
  • Annotations for 2D bounding boxes, object class (ship, boat, vessel), tracking ID, and associated radar clusters.
  • Supplemental GPS and IMU readings ($\mathcal{G}_t$, $\mathcal{M}_t$) per frame.

Overall, USVTrack comprises 68,822 annotated camera frames and 45,091 radar frames, each frame labeled with detailed metadata and unique object identifiers. Radar clusters are projected into the camera plane via calibration matrices to facilitate cross-modal association.
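
To make the cross-modal projection concrete, the following is a minimal sketch of how radar cluster points can be mapped into the camera plane using calibration matrices, as described above. The function name, the pinhole-camera assumption, and the argument layout are illustrative rather than the dataset's official tooling; USVTrack's calibration file format may differ.

```python
import numpy as np

def project_radar_to_image(points_xyz, T_radar_to_cam, K):
    """Project radar points (radar frame) into camera pixel coordinates.

    points_xyz:     (N, 3) radar cluster points.
    T_radar_to_cam: (4, 4) extrinsic matrix mapping radar to camera frame.
    K:              (3, 3) camera intrinsic matrix.
    """
    # Homogenize points and transform into the camera frame.
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    pts_cam = (T_radar_to_cam @ pts_h.T).T[:, :3]

    # Discard points behind the image plane (non-positive depth).
    valid = pts_cam[:, 2] > 0

    # Pinhole projection: apply intrinsics, then divide by depth.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv[valid], valid
```

Projected cluster centers can then be associated with 2D bounding-box annotations by checking which boxes they fall inside.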

2. Operational Scenarios and Environmental Diversity

USVTrack is constructed to capture the complexity of inland waterways, reflecting the demands placed on navigation perception systems. This is achieved by recording across a range of dynamic and environmental variables:

  • Waterway Types: Includes wide/narrow rivers, lakes, canals, docks, and moats to ensure scene diversity.
  • Lighting Conditions: Spans bright daylight, low-light, and nighttime conditions, incorporating transition periods and abrupt changes in illumination.
  • Weather Conditions: Data capture occurs under sunny, overcast, rainy, foggy, and snowy scenarios, with particular inclusion of adverse conditions hazardous to optical sensors.
  • Seasonal Variation: Covers all four seasons, ensuring the dataset reflects fog, water vapor, and seasonal shifts affecting both appearance and sensor performance.
  • Object Motions: Contains both static and dynamic vessel interactions. Motion diversity is further reflected in oscillating, swaying, and occlusion-rich aquatic behaviors.

The combined environmental variability is intended to support the development and evaluation of tracking algorithms that remain robust to occlusion, illumination change, and weather-induced sensor degradation.

3. Radar-Camera Fusion and Radar-Camera Matching (RCM) Method

At the core of USVTrack's experimental protocol is a practical cross-modal data association method termed Radar-Camera Matching (RCM), designed to improve multi-object tracking reliability under challenging conditions.

RCM Workflow:

  1. Radar Processing: Raw radar point clouds are processed using DBSCAN clustering to aggregate returns from individual objects and suppress background noise.
  2. Tracker Integration: RCM is designed to plug into popular two-stage association trackers. In the first stage, high-confidence camera detections are matched with new or existing tracks; in the second stage, RCM replaces or augments IoU-based association for low-confidence or unassigned detections.
  3. Association Cost Calculation:

$$
C_{i,j}^{rcm} =
\begin{cases}
1, & (\hat{D}_{radar} > \theta_{rcm}) \ \text{or} \ (D_{iou} > \theta_{iou}) \\
\alpha \cdot \hat{D}_{radar} + (1 - \alpha) \cdot D_{iou}, & \text{otherwise}
\end{cases}
$$

where $D_{iou}$ is the IoU distance between bounding boxes, $\hat{D}_{radar} = 1 - e^{-\lambda D_{radar}}$ is the exponentially normalized Mahalanobis distance on radar-derived kinematic features, and $\alpha$ is the radar–vision fusion weight.
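
Below is a minimal sketch of steps 1 and 3 of this workflow, assuming a Euclidean DBSCAN over the radar (x, y, z) coordinates and scalar placeholder values for α, λ, and the two gating thresholds; none of these constants are specified values from the dataset paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_radar_points(points_xyz, eps=1.5, min_samples=3):
    # Step 1: DBSCAN groups returns into per-object clusters; label -1 is noise.
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_xyz)

def rcm_cost(d_radar, d_iou, alpha=0.5, lam=0.1,
             theta_rcm=0.9, theta_iou=0.9):
    """Step 3: RCM association cost for one track/detection pair.

    d_radar: Mahalanobis distance on radar-derived kinematic features.
    d_iou:   IoU distance (1 - IoU) between predicted and detected boxes.
    All constants here are illustrative placeholders.
    """
    d_radar_hat = 1.0 - np.exp(-lam * d_radar)  # normalize to [0, 1)
    if d_radar_hat > theta_rcm or d_iou > theta_iou:
        return 1.0  # gated: implausible pair receives maximal cost
    return alpha * d_radar_hat + (1.0 - alpha) * d_iou
```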

Significance: RCM enables the use of radar's robustness to occlusion, lighting, and weather for re-identification and association, mitigating issues arising from visual degradation and frequent occlusions that are endemic to waterborne environments.

4. Benchmarking and Performance Outcomes

Detection evaluation: Camera-based detectors such as YOLOv9 and its successors (YOLOv11/12) demonstrate strong baseline performance with over 90% mAP across categories. Radar–camera fusion approaches, such as RCDetect, also provide competitive detection, though specific metrics are dataset- and protocol-dependent.

Tracking evaluation: The effect of RCM is demonstrated by benchmarking multiple state-of-the-art multi-object trackers (Deep OC-SORT, ByteTrack, Hybrid-SORT, BoT-SORT) with and without radar-cued association:

  • Metrics: Higher Order Tracking Accuracy (HOTA), Multiple Object Tracking Accuracy (MOTA), IDF1, and identity switches (IDSW).
  • Gains: Integration of RCM results in an increase in HOTA (e.g., Hybrid-SORT HOTA: 47.208 → 48.302), a reduction in ID switches (e.g., Hybrid-SORT IDSW: 207 → 186), and corresponding improvements across MOTA and IDF1.
  • Qualitative findings: RCM sustains track continuity through visually challenging events (e.g., occlusion under fog, nightfall, or water spray), reducing error cases driven by vision-only tracking.
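
For reference, the second-stage association in such two-stage trackers typically reduces to a linear assignment over the pairwise cost matrix. A hedged sketch, assuming the rcm_cost function above has been used to fill the matrix and using SciPy's Hungarian solver:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost_matrix, gate=0.99):
    """Solve track-to-detection assignment over an RCM cost matrix.

    cost_matrix: (num_tracks, num_detections) array of rcm_cost values.
    gate:        matches at or above this cost are rejected (gated pairs
                 were already set to 1.0 by the cost function).
    """
    rows, cols = linear_sum_assignment(cost_matrix)
    matches = [(r, c) for r, c in zip(rows, cols)
               if cost_matrix[r, c] < gate]
    unmatched_tracks = set(range(cost_matrix.shape[0])) - {r for r, _ in matches}
    unmatched_dets = set(range(cost_matrix.shape[1])) - {c for _, c in matches}
    return matches, unmatched_tracks, unmatched_dets
```

Unmatched tracks and detections are then handled by the host tracker's usual lifecycle logic (track initialization, coasting, and termination).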

5. Applications and Broader Implications

USVTrack addresses requirements in autonomous USV navigation, with broader relevance for:

  • Waterborne Autonomous Navigation: Reliable detection and tracking for vessel avoidance, route planning, and real-time situational awareness.
  • Maritime Safety and Environmental Monitoring: Supports safety-critical capabilities under conditions where sole reliance on vision is inadequate.
  • Traffic Management and Logistics: Enables automated monitoring, traffic flow optimization, and coordinated control in urban waterway networks.

A plausible implication is that the sensor diversity and environmental coverage of USVTrack also aid in developing robust multi-modal sensor fusion for perception systems, critical for next-generation sustainable waterborne transport and surface robotics.

6. Dataset Availability and Structure

USVTrack is publicly accessible at https://usvtrack.github.io. The dataset provides:

  • 68,822 annotated camera frames (RGB, 1920×1080, 30 Hz)
  • 45,091 synchronized radar point clouds with range, azimuth, elevation, and Doppler velocity
  • Object-level annotations (bounding boxes, class, tracking IDs)
  • Cross-modal (radar–camera) association data
  • Per-frame GPS and IMU records
  • Calibration files and alignment matrices for sensor fusion tasks

Sample Data Table:

| Modality | Frames | Objects | IDs | Avg. Radar Points | Avg. Power (dB) | Avg. Velocity (m/s) |
|----------|--------|---------|-----|-------------------|-----------------|---------------------|
| Camera   | 68,822 | 85,229  | 98  | —                 | —               | —                   |
| Radar    | 45,091 | 44,271  | 89  | 57.22             | 13.36           | 1.32                |

Categories: Ship, Boat, Vessel.

7. Significance and Prospects

USVTrack represents a foundational benchmark for vision–radar fusion tracking in aquatic environments, enabling realistic and rigorous evaluation of autonomous system performance in the context of inland waterways. Its multi-sensor, multi-condition design and plug-and-play radar–camera matching method provide a basis for future research in perception algorithms, cross-modal tracking, and safe, reliable autonomous waterborne operations.