
NTU VIRAL: Multi-modal UAV Benchmark

Updated 14 October 2025
  • NTU VIRAL is a multi-modal benchmark combining visual, inertial, lidar, and UWB sensor data from a DJI hexacopter, supporting precise SLAM research.
  • It incorporates rigorous sensor calibration and high-accuracy ground truth from a laser tracker, enabling reliable sensor-fusion evaluation in GPS-denied conditions.
  • The dataset spans diverse indoor and outdoor scenarios, supporting research on UAV navigation and multi-sensor integration in complex environments.

The NTU VIRAL dataset is a multi-modal benchmark designed for the development and evaluation of autonomous aerial platforms, with particular emphasis on visual-inertial-lidar and ranging sensor fusion in complex environments. Collected using a DJI M600 Pro hexacopter equipped with an extensive and rigorously calibrated sensor suite, NTU VIRAL uniquely combines stereoscopic vision, dual 3D lidar, high-frequency inertial measurements, ultra-wideband (UWB) radio ranging, and high-precision ground truth from an external laser tracker. The dataset encompasses a wide variety of flight scenarios in both indoor and outdoor settings on the NTU campus, aiming to fill the gap in aerial robotics data resources, especially for tightly coupled sensor fusion and SLAM research under GPS-denied and dynamic 3D operational conditions.

1. Sensor Suite and Platform Configuration

The DJI M600 Pro serves as the backbone for the NTU VIRAL data collection. Its on-board sensor payload comprises:

  • Inertial Measurement Unit (IMU): The primary inertial sensor is the VectorNav VN100, producing accelerometer, gyroscope, and magnetometer data at approximately 385 Hz. An on-board Extended Kalman Filter further fuses these measurements for orientation estimation.
  • 3D Lidars: Two Ouster OS1 Gen1, 16-channel lidars are mounted—one configured horizontally for 360° planar visibility, and the other vertically to capture the ground as well as frontal and rear perspectives. Each provides point clouds at 10 Hz and internal IMU data at 100 Hz.
  • Stereo Cameras: Two uEye 1221 LE global-shutter monochrome cameras in a front-facing stereo configuration output 10 Hz image streams; an external hardware trigger keeps inter-frame timestamp differentials under 3 ms.
  • Ultra-Wideband (UWB) Ranging: Four UAV-mounted UWB ranging nodes (from two Humatics P440 radios, each with two antennae) interact with three stationary anchors, forming 12 ranging pairs. This system delivers range measurements at an effective rate around 68.571 Hz for UAV-anchor pairs, and 5.714 Hz for inter-anchor links.
  • Ground Truth Tracker: A Leica Nova MS60 MultiStation laser tracker continuously localizes a crystal prism attached atop the UAV, providing high-precision positional reference, with the prism offset (~0.4 m) from the UAV body frame determined and documented during calibration.
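The stereo synchronization claim above (timestamp differentials under 3 ms) can be checked mechanically. The following is an illustrative sketch, not part of the dataset's own tooling; the function name and timestamp lists are made up for the example.

```python
# Sketch: verify stereo hardware synchronization by pairing each left
# image timestamp with the nearest right timestamp and flagging pairs
# whose differential exceeds the cited 3 ms tolerance.

def check_stereo_sync(left_stamps, right_stamps, tol_s=0.003):
    """Return (left, right) timestamp pairs violating the tolerance (s)."""
    violations = []
    for t_l in left_stamps:
        t_r = min(right_stamps, key=lambda t: abs(t - t_l))
        if abs(t_r - t_l) > tol_s:
            violations.append((t_l, t_r))
    return violations

# Example: 10 Hz streams with a constant 1 ms offset -> no violations.
left = [0.1 * k for k in range(5)]
right = [0.1 * k + 0.001 for k in range(5)]
print(check_stereo_sync(left, right))  # -> []
```

A real check would read the timestamps from the ROS bag's image message headers rather than synthetic lists.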

This sensor complement aligns with, yet clearly extends beyond, typical ground-robotics datasets, introducing modalities (notably dual 3D lidar and high-frequency UWB ranging) that remain uncommon in UAV benchmarks.

2. Data Collection Environments and Operational Scenarios

NTU VIRAL features datasets acquired under diverse and challenging environmental conditions:

  • EEE Sequences: Captured around the School of Electrical and Electronic Engineering, where dense structures and abundant features (trees, road markings, facades) both challenge and aid visual/lidar SLAM algorithms.
  • SBS Sequences: Acquired in an open square near the School of Biological Sciences; features include low-rise buildings, extensive glass surfaces, and far-off visual points that test depth estimation robustness.
  • NYA Sequences: Recorded in an auditorium setting, introducing unique difficulties such as low illumination, pronounced UWB ranging multi-path effects, semi-transparent surfaces posing issues for lidar, and complex aerial dynamics.

These scenarios are selected to enable assessment of multi-sensor SLAM and localization algorithms in both feature-rich and feature-sparse areas, and under varied lighting and reflective conditions. The operational settings are intended to reflect the distinctive 3D trajectories, abrupt maneuvers, and sensor-viewpoint disruptions characteristic of realistic aerial robotics deployments.

3. Calibration Methodologies

Calibration in NTU VIRAL is rigorously structured along two main axes: rigid and flexible sensor calibration.

  • Rigid Sensor Calibration:
    • Stereo Camera: Camera intrinsics (pinhole model with radial and tangential distortion) and stereo extrinsics are computed using standard chessboard patterns, minimizing reprojection errors and taking care to avoid large roll angles to leverage the wide (120° FOV) lenses.
    • Visual–Inertial Calibration: The Kalibr toolbox is employed to refine both the intrinsics and the extrinsic transformation matrix between the cameras and the high-frequency IMU. IMU bias and a temporal offset (typically around 20 ms) between camera and IMU are identified and recorded.

A frequently used transformation for mapping a point ${}^{C}\mathbf{p}$ in the camera frame to the body frame is:

$${}^{B}\mathbf{p} = {}^{B}_{C}\mathbf{R}\,{}^{C}\mathbf{p} + {}^{B}\mathbf{t}$$

where ${}^{B}_{C}\mathbf{R}$ is the rotation matrix and ${}^{B}\mathbf{t}$ is the translation vector from the camera frame to the body frame.
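This camera-to-body transformation can be sketched in NumPy. The extrinsic values below are invented for illustration; the actual parameters come from the dataset's calibration YAML files.

```python
import numpy as np

def camera_to_body(p_C, R_BC, t_B):
    """Map a point from the camera frame to the body frame:
    p_B = R_BC @ p_C + t_B."""
    return R_BC @ p_C + t_B

# Illustrative extrinsic (made up): 90-degree yaw between camera and
# body frames, with a 10 cm forward translation offset.
R_BC = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
t_B = np.array([0.1, 0.0, 0.0])

p_C = np.array([1.0, 0.0, 0.0])
print(camera_to_body(p_C, R_BC, t_B))  # approx. [0.1, 1.0, 0.0]
```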

  • Flexible Sensor Calibration and Ground Truth Alignment:
    • UWB nodes and the ground-truth crystal prism, mounted on flexible or slightly compliant fixtures, are calibrated initially via a VICON system and visual cues; their dynamic displacement (up to ~2 cm) is measured and compensated for.
    • Hand–eye calibration is utilized to resolve the geometric offset between the UAV's body frame and the ground truth prism. Trajectories are aligned with ground truth using resampling and closed-form least-squares (Umeyama method).

Calibration files are provided (in formats such as YAML) for all sensors, explicitly recording the relevant parameters for multi-sensor fusion and accurate evaluation.
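The closed-form least-squares alignment mentioned above (the Umeyama method, here without scale estimation) can be sketched as follows. This is a generic implementation for illustration, not the dataset's own alignment script.

```python
import numpy as np

def umeyama_align(src, dst):
    """Closed-form rigid alignment (Umeyama, no scale): find R, t
    minimizing ||R @ src + t - dst||^2. src, dst are 3xN point sets."""
    mu_s = src.mean(axis=1, keepdims=True)
    mu_d = dst.mean(axis=1, keepdims=True)
    # Cross-covariance of the centered point sets.
    H = (dst - mu_d) @ (src - mu_s).T
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force det(R) = +1.
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ S @ Vt
    t = mu_d - R @ mu_s
    return R, t

# Example: recover a known rotation/translation from correspondences.
rng = np.random.default_rng(0)
src = rng.standard_normal((3, 10))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([[1.0], [2.0], [3.0]])
R_est, t_est = umeyama_align(src, R_true @ src + t_true)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))  # True True
```

In practice the estimated trajectory is first resampled to the ground-truth timestamps, then aligned with such a routine before computing error metrics.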

4. Dataset Structure, Ground Truth, and Accessibility

All NTU VIRAL data—including sensor streams, calibration parameters, and support scripts—is distributed in ROS bag format. The dataset incorporates:

  • Synchronized raw and processed data streams for IMU, lidars, stereo images, UWB ranges, and laser tracker-based ground truth.
  • Calibration files for camera intrinsics, extrinsics (with body frame transformations), and IMU parameters.
  • Accessory evaluation and utility scripts, including point cloud deskewing, UWB message parsing, and trajectory alignment tools (with resampling and Umeyama registration).
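The point-cloud deskewing mentioned above can be illustrated with a minimal sketch. Real deskewing also interpolates rotation (e.g., from the lidar's internal IMU); this translation-only version, with invented names and a constant-velocity assumption, only conveys the idea of re-expressing each point at the sweep-start time.

```python
import numpy as np

def deskew(points, point_times, v_body, sweep_start):
    """Translation-only deskew of one 10 Hz lidar sweep.
    points: Nx3 coordinates in the sensor frame at each point's capture
    time; point_times: N timestamps; v_body: constant velocity estimate.
    Each point is shifted by the motion accrued since sweep_start."""
    dt = (point_times - sweep_start)[:, None]   # Nx1 elapsed times
    return points + v_body[None, :] * dt        # undo sensor motion

# A point captured 50 ms into the sweep while moving at 1 m/s forward
# is shifted 5 cm forward to express it in the sweep-start frame.
pts = np.array([[1.0, 0.0, 0.0]])
out = deskew(pts, np.array([0.05]), np.array([1.0, 0.0, 0.0]), 0.0)
print(out)  # approx. [[1.05, 0.0, 0.0]]
```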

Ground truth is delivered with positional accuracy commensurate with the Leica MS60 system. A necessary transformation is supplied to account for the prism's geometric offset relative to the UAV body frame. All resources are accessible at https://ntu-aris.github.io/ntu_viral_dataset/.

5. Research Applications and Benchmarking Use Cases

The NTU VIRAL dataset is primarily positioned as a testbed for:

  • UAV SLAM and Localization: Evaluation and development of real-time, tightly-coupled SLAM algorithms leveraging combinations of visual, inertial, lidar, and radio-ranging inputs in environments where GPS signals are unavailable or degraded.
  • Multi-modal Sensor Fusion: Study of high-rate visual–inertial–lidar integration under aggressive, full-3D aerial maneuvers requiring robust data association and drift suppression.
  • UWB-enhanced Navigation: Incorporation of dense UWB radio distance measurements to mitigate drift in visual or lidar odometry, especially in visually or geometrically degenerate scenarios.
  • Calibration and Dynamic Flight Control: Research into automated multi-sensor calibration techniques and the development of control algorithms that can exploit the rich sensor suite for high-agility flight.
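To make the UWB use case concrete, the sketch below estimates a position from ranges to known anchors via Gauss-Newton least squares, the kind of absolute constraint that can bound odometry drift. The anchor layout and function are hypothetical, not taken from the dataset's tools.

```python
import numpy as np

def trilaterate(anchors, ranges, x0, iters=20):
    """Estimate a 3D position from range measurements.
    anchors: Kx3 known anchor positions; ranges: K measured distances;
    x0: initial position guess (needed to avoid mirror ambiguities)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        diffs = x - anchors                     # Kx3 vectors to anchors
        dists = np.linalg.norm(diffs, axis=1)   # predicted ranges
        J = diffs / dists[:, None]              # Jacobian of the ranges
        r = dists - ranges                      # range residuals
        dx, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + dx                              # Gauss-Newton update
    return x

# Example with hypothetical anchors and noise-free ranges.
anchors = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0],
                    [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
truth = np.array([1.0, 2.0, 3.0])
ranges = np.linalg.norm(anchors - truth, axis=1)
print(trilaterate(anchors, ranges, x0=[0.0, 0.0, 1.0]))  # approx. [1, 2, 3]
```

Tightly coupled estimators fuse such range residuals directly with visual, inertial, and lidar factors rather than solving for position in isolation.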

The dataset explicitly supports benchmarking of algorithms across sensing modalities, and offers realistic, variable conditions resembling deployment environments for next-generation UAV systems.

6. Distinctive Features and Comparative Significance

The NTU VIRAL dataset is distinguished within the aerial robotics research landscape by:

  • Its combination of dual 3D lidars and global-shutter monochrome stereo imaging, a sensor mix more typical of autonomous-car datasets than of existing UAV resources.
  • The inclusion of UWB ranging measurements, tightly synchronized and spanning multiple UAV-to-anchor and anchor-anchor pairs, affording a unique channel for investigating radio-based drift correction and collaborative localization.
  • The use of a high-accuracy laser tracker for positional ground truth, supporting rigorous quantitative evaluation not typically feasible with GPS or consumer-grade reference systems, especially in indoor domains.
  • Systematic provision of multi-level calibration supporting both tightly integrated and flexibly mounted sensors, and explicit alignment procedures for ground truth and sensor coordinate frames.

These features render the NTU VIRAL dataset a comprehensive and challenging benchmark for autonomous aerial system research, with particular utility for studies in sensor fusion, SLAM, calibration, and robust flight control in diverse and adversarial operational contexts.
