DREGON UAV Detection Dataset

Updated 11 August 2025
  • DREGON dataset is a multi-sensor benchmark comprising synchronized infrared, visible, and acoustic recordings for UAV detection and identification in real-world conditions.
  • It features detailed annotations and distance categorization based on industry-standard criteria, enabling robust evaluation of detection and tracking performance.
  • The dataset supports the development of multi-modal sensor fusion and drone audition systems, improving algorithm performance even under challenging low SNR conditions.

The DREGON dataset is a multi-sensor benchmark developed for fundamental research in the automated detection of unmanned aerial vehicles (UAVs) and other airborne objects in real-world environments. It features synchronized infrared, visible, and acoustic recordings collected at Swedish airports, annotated extensively to facilitate robust algorithmic development. Designed to address limitations in prior drone detection studies—specifically the lack of datasets with explicit device specifications, multi-modal coverage, and range-dependent annotations—DREGON enables systematic investigation of detection, recognition, and identification tasks under varied operational distances and environmental conditions (Svanström et al., 2021).

1. Dataset Composition and Structure

DREGON includes three principal data modalities:

  • Infrared (IR) Videos: 365 ten-second clips, 320 × 256 resolution (FLIR Breach PTQ-136/Boson sensor), raw 16-bit greyscale (Y16), 60 FPS.
  • Visible Videos: 285 ten-second clips, 1280 × 720 resolution (Sony HDR-CX405 via HDMI/Elgato Cam Link 4K), YUY2 format (16 bpp), 50 FPS.
  • Audio Files: 90 ten-second .wav files at 44,100 Hz, covering drone, helicopter, and background sounds.

The videos total 203,328 annotated frames. The dataset captures three distinct UAVs—Hubsan H107D+, DJI Phantom 4 Pro, DJI Flame Wheel F450—and confounding objects (birds, airplanes, helicopters). Selected clips contain two UAVs flying simultaneously, increasing target ambiguity and realism.

All files adhere to a systematic naming convention indicating sensor, target class, and instance (e.g., IR_DRONE_001.mp4), paired with Matlab (.mat) label files. Annotation utilizes Matlab's Video Labeler app, while accompanying metadata, such as distance category, is provided via Excel sheets.
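As an illustration of how this naming convention can drive programmatic access, the following Python sketch groups clips by sensor and target class and loads the paired .mat label file. The directory layout and the VIS/AUDIO sensor tokens are assumptions (only the IR example is documented above), and scipy reads pre-v7.3 .mat files; exported groundTruth objects may first need conversion to plain arrays in Matlab.

```python
import re
from pathlib import Path

from scipy.io import loadmat  # reads .mat files saved before Matlab v7.3

# Naming scheme: SENSOR_TARGET_INDEX, e.g. IR_DRONE_001.mp4.
# The VIS and AUDIO tokens are assumed, not documented.
NAME_RE = re.compile(r"(?P<sensor>IR|VIS|AUDIO)_(?P<target>[A-Z]+)_(?P<idx>\d{3})")

def index_clips(root: str) -> dict:
    """Group clip paths by (sensor, target class) using the naming convention."""
    clips: dict = {}
    for path in Path(root).rglob("*.mp4"):
        m = NAME_RE.match(path.stem)
        if m:
            clips.setdefault((m["sensor"], m["target"]), []).append(path)
    return clips

def load_labels(clip_path: Path) -> dict:
    """Load the Matlab label file paired with a clip (same stem, .mat suffix)."""
    return loadmat(clip_path.with_suffix(".mat"))

if __name__ == "__main__":
    for (sensor, target), paths in index_clips("data").items():
        print(sensor, target, len(paths), "clips")
```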

2. Sensor Suite and Acquisition Platform

Recordings employ a coordinated sensor suite mounted on an automated pan/tilt platform (Servocity DDT-560H, Hitec HS-7955TG servos, Pololu Mini Maestro USB controller), steered by a Kalman filter tracker fed by a fish-eye camera; a minimal sketch of the tracking loop follows the list below:

  • Thermal IR (IRcam): 24° horizontal/19° vertical FoV, native 320 × 256 resolution.
  • Visible (Vcam): FoV approximately matching the IRcam, HDMI output processed at 1280 × 720, YUY2 format.
  • Fish-eye (Fcam): 8 MP, 180° horizontal/90° vertical FoV, 1024 × 768 at 30 FPS; used for wide-area motion detection and platform guidance.
  • Audio: Cardioid Boya BY-MM1 microphone positioned to capture salient acoustic signatures.
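A minimal sketch of such a steering loop, assuming a constant-velocity motion model, illustrative noise covariances, and a linear pixel-to-angle mapping for the fish-eye camera (none of which are specified in the source):

```python
import numpy as np

class PixelKalman:
    """Constant-velocity Kalman filter over (x, y) pixel detections."""
    def __init__(self, dt: float = 1 / 30.0):                 # Fcam runs at 30 FPS
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt  # state: [x, y, vx, vy]
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = 1e-2 * np.eye(4)     # process noise (assumed)
        self.R = 4.0 * np.eye(2)      # measurement noise (assumed)
        self.x = np.zeros(4)
        self.P = 100.0 * np.eye(4)

    def step(self, z=None):
        """Predict; update if a detection z = (x, y) arrived. Returns position."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        if z is not None:
            y = np.asarray(z, float) - self.H @ self.x
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ y
            self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

def to_pan_tilt(px, py, img_w=1024, img_h=768, fov=(180.0, 90.0)):
    """Map a fish-eye pixel offset from image centre to pan/tilt angles (deg)."""
    return ((px - img_w / 2) * fov[0] / img_w,
            (py - img_h / 2) * fov[1] / img_h)
```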

All sensors are synchronized for concurrent recording, ensuring precise temporal alignment and supporting sensor fusion research. Computation and storage are handled by a Dell Latitude 5401 laptop (Intel i7 CPU, Nvidia MX150 GPU).

3. Distance Categorization According to DRI/Johnson Criteria

A distinguishing feature of the DREGON dataset is the categorization of video clips by sensor-to-target distance according to industry-standard DRI (Detect, Recognize, Identify) requirements, themselves derived from the Johnson criteria. This classification enables systematic study of performance as a function of target scale and range:

Category   Pixel Width (IR)   Typical Distance (Drone, 0.4 m width)
Close      > 15 pixels        < 20 m
Medium     5–15 pixels        20–60 m
Distant    < 5 pixels         > 60 m

These boundaries are adapted for each class (UAVs, birds, helicopters, airplanes) using the projected pixel width formula:

\textrm{PixelWidth} \approx \frac{f_\textrm{camera} \cdot W_\textrm{object}}{D_\textrm{target}}

where $f_\textrm{camera}$ is the focal length, $W_\textrm{object}$ the physical object width, and $D_\textrm{target}$ the sensor-to-target distance.
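As a worked example of the formula and the table's bins, the sketch below computes the projected pixel width and the resulting DRI category. The focal length of 750 pixels is an assumed value chosen so that a 0.4 m drone crosses the 15- and 5-pixel thresholds at roughly 20 m and 60 m, consistent with the table.

```python
def pixel_width(focal_px: float, object_width_m: float, distance_m: float) -> float:
    """Projected target width in pixels: f * W / D, with f expressed in pixels."""
    return focal_px * object_width_m / distance_m

def dri_category(width_px: float) -> str:
    """Bin a target by the dataset's IR pixel-width thresholds."""
    if width_px > 15:
        return "close"
    if width_px >= 5:
        return "medium"
    return "distant"

# A 0.4 m-wide drone with an assumed focal length of 750 px:
for d in (10, 40, 100):
    w = pixel_width(750, 0.4, d)
    print(f"{d:>3} m -> {w:4.1f} px -> {dri_category(w)}")
# 10 m -> 30.0 px -> close; 40 m -> 7.5 px -> medium; 100 m -> 3.0 px -> distant
```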

This approach enables quantifiable research on the limits of detection, recognition, and identification across both small and large airborne objects (Svanström et al., 2021).

4. Collection Sites and Operational Conditions

Data were acquired at three Swedish airports—Halmstad (HAD/ESMT), Gothenburg City (GSE/ESGP), and Malmö (MMX/ESMS)—under daylight, with maximum sensor-to-target range restricted to 200 m per UAV operational regulations. Environmental conditions vary across the dataset from clear sun to overcast and scattered clouds, reproducing real-world variability critical for robust system validation.

5. Research Applications: Detection, Tracking, and Sensor Fusion

The DREGON dataset is designed for use in development and benchmarking of multi-modal drone detection systems. It enables:

  • Evaluation and training of algorithms over IR, visible, and audio modalities.
  • Investigation of sensor fusion approaches in ambiguous scenarios, leveraging complementary strengths of thermal, visible, and acoustic signals.
  • Distance-dependent analysis, facilitated by the DRI-based categories.
  • Assessment of discriminative performance in the presence of visual and acoustic confounders (e.g., birds or helicopters).

Annotation is provided via Matlab groundTruth objects, and metadata supports both detection/tracking tasks and the analysis of range-dependent recognition performance.

A plausible implication is that such comprehensive multi-modal data, particularly with detailed metadata and rigorous distance binning, can catalyze advances in real-time UAV detection for environments like airport surveillance and restricted airspace protection.

6. Integration with Source Localization and Speech Enhancement

The DREGON dataset's noise recordings have been adopted in hybrid systems for drone audition, such as those combining Generalized Sidelobe Canceller (GSC) array signal processing with deep neural networks (DF2) for source localization and enhancement under extremely low SNR conditions (–30 dB) (Wu et al., 8 Aug 2025). In such research:

  • 2,000 noisy samples from DREGON, each containing 4 s of drone noise with 2 s of inserted clean speech, are used for testing and validation (see the mixing sketch after this list).
  • The dataset is essential in fine-tuning the DF2 DNN to real-world drone noise characteristics.
  • System evaluation with DREGON reveals dSNR improvements of up to 72 dB over baselines, with methods outperforming Dual-stage Multichannel Wiener Filtering (DMWF), time-frequency masked MWF, simple GSC, and end-to-end DF2 under SNR extremes.
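A sketch of how such test samples might be assembled, hedged on the paper's exact protocol: the insertion offset, file names, and the convention of scaling the speech (rather than the noise) to hit the target SNR are all assumptions.

```python
import numpy as np
import soundfile as sf

def make_noisy_sample(noise_wav: str, speech_wav: str, snr_db: float,
                      insert_at_s: float = 1.0, sr: int = 44100) -> np.ndarray:
    """Insert a 2 s clean-speech segment into 4 s of drone noise at snr_db."""
    noise, _ = sf.read(noise_wav)          # assumed mono at 44.1 kHz
    speech, _ = sf.read(speech_wav)
    noise = noise[: 4 * sr]
    speech = speech[: 2 * sr]
    start = int(insert_at_s * sr)          # insertion offset (assumed)
    overlap = noise[start : start + len(speech)]
    # Scale the speech so 10*log10(P_speech / P_noise) equals snr_db
    # over the overlapping region.
    gain = np.sqrt(np.mean(overlap**2) / np.mean(speech**2) * 10 ** (snr_db / 10))
    mix = noise.copy()
    mix[start : start + len(speech)] += gain * speech
    return mix

# e.g. a -30 dB test case, matching the SNR regime evaluated in the paper:
# mix = make_noisy_sample("dregon_noise.wav", "clean_speech.wav", snr_db=-30)
```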

Key mathematical formulations in these studies rely on STFT-domain signal models:

X_m(l, k) = A_m(l, k)\, S(l, k) + V_m(l, k)

where $X_m(l,k)$ is the $m$-th microphone signal at frame $l$ and frequency bin $k$, $A_m(l,k)$ the acoustic transfer function from the source to microphone $m$, $S(l,k)$ the target speech, and $V_m(l,k)$ the (drone) noise. Adaptive filtering algorithms (RLS for GSC adaptation) build on this model, underscoring the dataset's utility in advancing both array processing and neural network-based speech enhancement under realistic operational conditions (Wu et al., 8 Aug 2025).
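To make the pipeline concrete, the following is a minimal two-microphone GSC in the STFT domain with a per-bin scalar RLS canceller; the fixed sum beamformer, difference blocking matrix, and forgetting factor are illustrative choices, not the configuration of Wu et al.

```python
import numpy as np

def gsc_rls(X: np.ndarray, lam: float = 0.98, delta: float = 1e-2) -> np.ndarray:
    """Two-microphone GSC with a per-bin scalar RLS noise canceller.

    X: complex STFT array of shape (2, frames, bins).
    Returns the enhanced STFT of shape (frames, bins).
    """
    M, L, K = X.shape
    assert M == 2, "sketch assumes a two-microphone array"
    Y = np.zeros((L, K), dtype=complex)
    w = np.zeros(K, dtype=complex)      # one adaptive weight per frequency bin
    p = np.full(K, 1.0 / delta)         # inverse correlation estimate per bin
    for l in range(L):
        d = 0.5 * (X[0, l] + X[1, l])   # fixed beamformer (broadside steering)
        u = X[0, l] - X[1, l]           # blocking matrix -> noise reference
        e = d - w * u                   # a priori error: beamformer minus noise estimate
        k = p * np.conj(u) / (lam + p * np.abs(u) ** 2)
        w = w + k * e                   # RLS weight update
        p = (p - k * u * p) / lam       # inverse correlation update
        Y[l] = e
    return Y
```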

7. Technical Format and Accessibility

Videos are stored in .mp4 format, audio in standard .wav, and annotations in Matlab-compatible files. All meta-information relevant to analysis (categories, frame-level labels, acquisition conditions, and distance bins) is explicitly documented, aligned with best practices for reproducible research.

A plausible implication is that this structured approach to data curation and annotation offers extended utility across diverse research domains spanning computer vision, signal processing, and security engineering.


In summary, the DREGON dataset provides a uniquely comprehensive, multi-sensor, and rigorously annotated resource for research in UAV detection, multi-modal sensor fusion, and drone audition systems. Its design principles—explicit sensor parameters, inclusion of challenging confounders, and robust range-dependent annotation—distinguish it as an essential contribution to methodical evaluation and development in these technically demanding domains (Svanström et al., 2021; Wu et al., 8 Aug 2025).