DroneAudioset: Audio Dataset for SAR
- DroneAudioset is a comprehensive audio dataset of systematically collected multi-channel recordings spanning multiple drone types and environments, built for search and rescue (SAR) research.
- It provides detailed annotations and recordings of human vocalizations and environmental sounds under extreme signal-to-noise conditions to benchmark noise suppression and classification methods.
- The dataset facilitates research on hardware-software co-design by offering diverse drone configurations, microphone placements, and controlled acoustic environments.
DroneAudioset is a comprehensive, systematically annotated audio dataset designed for drone-based search and rescue research, focused on enabling and benchmarking audition systems capable of detecting human presence in extreme noise environments. DroneAudioset addresses key limitations in prior datasets, such as restricted diversity, reliance on synthetic mixtures, and non-standardized recording setups, by providing 23.5 hours of multi-channel audio collected under wide-ranging signal-to-noise ratio (SNR) conditions, multiple drone configurations, and controlled environments (Gupta et al., 17 Oct 2025).
1. Dataset Structure and Specifications
DroneAudioset comprises three primary categories of recordings:
- Drone Noise + Sound Source Recordings (~15 hours): These sessions capture human vocalizations, non-vocal human cues (e.g., clapping, footsteps), and ambient environmental sounds (fire, water dripping, etc.) while drones are operating.
- Drone-Only Recordings (~2.3 hours): Pure ego-noise from drones with no concurrent target sound, essential for noise modeling.
- Source-Only Recordings (~6.2 hours): Sounds of interest recorded without drone-induced noise, establishing clean reference signals.
The dataset provides an SNR range from –57.2 dB to –2.5 dB, calculated as:

SNR (dB) = 10 · log₁₀(P_source / P_noise),

where P_source and P_noise are the average powers of the target sound and the drone ego-noise, respectively.
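The paper's exact measurement protocol is not reproduced here; as a minimal sketch, the standard power-ratio definition of SNR can be computed as follows (the 440 Hz tone and the noise level are illustrative stand-ins, not dataset values):

```python
import numpy as np

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Standard SNR in dB: 10*log10 of the signal-to-noise power ratio."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    return 10.0 * np.log10(p_signal / p_noise)

# Example: a weak tone buried in much stronger noise yields a strongly
# negative SNR, as in most DroneAudioset recordings.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000, endpoint=False)
tone = 0.01 * np.sin(2 * np.pi * 440 * t)    # quiet target source
noise = rng.standard_normal(t.shape)         # loud ego-noise stand-in
print(round(snr_db(tone, noise), 1))         # strongly negative, around -43 dB
```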
Key diversity dimensions are supported:
- Drone types: DJI F450 (larger quadcopter) and DJI F330 (smaller quadcopter), with differences in wheelbase, weight, propeller count, and acoustic profile.
- Throttle settings: “low” and “high” throttle, yielding a ~15 dBA SPL difference at a 1 m microphone distance.
- Microphone configurations: Seventeen microphones organized into two circular eight-channel arrays above and below the drone plus a central standalone mic. Array placements at 25 cm and 50 cm from the drone explore wind turbulence and directivity effects.
- Environments: Recordings in a small conference room and two large multi-purpose halls, providing varied reverberation and multipath characteristics.
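Assuming uniform circular arrays (the exact ring radius and channel ordering are not specified here, so the 10 cm radius below is a placeholder), the 17-microphone layout could be sketched as:

```python
import numpy as np

def circular_array(n_mics: int, radius_m: float, z_m: float) -> np.ndarray:
    """(n_mics, 3) Cartesian coordinates of a uniform circular array at height z."""
    angles = 2 * np.pi * np.arange(n_mics) / n_mics
    return np.stack([radius_m * np.cos(angles),
                     radius_m * np.sin(angles),
                     np.full(n_mics, z_m)], axis=1)

# Illustrative geometry: two 8-channel rings 25 cm above/below the drone
# plane, plus a central standalone mic (17 channels total).
upper = circular_array(8, radius_m=0.10, z_m=+0.25)
lower = circular_array(8, radius_m=0.10, z_m=-0.25)
center = np.zeros((1, 3))
geometry = np.vstack([upper, lower, center])
print(geometry.shape)  # (17, 3)
```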
2. Technical Challenges in Drone Audition
Drone-generated ego-noise presents a fundamental barrier for acoustically detecting human presence. SPLs commonly exceed 80 dBA at 1 m; broadband and tonal components (a fundamental and its harmonics, modulated by rotor speed and throttle) typically mask the desired signals, driving the vast majority of recordings below –10 dB SNR. The spatial non-uniformity of this noise is further complicated by aerodynamic effects—microphones below the drone are more affected by downwash, while those above may benefit from reduced turbulence. Rotational speed variability modulates both tonal and broadband components, producing widely fluctuating acoustic signatures.
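The tonal-plus-broadband structure described above can be illustrated with a toy synthesis. The 180 Hz blade-passage frequency, the 1/k harmonic roll-off, and the noise level are illustrative assumptions, not measurements from the dataset:

```python
import numpy as np

def toy_ego_noise(duration_s=1.0, fs=16000, bpf_hz=180.0, n_harmonics=6, seed=0):
    """Toy rotor ego-noise: harmonics of a blade-passage frequency over
    broadband noise. All parameters are illustrative, not measured values."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration_s * fs)) / fs
    tonal = sum((1.0 / k) * np.sin(2 * np.pi * k * bpf_hz * t)
                for k in range(1, n_harmonics + 1))
    broadband = 0.5 * rng.standard_normal(t.shape)
    return tonal + broadband

noise = toy_ego_noise()
spec = np.abs(np.fft.rfft(noise))
freqs = np.fft.rfftfreq(noise.size, d=1 / 16000)
print(freqs[np.argmax(spec)])  # strongest peak at the 180 Hz fundamental
```

In real recordings the fundamental tracks rotor speed, so these peaks drift with throttle rather than staying fixed as in this toy example.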
3. Applications: Human-Presence Detection in SAR
DroneAudioset facilitates systematic development and evaluation of advanced noise suppression and classification methods for search and rescue. Notable benchmarked approaches include:
- Noise Suppression: Classic MVDR beamforming (location-aware), spectral gating, neural network enhancement (MPSENet), and hybrid beamforming-ML solutions.
- Audio Classification: Post-enhanced signals are classified with transformer architectures (SSLAM) to infer the presence of human vocal/non-vocal signals or ambient events.
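As a minimal sketch of the MVDR beamformer named above (narrowband, with a hypothetical broadside steering vector and a synthetic noise covariance; not the paper's implementation):

```python
import numpy as np

def mvdr_weights(R_noise: np.ndarray, steering: np.ndarray) -> np.ndarray:
    """MVDR: w = R^{-1} d / (d^H R^{-1} d). Minimizes output noise power
    subject to unit (distortionless) gain toward the steering direction d."""
    Rinv_d = np.linalg.solve(R_noise, steering)
    return Rinv_d / (steering.conj() @ Rinv_d)

# Toy narrowband example: 8-mic array, broadside target, synthetic noise.
rng = np.random.default_rng(1)
n_mics, n_snap = 8, 2000
d = np.ones(n_mics, dtype=complex)  # broadside steering vector (illustrative)
noise = rng.standard_normal((n_mics, n_snap)) + 1j * rng.standard_normal((n_mics, n_snap))
R = noise @ noise.conj().T / n_snap + 1e-6 * np.eye(n_mics)  # sample covariance + ridge
w = mvdr_weights(R, d)
print(abs(w.conj() @ d))  # distortionless constraint: unit gain toward target
```

A location-aware system would replace the fixed broadside `d` with a steering vector derived from an estimated source direction.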
Noise suppression efficacy is quantified by the scale-invariant signal-to-distortion ratio (SI-SDR):

SI-SDR (dB) = 10 · log₁₀( ‖α·s‖² / ‖α·s − ŝ‖² ),  with α = ⟨ŝ, s⟩ / ‖s‖²,

where s is the clean reference signal and ŝ is the enhanced estimate.
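The SI-SDR metric projects the estimate onto the reference and compares target energy to residual-distortion energy; a direct NumPy implementation (the helper name `si_sdr_db` is chosen here for illustration):

```python
import numpy as np

def si_sdr_db(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Scale-invariant SDR: rescale the reference by the optimal projection
    coefficient, then compare target energy to residual energy in dB."""
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(residual ** 2))

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)             # clean reference
noisy = s + 0.1 * rng.standard_normal(16000)
print(round(si_sdr_db(noisy, s), 1))       # roughly 20 dB for 10% residual noise
print(round(si_sdr_db(3.0 * noisy, s), 1)) # identical: metric ignores scaling
```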
Classification reliability is measured by per-class F1-scores, showing the impact of effective noise suppression on human-presence detection performance.
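Per-class F1 can be computed one-vs-rest as the harmonic mean of precision and recall; the labels below are hypothetical stand-ins for the dataset's target categories:

```python
import numpy as np

def per_class_f1(y_true, y_pred, classes):
    """One-vs-rest per-class F1 = 2*TP / (2*TP + FP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = {}
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

# Hypothetical predictions over three target categories.
true = ["vocal", "vocal", "non_vocal", "ambient", "ambient", "vocal"]
pred = ["vocal", "ambient", "non_vocal", "ambient", "vocal", "vocal"]
print(per_class_f1(true, pred, ["vocal", "non_vocal", "ambient"]))
```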
4. Experimental Insights for System Design
DroneAudioset provides empirical context for critical hardware and system design choices:
- Microphone Placement: Arrays mounted above the drone consistently achieve higher SI-SDR, being less affected by turbulent airflow, while those below are closer to ground-level sources but experience greater wind-induced degradation.
- Array vs. Single-Channel: Multichannel arrays enable beamforming for directional noise suppression yet demand higher power and computational resources; single microphones may suffice depending on the SNR regime and algorithm sophistication.
- Throttle Modulation: Lower throttle settings are correlated with improved SNR, suggesting feasibility for dynamically adjusting flight characteristics during SAR listening phases.
- Drone Size: Larger drones (F450) exhibit distinct noise profiles compared to smaller platforms (F330), impacting system configuration and payload choices.
5. Benchmarking and Standardization
DroneAudioset uniquely establishes standardized, reproducible setups for drone audition research. Diversity in SNR, platform configuration, microphone arrangement, and acoustic environments supports benchmarking of both algorithmic and hardware system variants. Comprehensive annotations and metadata enable controlled experimentation and fair comparison across suppression, localization, and classification methodologies.
6. Future Implications for Drone Audition Systems
The dataset fills a long-standing gap by providing real-world, systematically varied and annotated recordings for robust audition system development. It supports immediate advances in SAR human-presence detection but also opens new directions for:
- Sound source localization: Multichannel data and diverse SNR conditions facilitate algorithm training and evaluation for underdetermined localization in noisy aerial environments.
- Speech and acoustic event recovery: The extreme noise regime enables the creation and benchmarking of speech enhancement and event detection methods specifically tailored for drone applications.
- Hardware-software co-design: Observed trade-offs in array configuration, drone type, and throttle set points inform future UAV platform integration and sensor architecture.
A plausible implication is that standardized, real-world datasets like DroneAudioset are necessary prerequisites for credible evaluation and deployment of drone audition technologies in disaster response and other operational contexts.
7. Dataset Access and Community Impact
DroneAudioset is publicly available at https://huggingface.co/datasets/ahlab-drone-project/DroneAudioSet/ under the MIT license, with complete documentation of recording protocols, setup diagrams, and data formats. This open availability—coupled with its systematic design—renders it a foundational benchmark for the drone audition and search and rescue research communities, streamlining the development and deployment of practical human-presence detection systems in extremely noisy aerial environments.