- The paper introduces a multi-sensor fusion system integrating visible, thermal, and acoustic data to improve real-time drone detection accuracy.
- It employs YOLO v2 for visual detection and MFCC with LSTM classifiers for acoustic analysis to achieve high precision under varying conditions.
- Evaluation demonstrates that sensor fusion reduces false positives and enhances F1-scores, supporting robust security and safety applications.
Real-Time Drone Detection and Tracking With Visible, Thermal, and Acoustic Sensors
Introduction
The proliferation of drones, or unmanned aerial vehicles (UAVs), poses unique challenges and opportunities across various domains. The paper "Real-Time Drone Detection and Tracking With Visible, Thermal, and Acoustic Sensors" explores an advanced system for the automatic detection and tracking of drones. This multi-sensor system incorporates visible, thermal infrared, and acoustic sensors, addressing the need for robust and efficient detection mechanisms, with potential applications ranging from security enforcement to civilian safety.
System Architecture and Methodology
System Design
The detection system integrates multiple sensor modalities to improve detection accuracy and reduce false alarms. As outlined in the paper, it employs a thermal infrared camera, a conventional visible-spectrum camera, and an audio sensor. Sensor fusion aggregates the outputs of these sensors into a single, more reliable detection decision than any individual sensor can provide. Figure 1 illustrates the hardware and software components of the architecture, underscoring the system's portability and adaptability to various operational scenarios.
Figure 1: Left: system architecture with hardware and software parts. Center: main hardware parts of the detection system. Right: the system deployed just north of the runway at Halmstad airport (IATA/ICAO code: HAD/ESMT).
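The paper describes the fusion step at a conceptual level rather than as pseudocode; the sketch below shows one plausible late-fusion rule, assuming each sensor modality reports a detection flag and a confidence score for every time step. The `SensorReading` structure, `fuse_detections` function, weights, and threshold are illustrative assumptions, not the rule used in the paper.

```python
from dataclasses import dataclass

@dataclass
class SensorReading:
    """Per-sensor output for one time step (illustrative structure)."""
    modality: str      # "thermal", "visible", or "audio"
    detected: bool     # whether this sensor's classifier fired
    confidence: float  # classifier confidence in [0, 1]

def fuse_detections(readings, weights=None, threshold=0.5):
    """Weighted late fusion: average the confidences of firing sensors.

    Returns True when the weighted evidence crosses the threshold.
    A plausible sketch only, not the paper's exact fusion rule.
    """
    weights = weights or {"thermal": 1.0, "visible": 1.0, "audio": 1.0}
    score, total = 0.0, 0.0
    for r in readings:
        w = weights.get(r.modality, 1.0)
        total += w
        if r.detected:
            score += w * r.confidence
    return total > 0 and (score / total) >= threshold

# Example: thermal and audio agree on a detection, the visible camera does not.
readings = [
    SensorReading("thermal", True, 0.8),
    SensorReading("visible", False, 0.2),
    SensorReading("audio", True, 0.9),
]
print(fuse_detections(readings))  # True with equal weights
```

Requiring corroboration from more than one modality in this way is one reason a fused output can suppress single-sensor false alarms.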
The system incorporates a pan/tilt platform that allows for dynamic tracking based on sensor input, providing coverage over a large area while maintaining the focus necessary to differentiate drones from other objects. The architecture is designed to support real-time processing, crucial for applications requiring immediate response.
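The paper does not detail the control law for the pan/tilt platform; as a rough illustration of how tracking could be driven from detections, the sketch below converts a detection's pixel offset from the image center into pan and tilt corrections. The field-of-view values and the function name are assumptions for the example only.

```python
def compute_pan_tilt_step(bbox_center, frame_size, hfov_deg=62.0, vfov_deg=49.0):
    """Convert a detection's offset from the image center into pan/tilt
    corrections in degrees. The field-of-view values are illustrative."""
    cx, cy = bbox_center
    width, height = frame_size
    # Normalized offsets in [-0.5, 0.5]; positive dx means the target is right of center.
    dx = (cx - width / 2) / width
    dy = (cy - height / 2) / height
    pan_step = dx * hfov_deg     # pan right for a target right of center
    tilt_step = -dy * vfov_deg   # image y grows downward, so tilt up for negative dy
    return pan_step, tilt_step

# Example: detection right of and above the center of a 640x480 frame.
pan, tilt = compute_pan_tilt_step((400, 200), (640, 480))
print(f"pan {pan:+.1f} deg, tilt {tilt:+.1f} deg")
```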
Sensor Modalities
Thermal Infrared and Visible Cameras: The core of the detection capability comes from the thermal and visible cameras. Because these sensors operate under different lighting and environmental conditions, they compensate for each other's limitations. The system leverages YOLO v2 for real-time object detection on the captured frames. The integration of thermal infrared imaging is particularly noteworthy, as it enables effective nighttime operation, addressing gaps commonly present when relying solely on visible-spectrum cameras.
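The paper applies YOLO v2 to both camera streams but does not include implementation code; a minimal sketch of running a Darknet-format YOLO v2 model on a single frame with OpenCV's DNN module might look as follows. The model file names, input resolution, and confidence threshold are placeholders rather than the paper's actual configuration.

```python
import cv2

# Placeholder model files; the paper's trained weights are not reproduced here.
net = cv2.dnn.readNetFromDarknet("drone-yolov2.cfg", "drone-yolov2.weights")

def detect_drones(frame, conf_threshold=0.5):
    """Run the YOLO v2 network on one frame and return (confidence, box) pairs."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    detections = []
    for output in outputs:
        for det in output:
            scores = det[5:]                  # per-class scores
            confidence = float(scores.max())
            if confidence >= conf_threshold:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                box = (int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh))
                detections.append((confidence, box))
    return detections
```

The same routine could in principle serve the thermal stream as well, since the network only sees a three-channel image tensor (a single-channel thermal frame would be replicated to three channels or the network retrained on single-channel input).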
Acoustic Detection: Acoustic sensing plays a vital role in detecting drones, offering a complementary angle by exploiting the distinct sound signatures produced by drone motors. The system uses Mel-frequency cepstral coefficients (MFCCs) coupled with an LSTM classifier to distinguish drone sounds from other noise, with reported precision and recall rates that significantly improve detection reliability.
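The paper specifies the acoustic pipeline as MFCC features fed to an LSTM classifier; a minimal sketch of such a pipeline, using librosa for feature extraction and Keras for the network, is shown below. The sampling rate, number of coefficients, layer sizes, and the binary drone/background labeling are assumptions for illustration, not the paper's exact settings.

```python
import librosa
from tensorflow.keras import layers, models

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    """Load an audio clip and return its MFCC sequence, shaped (frames, n_mfcc)."""
    audio, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T

def build_lstm_classifier(n_mfcc=13):
    """Binary drone-vs-background classifier over MFCC sequences (illustrative sizes)."""
    model = models.Sequential([
        layers.Input(shape=(None, n_mfcc)),  # variable-length MFCC sequence
        layers.LSTM(64),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```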

Figure 2: Two examples of sensor fusion results.
Results and Analysis
The study details an extensive evaluation of the system's performance through precision, recall, and F1-score metrics for the individual sensors and their fused output. Notably, the infrared sensor achieves an F1-score of 0.7601, with the visible camera performing comparably at 0.7849, while the audio classifier outperforms both at 0.9323. Since no single sensor is uniformly strong across conditions, fusing their outputs yields a more dependable detection than relying on any one modality, supporting the multi-modal approach.
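For reference, the F1-score reported above is the harmonic mean of precision and recall; the short helper below makes the relationship explicit. The precision and recall values in the usage example are illustrative only and are not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values only, not the paper's measured rates.
print(round(f1_score(precision=0.90, recall=0.97), 4))  # 0.9337
```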
The reported evaluations demonstrate the system's robustness across varying environmental conditions and target distances. Furthermore, Figure 3 presents a frame-by-frame analysis of detection during an active evaluation session, highlighting the system's real-time performance and the reduction of false positives achieved through sensor fusion.

Figure 3: Frame-by-frame analysis of drone detection during one evaluation session.
Implications and Future Developments
This research has significant implications for the reliability of drone detection systems in both civilian and security applications. The use of sensor fusion to integrate thermal, visible, and acoustic data offers a more nuanced understanding of the aerial environment, minimizing both false positives and false negatives. The approach could be further expanded by incorporating additional sensor types, such as radar or RF detection, to enhance the system's capabilities.
Future work can explore improvements in algorithmic efficiency for real-time processing, enhancements in classifier robustness across diverse operational scenarios, and expansion of the dataset to include more drone models and flight behaviors, thus extending the system's applicability to a broader range of environments.
Conclusion
Overall, the paper provides a comprehensive analysis and validation of a multi-sensor system for effective drone detection. By harnessing the complementary strengths of visible, thermal, and acoustic sensing and employing sensor fusion, the study lays a foundation for future advancements in drone detection technologies and points toward sophisticated systems capable of meeting the evolving challenges of UAV management and security.
Figure 4: False detections appearing in a ten-minute section of screen recording from an evaluation session, including the type of object causing each false detection.