
Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning (2003.02437v2)

Published 5 Mar 2020 in cs.CV, cs.LG, and eess.IV

Abstract: Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image. It empowers smart city traffic management and disaster rescue. Researchers have made great efforts in this area and achieved considerable progress. Nevertheless, it is still a challenge when the objects are hard to distinguish, especially in low light conditions. To tackle this problem, we construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle. Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night. Due to the great gap between RGB and infrared images, cross-modal images provide both effective information and redundant information. To address this dilemma, we further propose an uncertainty-aware cross-modality vehicle detection (UA-CMDet) framework to extract complementary information from cross-modal images, which can significantly improve the detection performance in low light conditions. An uncertainty-aware module (UAM) is designed to quantify the uncertainty weights of each modality, which are calculated from the cross-modal Intersection over Union (IoU) and the RGB illumination value. Furthermore, we design an illumination-aware cross-modal non-maximum suppression algorithm to better integrate the modal-specific information in the inference phase. Extensive experiments on the DroneVehicle dataset demonstrate the flexibility and effectiveness of the proposed method for cross-modality vehicle detection. The dataset can be downloaded from https://github.com/VisDrone/DroneVehicle.

Drone-Based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning

Overview

The paper "Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning" addresses the challenges of detecting vehicles in aerial images captured by drones, particularly under varying lighting conditions. The authors present a large-scale dataset, named DroneVehicle, which includes RGB-Infrared image pairs, and introduce a novel framework, the Uncertainty-Aware Cross-Modality Detector (UA-CMDet), to enhance vehicle detection performance in such images.

Key Contributions

  1. DroneVehicle Dataset: The dataset comprises 28,439 RGB-Infrared image pairs, with a total of 953,087 annotated objects across categories such as cars, trucks, and buses. It spans diverse environments, including urban roads and residential areas, under lighting conditions ranging from day to night. It is the first large-scale, full-time, drone-based RGB-Infrared cross-modality detection dataset.
  2. Uncertainty-Aware Cross-Modality Vehicle Detection (UA-CMDet): The framework exploits the strengths of both RGB and infrared modalities to improve detection accuracy. Its central element is the Uncertainty-Aware Module (UAM), which assigns an uncertainty weight to each modality based on the cross-modal IoU of paired annotations and the RGB illumination value, allowing the model to prioritize the more reliable modality.
  3. Cross-Modal Fusion and Illumination-Aware Non-Maximum Suppression: UA-CMDet fuses features from both modalities and applies an Illumination-Aware NMS strategy to combine the outputs of the different branches during inference; a hedged sketch of both ideas follows this list. These components address challenges such as pixel misalignment and redundant information across modalities.

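The summary names the UAM's inputs (a cross-modal IoU and an RGB illumination value) and the illumination-aware NMS step, but does not reproduce the paper's formulas. The sketch below is therefore only a minimal Python illustration of the two ideas: the luma-based illumination estimator, the product rule combining IoU with illumination, and the score-rescaling merge before a single greedy NMS pass are all assumptions made for exposition, not the authors' implementation.

```python
import numpy as np

def illumination_value(rgb_image: np.ndarray) -> float:
    """Stand-in illumination estimator: mean luma of the RGB frame,
    normalized to [0, 1]. The paper uses an RGB illumination value,
    but its exact estimator is not reproduced here."""
    luma = (0.299 * rgb_image[..., 0]
            + 0.587 * rgb_image[..., 1]
            + 0.114 * rgb_image[..., 2])
    return float(luma.mean() / 255.0)

def modality_uncertainty_weights(cross_modal_iou: float, illum: float):
    """Illustrative uncertainty weighting: an annotation pair that overlaps
    well across modalities (high cross-modal IoU) is treated as reliable,
    and the illumination value shifts trust between branches. This product
    rule is an assumed combination, not the paper's formula."""
    w_rgb = cross_modal_iou * illum          # trust RGB more in bright scenes
    w_ir = cross_modal_iou * (1.0 - illum)   # trust infrared more at night
    return w_rgb, w_ir

def _greedy_nms(dets: np.ndarray, iou_thr: float) -> np.ndarray:
    """Standard greedy NMS over (x1, y1, x2, y2, score) rows."""
    order = dets[:, 4].argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(dets[i, 0], dets[order[1:], 0])
        yy1 = np.maximum(dets[i, 1], dets[order[1:], 1])
        xx2 = np.minimum(dets[i, 2], dets[order[1:], 2])
        yy2 = np.minimum(dets[i, 3], dets[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (dets[i, 2] - dets[i, 0]) * (dets[i, 3] - dets[i, 1])
        areas = ((dets[order[1:], 2] - dets[order[1:], 0])
                 * (dets[order[1:], 3] - dets[order[1:], 1]))
        iou = inter / (area_i + areas - inter + 1e-9)
        order = order[1:][iou <= iou_thr]
    return dets[keep]

def illumination_aware_nms(dets_rgb: np.ndarray, dets_ir: np.ndarray,
                           illum: float, iou_thr: float = 0.5) -> np.ndarray:
    """Toy cross-modal merge: rescale each branch's confidences by the
    illumination-derived weights, pool the boxes, and run one NMS pass."""
    dets_rgb, dets_ir = dets_rgb.copy(), dets_ir.copy()
    dets_rgb[:, 4] *= illum          # down-weight RGB scores in the dark
    dets_ir[:, 4] *= (1.0 - illum)   # down-weight infrared scores in daylight
    return _greedy_nms(np.vstack([dets_rgb, dets_ir]), iou_thr)
```

In this toy version, `illumination_aware_nms(dets_rgb, dets_ir, illumination_value(rgb))` returns a merged detection set in which the RGB branch dominates bright scenes and the infrared branch dominates dark ones, which is the qualitative behavior the paper attributes to its Illumination-Aware NMS.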
Experimental Insights

Extensive experiments on the DroneVehicle dataset demonstrate the efficacy of the proposed method. Notably, UA-CMDet achieves a mean Average Precision (mAP) of 64.01%, a substantial improvement over single-modality baselines such as RoI Transformer, which reaches 47.91% on RGB images.

Significant improvements were also observed in detecting various vehicle categories under low-light conditions, highlighting the effectiveness of cross-modality feature fusion and uncertainty quantification in enhancing detection reliability.

Implications and Future Directions

The introduction of the DroneVehicle dataset fills a critical gap in vehicle detection research, particularly for smart city applications where continuous, reliable monitoring is essential. The adoption of uncertainty-aware learning makes detection systems more robust to variability in lighting and environmental conditions.

The research holds potential implications for developing AI systems that integrate multimodal sensors, particularly in applications such as traffic management and automated disaster response. Future exploration could focus on addressing the long-tail class distribution of the dataset and extending the cross-modality learning framework to leverage additional sensory data and improve generalization across diverse environments.

Authors (4)
  1. Yiming Sun
  2. Bing Cao
  3. Pengfei Zhu
  4. Qinghua Hu
Citations (181)