The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking (1804.00518v1)

Published 26 Mar 2018 in cs.CV

Abstract: With the advantage of high mobility, Unmanned Aerial Vehicles (UAVs) are used to fuel numerous important applications in computer vision, delivering more efficiency and convenience than surveillance cameras with fixed camera angle, scale and view. However, very few UAV datasets have been proposed, and they focus only on a specific task such as visual tracking or object detection in relatively constrained scenarios. Consequently, it is of great importance to develop an unconstrained UAV benchmark to boost related research. In this paper, we construct a new UAV benchmark focusing on complex scenarios with new levels of challenge. Selected from 10 hours of raw video, about 80,000 representative frames are fully annotated with bounding boxes as well as up to 14 kinds of attributes (e.g., weather condition, flying altitude, camera view, vehicle category, and occlusion) for three fundamental computer vision tasks: object detection, single object tracking, and multiple object tracking. Then, a detailed quantitative study is performed using the most recent state-of-the-art algorithms for each task. Experimental results show that current state-of-the-art methods perform relatively worse on our dataset, due to the new challenges that appear in UAV-based real scenes, e.g., high density, small objects, and camera motion. To our knowledge, our work is the first to explore such issues comprehensively in unconstrained scenes.

Authors (9)

Dawei Du (27 papers)
Yuankai Qi (46 papers)
Hongyang Yu (3 papers)
Yifan Yang (578 papers)
Kaiwen Duan (6 papers)
Guorong Li (36 papers)
Weigang Zhang (9 papers)
Qingming Huang (168 papers)
Qi Tian (314 papers)

Citations (581)

Summary

Overview of "The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking"

The research paper "The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking" presents a comprehensive benchmark (UAVDT) for evaluating object detection and tracking algorithms using unmanned aerial vehicles (UAVs). This benchmark aims to address the limitations of existing datasets, which often rely on fixed or car-mounted cameras. By exploiting the high mobility and expansive viewing capabilities of UAVs, the UAVDT dataset introduces new challenges that are representative of real-world scenarios.

Dataset Construction

The UAVDT benchmark comprises 100 video sequences selected from over 10 hours of footage captured in urban environments, covering diverse scenes such as squares, highways, and junctions. The videos are recorded at 30 fps with a resolution of 1080x540 pixels, and the dataset supports three foundational computer vision tasks: object detection (DET), single object tracking (SOT), and multiple object tracking (MOT). Unlike many existing benchmarks, it focuses on vehicles rather than pedestrians.
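For a rough sense of scale, the back-of-the-envelope arithmetic below combines the figures above with the roughly 80,000 annotated frames stated in the abstract. It is a sketch under loud assumptions: the raw footage is assumed to share the 30 fps frame rate, "over 10 hours" is treated as exactly 10 hours (so the annotated fraction is an upper bound), and the variable names are illustrative.

```python
# Rough dataset-scale arithmetic. All inputs are approximate figures from the text;
# assuming the raw footage shares the 30 fps frame rate is our own assumption.
ANNOTATED_FRAMES = 80_000  # "about 80,000" annotated frames (abstract)
NUM_SEQUENCES = 100
FPS = 30
RAW_HOURS = 10             # "over 10 hours" of raw video, treated as exactly 10

frames_per_sequence = ANNOTATED_FRAMES / NUM_SEQUENCES   # an average, not per sequence
seconds_per_sequence = frames_per_sequence / FPS
annotated_minutes = ANNOTATED_FRAMES / FPS / 60
raw_frames = RAW_HOURS * 3600 * FPS
annotated_fraction = ANNOTATED_FRAMES / raw_frames       # upper bound, since raw > 10 h

print(f"{frames_per_sequence:.0f} frames (~{seconds_per_sequence:.1f} s) per sequence on average")
print(f"~{annotated_minutes:.1f} min of annotated video, <= {annotated_fraction:.1%} of the raw frames")
```

Under these assumptions, an average sequence lasts about 800 frames (roughly 27 seconds), and the annotated frames amount to at most about 7.4% of the raw footage; real sequence lengths vary around this average.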

Annotation and Attributes

Domain experts annotated about 80,000 frames with bounding boxes, covering approximately 2,700 vehicles. The dataset labels challenging attributes such as weather condition, flying altitude, and camera view; for SOT, additional attributes include background clutter, camera rotation, and object blur. These labels allow algorithms to be evaluated separately under each condition, giving a picture of their robustness across a broad range of real-world settings.
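Attribute-wise evaluation usually means grouping per-sequence results by attribute and averaging within each group. The sketch below illustrates that idea; it is not the benchmark's toolkit. The `SeqResult` type, the tag strings, and the simplification to one set of tags per sequence are hypothetical, and the scores are made-up numbers rather than UAVDT results.

```python
from statistics import mean
from typing import Dict, FrozenSet, List, NamedTuple


class SeqResult(NamedTuple):
    name: str             # hypothetical sequence identifier
    tags: FrozenSet[str]  # attribute labels attached to this sequence
    score: float          # any per-sequence metric, e.g. a tracker's overlap score


def per_attribute_mean(results: List[SeqResult]) -> Dict[str, float]:
    """Average the score over every sequence that carries each attribute tag.

    A sequence with several tags contributes to several buckets, mirroring
    multi-label SOT attributes such as background clutter plus camera rotation.
    """
    buckets: Dict[str, List[float]] = {}
    for r in results:
        for tag in r.tags:
            buckets.setdefault(tag, []).append(r.score)
    return {tag: mean(scores) for tag, scores in sorted(buckets.items())}


# Made-up scores for illustration only; these are not UAVDT results.
results = [
    SeqResult("seq_a", frozenset({"night", "high-altitude"}), 0.25),
    SeqResult("seq_b", frozenset({"daylight", "high-altitude", "camera-rotation"}), 0.75),
    SeqResult("seq_c", frozenset({"daylight", "low-altitude"}), 0.5),
]
print(per_attribute_mean(results))
```

Because one sequence can fall into several buckets, the per-attribute averages are not independent; comparing them shows which conditions hurt a method most, not how the conditions combine.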

Evaluation of State-of-the-Art Methods

The paper performs extensive quantitative evaluations of state-of-the-art algorithms for each task:

Object Detection (DET): Region-based and region-free methods were assessed. R-FCN performed best among the evaluated detectors, yet its AP was considerably lower than what comparable methods achieve on other benchmarks, which reflects the dataset's complexity.
Multiple Object Tracking (MOT): Both online and batch methods were tested. MDP using Faster-RCNN detections achieved the highest MOTA and IDF1 scores. However, dense scenes and UAV-induced camera motion led to more false positives and identity switches than on other datasets.
Single Object Tracking (SOT): Deep learning approaches generally outperformed correlation filter methods. MDNet achieved the highest scores, but its overlap score was substantially lower than on other benchmarks, underscoring the difficulty that UAV viewpoints add to the task.
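The scores named above can be sketched as plain Python functions. This is a hedged illustration of the standard definitions (all-point interpolated AP in the PASCAL VOC style, CLEAR MOT's MOTA, the IDF1 identity score, and OTB-style success and precision plots with a 20-pixel threshold), not the UAVDT evaluation code; the function names, the (left, top, width, height) box layout, and the 21-step threshold grid are our assumptions. Matching detections or tracks to ground truth, which produces the counts these functions consume, is omitted.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, width, height) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[float, bool]], num_gt: int) -> float:
    """DET: all-point interpolated AP from (confidence, is_true_positive) pairs.

    Deciding is_true_positive (IoU matching against ground truth) is assumed done.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: best precision reachable at this recall or higher.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mota(fn: int, fp: int, id_switches: int, num_gt: int) -> float:
    """MOT: CLEAR MOT accuracy; all counts are summed over every frame."""
    return 1.0 - (fn + fp + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """MOT: identity F1 from identity-level true/false positives and false negatives."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def success_auc(pred: Sequence[Box], gt: Sequence[Box], steps: int = 21) -> float:
    """SOT: area under the success plot (overlap > threshold, thresholds 0..1)."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


def precision_at(pred: Sequence[Box], gt: Sequence[Box], radius: float = 20.0) -> float:
    """SOT: fraction of frames whose center location error is at most `radius` px."""
    def center(b: Box) -> Tuple[float, float]:
        return (b[0] + b[2] / 2, b[1] + b[3] / 2)

    hits = 0
    for p, g in zip(pred, gt):
        (px, py), (gx, gy) = center(p), center(g)
        if math.hypot(px - gx, py - gy) <= radius:
            hits += 1
    return hits / len(gt)
```

In MOTA the three error types are summed before dividing by the number of ground-truth boxes, so crowded scenes with many false positives can push MOTA below zero, while IDF1 isolates how consistently identities are kept.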

Implications and Future Directions

The UAVDT benchmark has significant implications for computer vision, particularly for object detection and tracking in real-world UAV footage. Key areas for future exploration include:

Real-time Algorithm Efficiency: Addressing computational constraints for real-time processing on UAV platforms, possibly through model compression or pruning techniques.
Scene-Aware Methods: Leveraging scene priors to improve algorithm robustness in varying environmental conditions.
Incorporating Motion Insights: Integrating motion cues to improve performance, for example with LSTM-based trajectory prediction models.
Small Object Detection: Developing methods that accurately detect and track small objects, which are common because UAVs often fly at high altitudes.
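To make the motion-cue idea concrete, the sketch below uses a constant-velocity model, a much simpler stand-in for the LSTM-based predictors mentioned above, to extrapolate an object's next center from its recent track. The function name, the default window, and the assumption that centers are already expressed in a common, camera-motion-compensated coordinate frame are hypothetical choices for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # object center (x, y) in pixels


def predict_next(track: List[Point], window: int = 3) -> Point:
    """Extrapolate the next center assuming constant velocity.

    The velocity is the average displacement over the last `window` steps, which
    smooths out single-frame jitter. Centers are assumed to be in a shared frame
    of reference, i.e. UAV camera motion has already been compensated.
    """
    if not track:
        raise ValueError("track must contain at least one point")
    if len(track) == 1 or window < 1:
        return track[-1]  # no velocity estimate available: predict no motion
    recent = track[-(window + 1):]
    steps = len(recent) - 1
    vx = (recent[-1][0] - recent[0][0]) / steps
    vy = (recent[-1][1] - recent[0][1]) / steps
    return (recent[-1][0] + vx, recent[-1][1] + vy)


print(predict_next([(0.0, 0.0), (2.0, 1.0), (4.0, 2.0), (6.0, 3.0)]))  # (8.0, 4.0)
```

In a tracker, such a prediction would typically seed the search region or gate detection-to-track association in the next frame. Skipping the compensation step under abrupt UAV camera motion would fold the camera's own movement into the velocity estimate, which is one reason learned motion models are attractive here.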

Conclusion

The UAVDT benchmark represents a significant step forward in the evaluation of detection and tracking algorithms under realistic UAV conditions. By capturing complex scenarios and offering extensive annotation, this dataset opens new avenues for research and contributes to advancing UAV-based vision systems. Future research could expand this benchmark with additional sequences and tasks to further understand and address the challenges posed by UAV environments.
