DUT Anti-UAV Dataset

Updated 31 December 2025

DUT Anti-UAV dataset is a curated benchmark for vision-based UAV detection and tracking with detailed manual annotations and varied scene conditions.
It is structured into dedicated detection and tracking subsets, enabling rigorous evaluation using standard metrics like IoU, precision, and mAP.
The dataset supports evaluation of detection-fused tracking and is released under the permissive MIT license, fostering advances in anti-UAV surveillance research.

The DUT Anti-UAV dataset is a curated benchmark designed for computer vision-based UAV detection and tracking, addressing the growing demand for effective anti-UAV surveillance strategies in complex visual environments. Developed by Dalian University of Technology, the dataset comprises visible-light imagery and video sequences with precise manual annotations, facilitating rigorous evaluation of state-of-the-art detection and tracking algorithms. It is structured into dedicated detection and tracking subsets and is released under the MIT license, enabling broad academic and commercial usage (Zhao et al., 2022).

1. Dataset Structure and Annotation Protocol

Detection Subset

The detection set contains 10,000 images partitioned into training (5,200 images; 5,243 UAV annotations), validation (2,600; 2,621), and test (2,200; 2,245) splits. Each instance of the sole “UAV” class is labeled using an axis-aligned bounding box ( $(x_{min}, y_{min}, w, h)$ ; units in pixels) and typically recorded as:

000123.jpg, x_min=345, y_min=120, w=48, h=30, class="uav"

More than 35 distinct UAV types are represented, spanning rotorcraft and fixed-wing models. Scenes cover urban settings (high-rises, parking lots) and rural ones (fields, forests), under varied weather (sunny, cloudy, snowy) and at different times of day (dawn/dusk, day, night).

Quantitative image statistics:

UAV bounding box area ratio: $1.9\times 10^{-6}$ (min) to $0.70$ (max); mean $\approx 1.3\%$
Aspect ratio ( $w/h$ ): $1.0$ to $6.67$; mean $\approx 1.9$
UAV spatial distribution: box centers are denser near the image center and otherwise roughly uniform along $x$ and $y$
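The area-ratio and aspect-ratio statistics above are simple per-box quantities. The sketch below shows one way to compute them and aggregate them over a set of boxes, following the $w/h$ definition given above. The function names are hypothetical, and the 1920×1080 frame size used for the example box is an assumption, since the detection image sizes are not stated here.

```python
from statistics import mean
from typing import Iterable, Tuple


def box_stats(w: float, h: float, img_w: int, img_h: int) -> Tuple[float, float]:
    """Return (area_ratio, aspect_ratio) for one axis-aligned box."""
    area_ratio = (w * h) / (img_w * img_h)  # box area / image area
    aspect_ratio = w / h                     # w/h, as defined above
    return area_ratio, aspect_ratio


def summarize(values: Iterable[float]) -> Tuple[float, float, float]:
    """(min, max, mean) of one statistic across all annotated boxes."""
    vals = list(values)
    return min(vals), max(vals), mean(vals)


# Box from the example record above; the 1920x1080 frame size is hypothetical.
ratio, aspect = box_stats(48, 30, 1920, 1080)
print(f"area ratio = {ratio:.2e}, aspect ratio = {aspect:.2f}")
# prints: area ratio = 6.94e-04, aspect ratio = 1.60
```

Running `summarize` over the per-box ratios of the whole split would yield min/max/mean figures of the kind listed above.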

Tracking Subset

The tracking subset comprises 20 fully annotated video sequences (total $\approx 24,\!804$ frames; $\approx 1,\!240$ frames per sequence) covering both short-term ($<500$ frames) and long-term ($>2,\!000$ frames) tracking scenarios. Each frame contains a single axis-aligned bounding box. Video resolutions are $720\times 1280$ or $1080\times 1920$.

Key dynamics:

UAV scale (box/image area): per-frame ratio $2.7\times 10^{-4}$ to $4.5\%$
Per-frame aspect ratio: $1.0$ to $4.0$ in most sequences; Video 10 varies the most, from $1.0$ to $4.33$
Challenges: extremely small targets, rapid motion, camera shake, illumination variation, occlusion, and out-of-view events

2. Evaluation Protocols and Metrics

Detection Evaluation

Standard object detection metrics are adopted:

Intersection over Union (IoU): $\mathrm{IoU}(B_{gt}, B_{pred}) = \frac{|B_{gt} \cap B_{pred}|}{|B_{gt} \cup B_{pred}|}$
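True/False Positives (TP/FP), False Negatives (FN): a prediction is a TP if $\mathrm{IoU} \geq \tau$ with a not-yet-matched ground-truth box of the same class, and an FP otherwise; a ground-truth box left unmatched is an FN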
Precision and Recall: $P = \frac{TP}{TP + FP},\quad R = \frac{TP}{TP + FN}$
F₁-score: $\mathrm{F}_1 = 2\frac{P R}{P + R}$
Average Precision (AP): $\mathrm{AP}_{\tau} = \int_0^1 P(R)\, dR$ at IoU threshold $\tau$
mean AP (mAP): $\mathrm{mAP} = \frac{1}{K}\sum_{k=1}^K \mathrm{AP}_{\tau_k}$ ; $K=1, \tau_1=0.5$ in experiments
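The detection metrics above can be sketched for the single "UAV" class as follows. This is a minimal illustration, not the official evaluation code: it assumes greedy matching (highest-scoring predictions claim ground truth first, each ground-truth box matched at most once) and approximates the AP integral with the common all-point interpolation of the precision envelope. With $K=1$ and $\tau_1=0.5$, the returned AP equals the reported mAP. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, w, h) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x_min, y_min, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def detection_metrics(preds: List[Tuple[str, float, Box]],
                      gts: Dict[str, List[Box]], tau: float = 0.5) -> Dict[str, float]:
    """AP, precision, recall and F1 at IoU threshold tau (single class).

    preds: (image_id, score, box) triples; gts: image_id -> ground-truth boxes.
    """
    n_gt = sum(len(b) for b in gts.values())
    used = {img: [False] * len(b) for img, b in gts.items()}
    tp = fp = 0
    precisions, recalls = [], []
    # Greedy matching in descending score order.
    for img, _score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img, [])):
            o = iou(box, g)
            if not used[img][j] and o > best_iou:
                best_iou, best_j = o, j
        if best_j >= 0 and best_iou >= tau:
            used[img][best_j] = True  # each GT box can be matched only once
            tp += 1
        else:
            fp += 1  # no unmatched GT box overlaps enough
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt if n_gt else 0.0)

    # All-point interpolated AP: area under the monotone precision envelope.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    ap = sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))

    p = precisions[-1] if precisions else 0.0  # P and R over all predictions
    r = recalls[-1] if recalls else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return {"AP": ap, "precision": p, "recall": r, "F1": f1}


gts = {"a": [(10, 10, 20, 20)], "b": [(50, 50, 10, 10)]}
preds = [("a", 0.9, (12, 12, 20, 20)),   # IoU ~0.68 -> TP
         ("a", 0.8, (100, 100, 5, 5)),   # no overlap -> FP
         ("b", 0.7, (50, 50, 10, 10))]   # exact match -> TP
m = detection_metrics(preds, gts)
print({k: round(v, 3) for k, v in m.items()})
# prints: {'AP': 0.833, 'precision': 0.667, 'recall': 1.0, 'F1': 0.8}
```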

Tracking Evaluation (One-Pass Evaluation, OPE)

Trackers are initialized with the ground-truth box in the first frame and run to the end of the sequence without re-initialization:

Success plot (overlap): given $\mathrm{IoU}_t$ at frame $t$ of an $N$-frame sequence, the success rate at threshold $\delta$ is $s(\delta) = \frac{1}{N}\,\bigl|\{t : \mathrm{IoU}_t \geq \delta\}\bigr|$
Overall success score: area under $s(\delta)$ over $\delta \in [0, 1]$: $\mathrm{Success}=\int_{0}^{1}s(\delta)\,d\delta$
Precision plot (center error): with $d_t$ the Euclidean distance between the predicted and ground-truth box centers at frame $t$, the precision at error threshold $\epsilon$ is $P(\epsilon) = \frac{1}{N}\,\bigl|\{t : d_t \leq \epsilon\}\bigr|$
Normalized precision: AUC of $P(\epsilon)$ for $\epsilon\in[0,50]$ px, divided by $50$
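The OPE scores defined above can be sketched for a single sequence as below. This is a minimal illustration under stated assumptions: every frame has a ground-truth box, the integrals are approximated by averaging $s(\delta)$ and $P(\epsilon)$ over a uniform grid (the 101-point grid is an assumption; OTB-style toolkits commonly use coarser grids), and normalized precision follows the definition given here (AUC over $[0, 50]$ px divided by 50). Other benchmarks, such as LaSOT, instead normalize the center error by the ground-truth box size. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, w, h) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x_min, y_min, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def center_error(a: Box, b: Box) -> float:
    """Euclidean distance d_t between box centers, in pixels."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def ope_scores(gt: Sequence[Box], pred: Sequence[Box], n_steps: int = 101):
    """(success AUC, precision at 20 px, normalized precision) for one sequence."""
    if not gt or len(gt) != len(pred):
        raise ValueError("need one prediction per ground-truth frame")
    n = len(gt)
    ious = [iou(g, p) for g, p in zip(gt, pred)]
    errs = [center_error(g, p) for g, p in zip(gt, pred)]
    grid = [i / (n_steps - 1) for i in range(n_steps)]  # uniform samples of [0, 1]
    # Mean of s(delta) over the grid approximates its integral over [0, 1].
    success = sum(sum(o >= d for o in ious) / n for d in grid) / n_steps
    prec_20 = sum(e <= 20 for e in errs) / n
    # Mean of P(eps) for eps in [0, 50] approximates AUC / 50.
    norm_prec = sum(sum(e <= 50 * d for e in errs) / n for d in grid) / n_steps
    return success, prec_20, norm_prec


gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
pred = [(0, 0, 10, 10), (30, 30, 10, 10)]  # perfect frame, then a lost frame
success, prec_20, norm_prec = ope_scores(gt, pred)
```

With one perfect and one lost frame, success and precision at 20 px both come out near 0.5, as expected.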

3. Benchmark Algorithm Results

Detection Baselines

Method	Backbone	mAP	FPS
Faster-R-CNN	ResNet-50	0.653	12.8
Cascade-R-CNN	ResNet-50	0.683	10.7
ATSS	ResNet-50	0.642	13.3
SSD	VGG16	0.632	33.2
YOLOX	ResNet-18	0.400	53.7

Cascade-R-CNN achieves the highest mAP (0.683) and the highest recall at every precision level.
YOLOX is the fastest detector (53.7 FPS) but the least accurate (mAP 0.400).
Precision–recall curves at IoU thresholds of 0.5 and 0.75 further illustrate these method-specific trade-offs.
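One way to make the speed/accuracy trade-off in the detection table explicit is to check which detectors are Pareto-optimal, i.e. not beaten on both mAP and FPS by another method. The sketch below uses only the numbers from the table; note that FPS depends on the hardware used, which is not specified here.

```python
# (method, mAP, FPS) as reported in the detection baseline table
results = [
    ("Faster-R-CNN", 0.653, 12.8),
    ("Cascade-R-CNN", 0.683, 10.7),
    ("ATSS", 0.642, 13.3),
    ("SSD", 0.632, 33.2),
    ("YOLOX", 0.400, 53.7),
]


def dominated(a, b) -> bool:
    """True if b is at least as good as a on mAP and FPS, and strictly better on one."""
    return b[1] >= a[1] and b[2] >= a[2] and (b[1] > a[1] or b[2] > a[2])


pareto = [r[0] for r in results
          if not any(dominated(r, o) for o in results if o is not r)]
print(pareto)
# prints: ['Faster-R-CNN', 'Cascade-R-CNN', 'ATSS', 'SSD', 'YOLOX']
```

No detector is dominated: each method represents a distinct operating point, from Cascade-R-CNN (most accurate) to YOLOX (fastest).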

Tracking Baselines

Tracker	Success (AUC)	Norm. precision	Precision@20px
SiamFC	0.381	0.526	0.623
ECO	0.404	0.643	0.717
SPLT	0.405	0.585	0.651
ATOM	0.574	0.758	0.830
SiamRPN++	0.545	0.709	0.780
DiMP	0.578	0.756	0.831
TransT	0.586	0.765	0.832
LTMU	0.608	0.783	0.858

LTMU achieves the highest scores on all three metrics among the tested trackers (success 0.608, normalized precision 0.783, precision 0.858 at 20 px).

Detection-Fused Tracking

A detection-tracking fusion algorithm is also specified: when the tracker's confidence drops below $\tau_t$ ($\tau_t=0.9$), the detector is run on the current frame and the box with the higher confidence is selected, with detector boxes considered only if their score exceeds $\tau_d=0.9$. Fusing with Faster-R-CNN (VGG16) yields the largest per-tracker improvements, e.g., SiamFC: 0.381 $\to$ 0.617 (+0.236 success), with similar gains in precision.
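The per-frame fusion rule described above can be sketched as a small decision function. This is an illustration, not the authors' implementation: the names `fuse_step` and `detect` are hypothetical, the detector is passed as a callable so it only runs when the tracker is unsure, and tracker confidence and detector score are assumed to lie on a comparable $[0, 1]$ scale. Whether the tracker is re-initialized from a selected detection is not specified here and is left out.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, w, h)
Detection = Tuple[Box, float]             # (box, detector score)

TAU_T = 0.9  # tracker-confidence threshold tau_t
TAU_D = 0.9  # detector-score threshold tau_d


def fuse_step(track_box: Box, track_conf: float,
              detect: Callable[[], List[Detection]]) -> Tuple[Box, float]:
    """Return the box (and its confidence) to output for the current frame."""
    if track_conf >= TAU_T:
        return track_box, track_conf  # tracker is confident; skip re-detection
    # Re-detect only when the tracker is unsure; keep detections above tau_d.
    candidates = [d for d in detect() if d[1] > TAU_D]
    if not candidates:
        return track_box, track_conf
    det_box, det_score = max(candidates, key=lambda d: d[1])
    # Select whichever box carries the higher confidence.
    if det_score > track_conf:
        return det_box, det_score
    return track_box, track_conf


box, conf = fuse_step((0, 0, 10, 10), 0.5, lambda: [((5, 5, 10, 10), 0.95)])
```

With $\tau_t = \tau_d = 0.9$, any detection that passes the detector threshold necessarily outscores the low-confidence tracker box; the explicit comparison only matters if the two thresholds are set differently.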

4. Comparison with the Anti-UAV Dataset

The DUT Anti-UAV dataset is distinct from the larger Anti-UAV dataset (Jiang et al., 2021), which provides 318 unaligned RGB–IR video pairs and over 585,900 bounding boxes for advanced multi-modal tracking evaluation. While DUT focuses exclusively on visible-light imagery and offers single-class annotations, Anti-UAV emphasizes modality fusion and attribute-rich annotation (seven binary attributes per IR clip: Out-of-View, Occlusion, Fast Motion, Scale Variation, Low Illumination, Thermal Cross-Over, Low Resolution). Evaluation protocols, error definitions, and baseline results differ accordingly.

Both datasets share object-centered bounding box annotation. However, Anti-UAV expands the research landscape toward multi-modal (RGB/IR), long-term, and attribute-intensive UAV tracking. DUT’s smaller, more focused structure facilitates benchmarking fine-grained detection and tracking in visually challenging, single-modality scenes.

5. Dataset Distribution and Licensing

The DUT Anti-UAV dataset is accessible via https://github.com/wangdongdut/DUT-Anti-UAV, with the following structure:

“detection” folder: Images and subfolder “Annotations” containing TXT/CSV files per image
“tracking” folder: 20 subfolders, each with sequential frames (.jpg) and groundtruth.txt
Example code for training/inference is provided under “scripts/”
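A loader for the tracking folder layout described above might look like the sketch below. The exact `groundtruth.txt` format is an assumption here: one `x, y, w, h` box per line, separated by commas or whitespace, in the same order as the sorted frame files. Frames are collected recursively in case a sequence stores them in a subfolder. The function names and the root path are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels


def parse_groundtruth(text: str) -> List[Box]:
    """Parse one box per non-empty line; commas or whitespace as separators."""
    boxes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        x, y, w, h = (float(v) for v in re.split(r"[,\s]+", line)[:4])
        boxes.append((x, y, w, h))
    return boxes


def load_tracking_subset(root) -> Dict[str, Tuple[List[Path], List[Box]]]:
    """Map each sequence folder name to (sorted frame paths, per-frame boxes)."""
    seqs = {}
    for seq_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        frames = sorted(seq_dir.rglob("*.jpg"))
        gt = parse_groundtruth((seq_dir / "groundtruth.txt").read_text())
        if len(frames) != len(gt):  # each frame should carry exactly one box
            raise ValueError(f"{seq_dir.name}: {len(frames)} frames, {len(gt)} boxes")
        seqs[seq_dir.name] = (frames, gt)
    return seqs
```

For example, `load_tracking_subset("DUT-Anti-UAV/tracking")` (hypothetical path) would return 20 entries if the layout matches the description above.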

The dataset is released under the MIT license, which permits academic and commercial use provided the copyright and license notice is retained (Zhao et al., 2022).

6. Implications and Future Research Directions

A plausible implication is that the DUT Anti-UAV dataset enables systematic study of visual anti-UAV detection and tracking in scenarios featuring small targets, broad environmental variation, and significant target appearance diversity. Its focused annotation protocol and detection-tracking fusion methodology advance high-precision tracking. Future research may leverage DUT for benchmarking detection-fused tracking, exploiting multi-modal fusion (as explored in Anti-UAV (Jiang et al., 2021)), and addressing challenges such as occlusion, scale variation, and out-of-view events via robust data augmentation or advanced model design.

The dataset’s open licensing and comprehensive structure support its use as a reference testbed for computer vision-based UAV surveillance systems, evaluation of detection/tracking architecture performance, and development of adaptive anti-UAV solutions.


References

Zhao et al. (2022). Vision-based Anti-UAV Detection and Tracking.

Jiang et al. (2021). Anti-UAV: A Large Multi-Modal Benchmark for UAV Tracking.
