
BirDrone Dataset for Aerial Object Detection

Updated 1 June 2026
  • BirDrone is a large-scale benchmark dataset for drone and bird detection with precise COCO-format annotations focusing on small-object challenges.
  • It includes 11,495 images sourced from stills and video frames, ensuring diverse environmental conditions and balanced train/val/test splits for realistic surveillance applications.
  • The dataset underpins advanced models like YOLOBirDrone, which utilize innovative attention modules and adaptive layer aggregation to achieve high detection accuracy and reduced inference times.

The BirDrone dataset is a large-scale benchmark curated for the reliable detection and classification of drones versus birds in aerial surveillance imagery. It addresses the operational challenge of distinguishing drones from visually similar birds, particularly when objects are small, contrast is low, and environmental noise is varied. By providing rigorously annotated COCO-format data that emphasizes small and ambiguously shaped objects, BirDrone offers a challenging testbed for developing and evaluating vision-based detection algorithms, especially those targeting real-world surveillance use cases [[2601.08319](/papers/2601.08319)].

1. Dataset Composition and Structure

BirDrone comprises 11,495 images, drawn from both self-captured still images (8,428) and video frames (3,067) manually extracted from four public challenge videos. It contains two annotated categories: "drone" and "bird," with a total of 29,748 object bounding boxes (13,881 drones; 15,867 birds). Object density averages approximately 2.59 objects per image.
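As a quick consistency check, the reported totals and object density can be reproduced from the per-source and per-class counts quoted above. This is a minimal sketch; the variable names are illustrative.

```python
# Counts as reported in the text above.
stills, video_frames = 8_428, 3_067
drones, birds = 13_881, 15_867

total_images = stills + video_frames   # images from both sources
total_boxes = drones + birds           # bounding boxes across both classes
density = total_boxes / total_images   # average objects per image

print(total_images, total_boxes, round(density, 2))
```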

A significant emphasis is placed on object scale. Annotations are categorized into four bins according to pixel area:

| Size bin | Area range (px²) | Instances |
|---|---|---|
| Extremely small | < 400 | 1,129 |
| Small | 400–1,024 | 1,576 |
| Medium | 1,024–9,216 | 12,510 |
| Large | > 9,216 | 14,553 |
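The binning rule can be sketched as a small function. The thresholds come from the table; the handling of boundary values (an area of exactly 400 or 1,024 px²) is an assumption, since the text does not say which bin they fall into, and the function name `size_bin` is hypothetical.

```python
def size_bin(width: float, height: float) -> str:
    """Assign a bounding box to one of the four area bins (area in px^2).

    Assumption: lower bounds are inclusive, so an area of exactly 400 is
    "small" and exactly 1,024 is "medium". "Large" is strictly above 9,216.
    """
    area = width * height
    if area < 400:
        return "extremely small"
    if area < 1024:
        return "small"
    if area <= 9216:
        return "medium"
    return "large"


# The smallest annotated drone (7x5) and bird (6x7) both fall in the first bin.
print(size_bin(7, 5), "|", size_bin(6, 7), "|", size_bin(100, 100))
```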

The smallest annotated drone is 7×5 pixels and the smallest bird is 6×7 pixels, reflecting the dataset's focus on challenging small-object detection. Source images vary in resolution and aspect ratio (portrait and landscape) and are uniformly preprocessed to 640×640 pixels for model comparison. The train/val/test split is 70%/20%/10% (8,046/2,299/1,150 images), balanced to prevent domain or modality bias.
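The split sizes follow from the percentages by simple arithmetic. The sketch below assumes that the train and validation sizes are rounded down and that the test split takes the remainder. The source does not state its rounding rule, but this choice reproduces the reported counts. The function name `split_counts` is hypothetical.

```python
def split_counts(n_images: int, train_pct: int = 70, val_pct: int = 20) -> tuple[int, int, int]:
    """Return (train, val, test) sizes; the test split takes the remainder."""
    n_train = n_images * train_pct // 100   # integer arithmetic avoids float rounding
    n_val = n_images * val_pct // 100
    n_test = n_images - n_train - n_val
    return n_train, n_val, n_test


print(split_counts(11_495))  # (8046, 2299, 1150)
```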

2. Data Acquisition and Annotation Protocol

Data acquisition was conducted outdoors in Chandigarh’s "Green Zone," capturing scenes under variable daylight and overcast conditions. Devices include consumer-grade webcams and smartphones, both mounted on static tripods and handheld at various angles, thereby introducing real-world imaging imperfections such as motion blur and variable focus.

Annotation employs manual delineation of bounding boxes using COCO notation $(x_{\min}, y_{\min}, \text{width}, \text{height})$ in JSON format. Quality assurance is enforced through a two-stage protocol: initial labeling by trained annotators, followed by peer review to eliminate inconsistencies and refine box precision. The partitioning of images ensures even distribution of stills and video frames in each subset, rigorously excluding near-duplicate content across splits.
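To make the box convention concrete, the following sketch parses a tiny COCO-style record and converts each box to corner coordinates. The file name, IDs, coordinates, and the category-ID assignment (1 for drone, 2 for bird) are all illustrative assumptions and are not taken from the released annotation files.

```python
import json

# Hypothetical annotation content in COCO layout (illustrative values only).
coco_text = """
{
  "images": [{"id": 1, "file_name": "example_0001.jpg", "width": 640, "height": 640}],
  "categories": [{"id": 1, "name": "drone"}, {"id": 2, "name": "bird"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 120, 7, 5], "area": 35, "iscrowd": 0},
    {"id": 11, "image_id": 1, "category_id": 2, "bbox": [300, 40, 6, 7], "area": 42, "iscrowd": 0}
  ]
}
"""


def load_boxes(coco: dict) -> list[dict]:
    """Convert COCO (xmin, ymin, width, height) boxes to corner form with class names."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    boxes = []
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        boxes.append({
            "image_id": ann["image_id"],
            "label": names[ann["category_id"]],
            "xyxy": (x, y, x + w, y + h),  # (xmin, ymin, xmax, ymax)
        })
    return boxes


boxes = load_boxes(json.loads(coco_text))
for box in boxes:
    print(box)
```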

3. Design Rationale and Dataset Challenges

BirDrone's primary design principle is maximizing challenge for deep learning-based detection architectures, thereby advancing the state of the art in aerial object identification. Over 9% of objects are annotated below $32 \times 32$ pixels, and almost 4% are below $20 \times 20$ pixels, amplifying the small-object detection difficulty.
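These percentages can be checked against the size-bin counts reported earlier. The check assumes that "below 32×32" and "below 20×20" correspond to the area thresholds of 1,024 and 400 px², which is an interpretation rather than something the source states.

```python
# Instance counts from the size-bin table; total from the composition section.
extremely_small = 1_129   # area < 400 px^2 (20 x 20)
small = 1_576             # area from 400 to 1,024 px^2 (up to 32 x 32)
total_boxes = 29_748

below_32 = (extremely_small + small) / total_boxes
below_20 = extremely_small / total_boxes

print(f"{below_32:.2%} below 32x32, {below_20:.2%} below 20x20")
```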

The dataset explicitly introduces visual confounders: birds and drones are presented at visually similar scales and profiles, often partially occluded (15–20% of instances) by foliage or affected by camera noise. Object locations are uniformly distributed, including near frame edges, discouraging reliance on central-position priors. The diversity of capture devices and environmental backgrounds simulates noisy, heterogeneous surveillance network conditions, exposing vision models to a realistic operational envelope.

4. Evaluation Protocol and Baseline Performance

Evaluation follows the standard COCO metrics: precision (P), recall (R), mAP@0.5 (mAP⁰·⁵), and AP averaged over IoU thresholds from 0.5 to 0.95 (mAP⁰·⁵–⁰·⁹⁵), with

$$\text{mAP} = \frac{1}{N} \sum_{i=1}^{N} \text{AP}_i$$
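The two building blocks of these metrics, box IoU and the mean over per-class AP, can be sketched as follows. The sketch reads $N$ as the number of classes (two here), which is the usual convention but is not defined in the source. The AP values in the example are made up for illustration, and the helper names are hypothetical. Full AP computation (matching detections to ground truth and integrating the precision-recall curve) is omitted.

```python
def iou_xywh(a, b):
    """IoU of two COCO-style boxes given as (xmin, ymin, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def mean_ap(ap_per_class):
    """mAP = (1/N) * sum of AP_i over the N classes."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# The ten IoU thresholds (0.50, 0.55, ..., 0.95) of the COCO mAP@0.5:0.95 convention.
IOU_THRESHOLDS = [0.5 + 0.05 * k for k in range(10)]

# Hypothetical per-class AP values, for illustration only.
example_ap = {"drone": 0.90, "bird": 0.80}
print(mean_ap(example_ap))
print(iou_xywh((0, 0, 10, 10), (5, 0, 10, 10)))
```

A detection counts as a true positive at a given threshold only if its IoU with a ground-truth box of the same class meets that threshold, which is why small boxes are penalized heavily: a shift of one or two pixels on a 7×5 box changes the IoU substantially.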

Detection accuracy is computed as $100 \times$ (True Positives / total ground-truth instances). BirDrone's benchmark results include:

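| Model | mAP⁰·⁵ | mAP⁰·⁵–⁰·⁹⁵ | Precision | Recall | Accuracy (%) | FN (%) | FP (%) | Inference (s/frame) |
|---|---|---|---|---|---|---|---|---|
| YOLOv9 (M1) | 0.940 | 0.644 | 0.929 | 0.907 | 81.73 | 13.21 | 5.04 | |
| YOLOBirDrone | 0.948 | 0.668 | 0.949 | 0.917 | 84.91 | 11.61 | 3.73 | 0.149 |
| YOLOv8 | 0.947 | 0.661 | | | 81.82 | | | |
| RT-DETRv2 | 0.938 | 0.633 | | | 80.24 | | | |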

YOLOBirDrone achieves the best value on every reported detection metric (for example, an mAP⁰·⁵ of 0.948 and an accuracy of 84.91%) and is reported to have the lowest inference time among the evaluated models, at 0.149 s per frame. This suggests that its architectural modifications, namely adaptive and extended layer aggregation (AELAN), the multi-scale progressive dual attention module (MPDA), and reverse MPDA (RMPDA), enable enhanced discriminative feature learning for small and ambiguous target classes.

5. Use Cases and Practical Applications

BirDrone addresses mission-critical surveillance tasks where robust differentiation of drones and birds is essential, such as urban airspace monitoring, perimeter security, and protected-area enforcement. Its focus on small, partially occluded, and low-contrast objects under heterogeneous imaging conditions targets the principal pain points observed in real-world deployment contexts.

The clear split protocol, standardized COCO-format annotations, and diversity of image sources position BirDrone as a turnkey resource for developing, fine-tuning, and benchmarking detection architectures under operationally realistic constraints. It is immediately applicable for both supervised training and comparative validation of object detection models.

6. Comparison with Prior Datasets

Prior to BirDrone, public benchmarks for drone-versus-bird discrimination primarily consisted of video-sequence-based datasets such as DvsB-Vid, derived from the IEEE AVSS Drone-vs-Bird Detection Challenge (Akyon et al., 2022). DvsB-Vid focuses on temporal sequence classification: it uses cues such as wing motion for discrimination and enables benchmarking of 3D CNNs, LSTMs, and Transformer-based architectures. BirDrone, by contrast, offers a large-scale still-image corpus that specifically emphasizes small, visually confounded, single-frame detections.

Compared to prior collections, BirDrone uniquely characterizes object scale distribution, occlusion statistics, and the prevalence of edge-located targets. Whereas DvsB-Vid emphasizes track-level video crops for sequence models, BirDrone prioritizes still-frame detection and per-box annotation fidelity.

7. Availability and Usage Considerations

BirDrone is publicly released alongside the YOLOBirDrone architecture and associated benchmarks, with annotations and splits in standard COCO JSON enabling immediate integration into established detection pipelines (Kaur et al., 13 Jan 2026). The dataset design and documentation support reproducible research and model evaluation in the bird-versus-drone detection task space.
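Many YOLO-family training tools expect labels as normalized center-format boxes rather than COCO pixel boxes, so integration often involves a conversion like the one sketched below. Whether the authors' own pipeline performs this step is not stated in the source; the function name `coco_to_yolo` and the example box are hypothetical.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Map COCO (xmin, ymin, width, height) in pixels to normalized (cx, cy, w, h)."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / img_w,  # box center x, as a fraction of image width
        (y + h / 2) / img_h,  # box center y, as a fraction of image height
        w / img_w,            # box width, normalized
        h / img_h,            # box height, normalized
    )


# A 64x32 box in a 640x640 image (illustrative values).
print(coco_to_yolo((100, 120, 64, 32), 640, 640))
```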

This editorial summary is based exclusively on the data and results reported in (Kaur et al., 13 Jan 2026).

[[2601.08319](/papers/2601.08319)]: "YOLOBirDrone: Dataset for Bird vs Drone Detection and Classification and a YOLO based enhanced learning architecture" (Kaur et al., 13 Jan 2026)

[[2207.10409](/papers/2207.10409)]: "Sequence Models for Drone vs Bird Classification" (Akyon et al., 2022)
