Drone Dataset in Heavy Haze

Updated 19 November 2025
  • Drone datasets in heavy haze are curated benchmarks comprising synthetic and real aerial imagery with systematic annotations, designed to evaluate computer vision methods under adverse visibility.
  • The data incorporates advanced atmospheric scattering models and physical measurements, enabling realistic simulation of haze through per-frame transmittance and altitude metadata.
  • Evaluations indicate that integrating dehazing preprocessing and depth-conditioned detection techniques can significantly boost object detection accuracy in challenging haze conditions.

A drone dataset in heavy haze is a curated benchmark comprising aerial imagery acquired under significant atmospheric scattering conditions. Such datasets are essential for evaluating and advancing computer vision algorithms, particularly for object detection and scene understanding under adverse visibility. Two principal benchmarks—HazyDet (Feng et al., 30 Sep 2024) and A2I2-Haze (Narayanan et al., 2022)—exemplify the state-of-the-art in dataset design and methodology for heavy-haze, drone-view scenarios.

1. Dataset Composition and Ground Truth Structure

Drone heavy-haze datasets are characterized by both real and synthetic imagery, specialized annotation protocols, and systematic coverage of operational parameters.

HazyDet (Feng et al., 30 Sep 2024) aggregates 11,000 synthetic hazy images and 600 real-foggy images, totaling approximately 383,000 annotated object instances (roughly 365,000 synthetic, 19,296 real). The synthetic data is generated by applying a physically-based haze simulation to clear scenes, while the real component (designated RDDTS) consists of field-collected, naturally hazy drone imagery. Annotations include three vehicle classes—Car, Truck, Bus—with heavy class imbalance (Cars ≈ 80% of instances) and pronounced small-object prevalence (over 60% of car instances occupy less than 0.1% of the image area). Collection spans various flight altitudes (tens to hundreds of meters) and viewing angles (nadir to oblique) across urban, rural, and coastal contexts; resolution is fixed at 1333×800 pixels.

A2I2-Haze (Narayanan et al., 2022) provides 343 UAV-acquired hazy images (224 paired with haze-free references) and annotates ten classes, primarily vehicles and infrastructure (Sedan, Van, Pickup, UTV, Mannequin, UGV, Barrel, Jersey Barrier, Aluminum Truss, Backpack). Images are captured across 51 haze levels (binned into light, medium, and heavy) using a custom VTOL platform (Deep Purple 3), a 2.1 mm lens, and barometric altitude/control metadata. Annotations follow the COCO (JSON) and PASCAL VOC (XML) standards. Each bounding box carries a per-frame haze severity label and is associated with the frame's UAV altitude and haze transmittance (a minimal annotation-loading sketch follows the summary table below).

| Benchmark | Hazy images | GT classes | Real haze | Synthetic haze | Per-image haze measure | UAV altitude |
|---|---|---|---|---|---|---|
| HazyDet | 11,600 | 3 | 600 | 11,000 | Implicit (sampled β) | tens to hundreds of m |
| A2I2-Haze (UAV) | 343 | 10 | 343 | 0 | Direct (transmissometer) | 15–50 m (stepped) |
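
For concreteness, a COCO-style annotation file of the kind A2I2-Haze ships can be parsed with the standard library alone. The sketch below is hedged: the `haze_severity`, `altitude_m`, and `transmittance` keys are illustrative assumptions about where the per-frame metadata might live, not documented field names.

```python
# Hedged sketch of reading COCO-format annotations and grouping boxes by haze
# severity. Field names `haze_severity`, `altitude_m`, and `transmittance`
# are illustrative assumptions, not documented A2I2-Haze keys.
import json
from collections import defaultdict

def boxes_by_severity(coco_json_path):
    with open(coco_json_path) as f:
        coco = json.load(f)
    images = {img["id"]: img for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        severity = img.get("haze_severity", "unknown")  # assumed per-frame label
        grouped[severity].append({
            "bbox": ann["bbox"],                     # [x, y, w, h] in COCO format
            "category_id": ann["category_id"],
            "altitude_m": img.get("altitude_m"),     # assumed metadata field
            "transmittance": img.get("transmittance"),
        })
    return grouped
```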

2. Haze Simulation and Physical Modeling

Synthetic haze augmentation in drone datasets employs the single-image Atmospheric Scattering Model (ASM):

$$I(x, y) = J(x, y) \cdot t(x, y) + A \cdot \bigl(1 - t(x, y)\bigr)$$

$$t(x, y) = \exp\bigl(-\beta \cdot d(x, y)\bigr)$$

where $I(x, y)$ is the observed radiance, $J(x, y)$ is the clear-scene radiance, $A$ is the spatially uniform atmospheric light, $\beta$ is the scattering coefficient, and $d(x, y)$ is scene depth along the viewing ray. In HazyDet, $A \sim \text{trunc-}\mathcal{N}(0.8, 0.05)\vert_{[0.7, 0.9]}$ and $\beta \sim \text{trunc-}\mathcal{N}(0.045, 0.02)\vert_{[0.02, 0.16]}$, emulating realistic fog conditions.
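
As a concrete illustration, this ASM synthesis can be prototyped in a few lines. The sketch below is a hedged reading of the procedure described above rather than the released HazyDet code; it assumes a clear RGB image `J` normalized to [0, 1] and a per-pixel metric depth map `d`, and samples `A` and `β` from the truncated normals quoted for HazyDet.

```python
# Minimal sketch of ASM-based haze synthesis. Assumes a clear image `J` in
# [0, 1] (H x W x 3) and a per-pixel depth map `d` in meters (H x W); function
# and variable names are illustrative, not taken from the HazyDet code.
import numpy as np
from scipy.stats import truncnorm

def sample_trunc_normal(mean, std, lo, hi):
    """Sample a scalar from N(mean, std) truncated to [lo, hi]."""
    a, b = (lo - mean) / std, (hi - mean) / std
    return truncnorm.rvs(a, b, loc=mean, scale=std)

def synthesize_haze(J, d):
    """Apply I = J*t + A*(1 - t) with t = exp(-beta * d)."""
    A = sample_trunc_normal(0.8, 0.05, 0.7, 0.9)          # atmospheric light
    beta = sample_trunc_normal(0.045, 0.02, 0.02, 0.16)   # scattering coefficient
    t = np.exp(-beta * d)[..., None]                      # transmittance, broadcast over RGB
    return J * t + A * (1.0 - t)
```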

A2I2-Haze does not generate synthetic haze; instead, it employs field-deployed M56E1 fog-oil smoke generators and uses synchronized laser transmissometers (625 nm) to provide per-frame transmission $t(x)$. UAV altitude and scene relief supply the path length $d(x)$, enabling recovery of $\beta$ for each frame. Paired haze-free and hazy images are obtained through repeated UAV survey missions, guaranteeing spatio-temporal alignment for comparative evaluation.
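
Because each frame records both the transmission and the optical path length, the scattering coefficient follows directly from inverting Beer-Lambert attenuation. The snippet below is a minimal illustration with made-up numbers; the function name is not from the A2I2-Haze tooling.

```python
# Recover the scattering coefficient from a transmissometer reading over a
# known path length (meters). Illustrative sketch, not the A2I2-Haze pipeline.
import math

def scattering_coefficient(t_measured: float, path_m: float) -> float:
    """Invert Beer-Lambert: t = exp(-beta * d)  =>  beta = -ln(t) / d."""
    return -math.log(t_measured) / path_m

beta = scattering_coefficient(0.3, 40.0)  # e.g. heavy haze over a 40 m path
```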

3. Depth Cues, Estimation, and Label Quality

Depth information is integral to haze synthesis and object detection enhancement.

HazyDet relies on pseudo-depth labels generated via a monocular depth estimator (Metric3D v2), selected after cross-comparison with VA-DepthNet, ZoeDepth, IEBins, and UniDepth. Selection criteria included FID/KID metrics and a 280-participant visual survey to confirm depth realism. No hardware depth (LiDAR/stereo) was utilized; all per-pixel depth is single-channel and aligned with the native image resolution. Pseudo-labels are recognized as noisy and are subsequently refined during training (see Sec. 5).

A2I2-Haze does not include per-pixel depth estimation in its protocol; instead, it provides explicit haze transmittance measures and precise geometric metadata for each frame. This enables precise control and analysis of haze-depth coupling across variable altitudes and viewing geometries.

4. Baseline Algorithm Evaluations in Heavy Haze

Standard object detectors have been systematically evaluated on both benchmarks to quantify performance under adverse atmospheric conditions.

HazyDet reports the following mAP@0.5 scores:

  • FCOS (ResNet-50): Synthetic Test 45.9%, RDDTS (real fog) 22.8%
  • YOLOX: Synthetic Test 42.3%, RDDTS 24.7%
  • VFNet: Synthetic Test 51.1%, RDDTS 25.6%
  • Deformable DETR: Synthetic Test 51.9%, RDDTS 26.5%

For 18 detectors, single-stage models cluster at 35–45% mAP, two-stage at 47–51%, DETR variants at 30–52%. Notably, "real" fog (RDDTS) cuts mAP by approximately half compared to mild synthetic haze, revealing substantial detection degradation in authentic, heavy haze (Feng et al., 30 Sep 2024).

A2I2-Haze evaluates detectors with and without dehazing preprocessing. Notable results (YOLOv5, AP@0.5):

  • Without dehazing: 53.8 AP
  • With Cycle-DehazeNet: up to +14.3 AP improvement for heavy haze (t < 0.3), with maximum detector gains (+17 AP) under the densest conditions

Non-homogeneous haze removal models (Trident, SRKT, DWDehaze) outperformed homogeneous-prior methods by roughly 5 AP. The results indicate that dehazing, particularly via GAN-based or non-homogeneous methods, significantly mitigates heavy haze effects on downstream detection (Narayanan et al., 2022).

5. Advanced Architectural Approaches: Depth-Conditioned Detection

HazyDet introduces the Depth-Conditioned Detector (DeCoDet), which augments traditional one-stage detectors using explicit depth cues and dynamic feature modulation:

  • At each FPN scale, a Multi-scale Depth-aware Detection Head (MDDH) produces both object predictions and a depth map.
  • Depth-Cue Conditional Kernels (DCK) are dynamically generated for each spatial pixel by transforming the local depth feature $X_{i,j}$ as:

$$\mathcal{H}_{i,j} = \phi(X_{i,j}) = W_1 \cdot \sigma(W_0 \cdot X_{i,j})$$

where $W_0$ and $W_1$ are learned weight matrices, $\sigma$ is batch normalization followed by a nonlinearity, and $K = 7$ and $G = 16$ are the kernel size and group count.

  • These depth-adaptive kernels convolve over the detection features $Y$, yielding pixel-wise modulated outputs:

$$Y'_{i,j,k} = \sum_{(u,v) \in \Delta_K} \mathcal{H}_{i,j,\,u+\lfloor K/2 \rfloor,\,v+\lfloor K/2 \rfloor,\,\lceil kG/C \rceil} \cdot Y_{i+u,\,j+v,\,k}$$

  • The full loss includes standard detection terms plus a Scale-Invariant Refurbishment Loss (SIRLoss) for depth supervision, which operates on log-depth against a refurbished target blending the pseudo-label with the prediction, $\hat{y} = \alpha y^* + (1-\alpha) y$ with $\alpha = 0.9$:

$$L_{SI} = \frac{1}{n} \sum_{i} d_i'^2 - \frac{1}{n^2} \Bigl(\sum_{i} d_i'\Bigr)^2, \quad d_i' = \log y_i - \log \hat{y}_i$$

where $y$ is the predicted depth, $y^*$ is the pseudo-label, and $\hat{y}$ is the refurbished target (a PyTorch-style sketch of the DCK modulation and SIRLoss follows this list).

  • Training uses synthetic data with linear warm-up ("PDFT"), but the protocol does not detail a staged domain adaptation schedule.
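
To make the pixel-wise modulation concrete, the following is a hedged PyTorch sketch of a DCK-style module and the SIRLoss, written from the equations above rather than from the DeCoDet release. The module names, the hidden-layer `reduction` factor, and detaching the prediction inside the refurbished target are assumptions of this sketch.

```python
# Hedged PyTorch sketch of Depth-Cue Conditional Kernel (DCK) modulation and the
# Scale-Invariant Refurbishment Loss (SIRLoss). Illustrative, not DeCoDet code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthCueConditionalKernel(nn.Module):
    """Generate per-pixel K x K kernels (one per channel group G) from a depth
    feature map X, then convolve them over the detection feature Y."""
    def __init__(self, depth_channels, feat_channels, K=7, G=16, reduction=4):
        super().__init__()
        assert feat_channels % G == 0
        self.K, self.G, self.C = K, G, feat_channels
        hidden = depth_channels // reduction  # reduction factor is an assumption
        # phi(X) = W1 * sigma(W0 * X), realized as two 1x1 convolutions
        self.w0 = nn.Conv2d(depth_channels, hidden, kernel_size=1)
        self.sigma = nn.Sequential(nn.BatchNorm2d(hidden), nn.ReLU(inplace=True))
        self.w1 = nn.Conv2d(hidden, K * K * G, kernel_size=1)

    def forward(self, X, Y):
        B, C, H, W = Y.shape
        K, G = self.K, self.G
        # Per-pixel kernels: (B, G, K*K, H, W)
        kernels = self.w1(self.sigma(self.w0(X))).view(B, G, K * K, H, W)
        # K x K neighborhoods of Y: (B, G, C//G, K*K, H, W)
        patches = F.unfold(Y, K, padding=K // 2).view(B, G, C // G, K * K, H, W)
        # Pixel-wise modulation: each channel group shares its own kernel
        out = (kernels.unsqueeze(2) * patches).sum(dim=3)
        return out.view(B, C, H, W)

def sir_loss(pred_depth, pseudo_depth, alpha=0.9, eps=1e-6):
    """Scale-invariant loss in log-depth against the refurbished target
    y_hat = alpha * pseudo_label + (1 - alpha) * prediction."""
    # Prediction treated as a constant in the target (an assumption of this sketch)
    refurbished = alpha * pseudo_depth + (1.0 - alpha) * pred_depth.detach()
    d = torch.log(pred_depth + eps) - torch.log(refurbished + eps)
    return (d ** 2).mean() - d.mean() ** 2
```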

6. Empirical Insights and Practical Considerations

DeCoDet yields consistent +1.5% mAP improvements (FCOS base: 45.9% → 47.4%; RDDTS: 22.8% → 24.3%), with ablations confirming incremental gains from MDDH (+0.3/0.5), SIRLoss (+0.5/0.3), and DCK (+1.0/1.3). Depth label noise severely degrades performance: substituting noise for all depth pseudo-labels lowers synthetic mAP from 47.4% to 39.7% and RDDTS mAP from 24.3% to 21.0%. Qualitative results (Grad-CAM) show DeCoDet suppresses false positives, especially for small or distant vehicles in dense fog (Feng et al., 30 Sep 2024).

A2I2-Haze data, by virtue of physical haze measurement and paired haze-free references, enables longitudinal, multi-variable assessment of detection robustness. Under heavy haze, dehazing preprocessing (notably Cycle-DehazeNet and non-homogeneous models) can restore up to +17 AP for certain detectors, illustrating the practical importance of intermediate restoration steps (Narayanan et al., 2022).

7. Protocol Recommendations, Use Cases, and Limitations

Best practices for compiling and employing heavy-haze drone datasets include:

  • Synchronizing high-rate haze sensors (e.g., transmissometers) with UAV imagery to provide accurate, per-frame haze metadata.
  • Capturing at varied altitudes and angles to generate a comprehensive haze-depth coupling, essential for both model evaluation and atmospheric calibration.
  • Pairing hazy and haze-free imagery through precise temporal and geometric alignment (e.g., coarse-to-fine matching and keypoint homography; see the alignment sketch after this list).
  • Utilizing per-frame transmittance and altitude in algorithm training and evaluation.
  • Recognizing current limitations: narrow object class diversity, restricted background types, point-based haze measurement (suggesting need for spatial sensor grids), off-line benchmarking (motivating real-time/edge solutions), and static haze fields (future work should include dynamic obscurants).
  • For HazyDet, the absence of specific UAV model/lens details and the use of synthetic depth inference should be considered when generalizing findings.
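
As an illustration of the keypoint-homography step mentioned above, the sketch below aligns a hazy frame to its haze-free reference with ORB features and a RANSAC homography. The detector, match count, and reprojection threshold are illustrative choices rather than the A2I2-Haze pipeline; in dense haze a coarse pre-alignment from flight metadata would typically be needed before keypoint matching succeeds.

```python
# Hedged sketch of keypoint-homography alignment for pairing haze-free and hazy
# frames of the same scene; parameters are illustrative, not the A2I2-Haze tooling.
import cv2
import numpy as np

def align_to_reference(hazy, clear, n_features=4000):
    """Warp `hazy` onto `clear` using ORB keypoints and a RANSAC homography."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp1, des1 = orb.detectAndCompute(cv2.cvtColor(hazy, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = orb.detectAndCompute(cv2.cvtColor(clear, cv2.COLOR_BGR2GRAY), None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = clear.shape[:2]
    return cv2.warpPerspective(hazy, H, (w, h))
```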

The datasets collectively establish rigorous tools for evaluating and advancing vision systems in degrading atmospheric conditions, with direct implications for UAV autonomy, remote sensing, and adverse-weather deployment scenarios.
