RS-Haze Dataset: Synthetic & Real Benchmarks

Updated 11 May 2026

RS-Haze dataset comprises benchmark corpora for haze removal, offering both synthetic ultra-high-resolution images and real, sensor-diverse scenes.
The 8KDehaze component uses atmospheric scattering models on 10,000 paired images to evaluate dehazing performance, advising patch-based training for efficiency.
The A2I2-Haze variant provides real, controlled imagery with detailed object annotations and measured haze levels to validate dehazing and detection algorithms.

The term “RS-Haze dataset” refers to benchmark corpora for haze removal and analysis in remote sensing, ranging from ultra-high-resolution synthetic image sets to real-world controlled and in situ collections. The two principal usages of the RS-Haze designation in the literature are: (1) the large-scale synthetic “8KDehaze” dataset introduced by Chen et al. (13 Apr 2025), and (2) the “A2I2-Haze” dataset (with ‘RS-Haze’ in title), a standards-oriented, sensor-diverse, and physically quantified corpus for validating dehazing and detection algorithms (Narayanan et al., 2022). Both are foundational for benchmarking model performance in degraded aerial and terrestrial vision settings, but they tackle complementary challenges in resolution, realism, and annotation protocols.

1. Dataset Composition and Core Statistics

8KDehaze (RS-Haze, Synthetic, Ultra-High-Resolution)

Size and Structure: 10,000 pairs of clear and synthetic hazy images, each $8192 \times 8192$ pixels (≈67 Mpx/image, $1.34 \times 10^{12}$ pixels total). Data volume spans several terabytes in PNG/TIFF form.
Scene Taxonomy: Six categories (urban, agricultural, mountainous, arid/desert, coastal, riverine/wetland), sourced from USGS public aerial archives. Precise class distributions are not specified.
Haze Levels: Synthetic overlays sample haze densities continuously; operational density categories (light, medium, dense) are referenced, but not explicitly enumerated.
Data Organization: File pairs in “clear” and “hazy” directories, one-to-one mapping by filename. No associated depth or auxiliary ground-truth.
Dataset Partitioning: No canonical training/validation/testing split provided in the release; experimental studies use randomly cropped $2048 \times 2048$ sub-tiles with batch size 2. Users are expected to adopt application-suited splitting (e.g., 80/10/10 or cross-validation by geography).
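
Because the release ships no canonical split, the pairing and partitioning steps fall to the user. The following is a minimal sketch, assuming hypothetical file names (`scene_00000.png` and so on, which are not documented for the real release) and an 80/10/10 random split; it also checks the pixel budget quoted above.

```python
import random

def pair_by_filename(clear_names, hazy_names):
    """Return the sorted file names present in both directories (one-to-one pairing)."""
    return sorted(set(clear_names) & set(hazy_names))

def split_pairs(names, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle deterministically, then cut into train/val/test lists."""
    names = list(names)
    random.Random(seed).shuffle(names)
    n_train = int(ratios[0] * len(names))
    n_val = int(ratios[1] * len(names))
    return (names[:n_train],
            names[n_train:n_train + n_val],
            names[n_train + n_val:])

# Hypothetical file names; the real naming scheme is an assumption here.
clear = [f"scene_{i:05d}.png" for i in range(10_000)]
hazy = list(clear)

pairs = pair_by_filename(clear, hazy)
train, val, test = split_pairs(pairs)
print(len(train), len(val), len(test))

# Pixel budget: 10,000 pairs means 20,000 images of 8192 x 8192 pixels.
pixels_per_image = 8192 * 8192
total_pixels = 2 * len(pairs) * pixels_per_image
print(pixels_per_image, f"{total_pixels:.3e}")
```

A geography-aware split would replace the shuffle with grouping by region, so that tiles from one area never appear in both training and test sets.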

A2I2-Haze (RS-Haze in Title, Real, Controlled, Annotated)

Size and Structure: 1,033 total images split across UAV (583) and UGV (450) sources. Sub-divided as paired (hazy/haze-free), unpaired, and held-out test images for each platform.
Sensor Modalities: Visible and LWIR (FLIR Boson 640) for UAV; MultisenseSL color camera, stereo + depth, synchronized mono, and IR for UGV.
Acquisition Details: Images collected using pre-scripted flight plans (UAV) or teleoperated ground navigation (UGV), with synchronized metadata (altitude, pose, timestamp).
Haze Diversity: 51 distinct haze trials, with uniform sampling of the extinction coefficient $\beta \in [0, \beta_{max}]$; transmittance $T$ spans from $>99\%$ (clear) to $<1\%$ (dense).
Object Annotation: Ten classes of ground objects (vehicles, obstacles, mannequins), per-image bounding boxes, and subjective haze level (light/medium/heavy).
Spatial/Temporal Pairing: UAV: frame selection via LoFTR-based keypoint matching and homography deviation; UGV: matching by geolocated pose with minimal orientation and spatial error.

Dataset	Pairs (clear/hazy)	Image Size	Haze Type	Annotation	Split Specified
8KDehaze	10,000	$8192\times8192$	Synthetic	None	No
A2I2-Haze	274 (UAV + UGV)	~1.5K–2K (varied)	Real, Controlled	Object bounding boxes, haze level	Yes

2. Haze Generation and Quantification Protocols

8KDehaze

Model: Classical atmospheric scattering: $I(x) = J(x)\cdot t(x) + A\cdot (1-t(x))$, $t(x)=\exp(-\beta d(x))$ ($J$ = clean image, $I$ = hazy image, $A$ = airlight, $\beta$ = scattering coefficient, $d$ = scene depth).
Implementation: Synthetic haze is generated using the “SatelliteCloudGenerator” tool [Czerkawski et al.], which modulates transmission based on estimated scene depth. Random sampling of $\beta$ and $A$ covers the operational haze range with no fixed density stratification.
Clear Images: Sourced from high-resolution USGS aerial archives; depth proxies used where true depth is unavailable.
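
The scattering model above can be sketched per pixel as follows. This is an illustration of the formula only, not the SatelliteCloudGenerator pipeline; the sample intensities, depth proxies, and the values of `beta` and `airlight` are assumptions chosen for the example.

```python
import math

def transmission(beta, depth):
    """Transmission t = exp(-beta * d) for scattering coefficient beta and depth d."""
    return math.exp(-beta * depth)

def add_haze(j, depth, beta, airlight):
    """Apply I = J*t + A*(1 - t) to one clean pixel intensity j in [0, 1]."""
    t = transmission(beta, depth)
    return j * t + airlight * (1.0 - t)

def recover_clear(i, depth, beta, airlight, t_min=1e-3):
    """Invert the model, J = (I - A)/t + A, clamping t to avoid division blow-up."""
    t = max(transmission(beta, depth), t_min)
    return (i - airlight) / t + airlight

# Illustrative values only (not taken from the dataset).
clear_row = [0.20, 0.50, 0.80]
depth_row = [0.5, 1.0, 2.0]
beta, airlight = 0.8, 0.9

hazy_row = [add_haze(j, d, beta, airlight) for j, d in zip(clear_row, depth_row)]
restored = [recover_clear(i, d, beta, airlight) for i, d in zip(hazy_row, depth_row)]
print([round(v, 4) for v in hazy_row])
print([round(v, 4) for v in restored])
```

With known depth and parameters the inversion is exact; real dehazing is hard precisely because $t(x)$ and $A$ must be estimated from the hazy image alone.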

A2I2-Haze

Physical Generation: Haze generated via six M56E1 fog-oil smoke generators under controlled external conditions at DEVCOM M-Field.
Quantitative Measurement:
- Laser transmissometers (Z-Laser S3 diode, 625 nm), measuring $T$ and $\beta$.
- Contrast-board: B/W pylon boards provide spatially co-sited local contrast as a haze proxy.
Transmission and Scene Depth: Category and continuous haze stratification based on measured $T$ and $\beta$; 51 haze scenarios yield uniform coverage of the atmospheric extinction continuum.
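
The relation between measured transmittance and extinction coefficient can be sketched with the Beer-Lambert law, $T = \exp(-\beta L)$. The path length `path_length_m` and the sample readings below are assumptions for illustration, since the text does not state the transmissometer baseline. The Michelson definition of contrast is likewise a swapped-in, commonly used choice; the source only says "local contrast".

```python
import math

def extinction_from_transmittance(transmittance, path_length_m):
    """Beer-Lambert inversion: beta = -ln(T) / L, in units of 1/m."""
    if not 0.0 < transmittance <= 1.0:
        raise ValueError("transmittance must lie in (0, 1]")
    return -math.log(transmittance) / path_length_m

def michelson_contrast(white, black):
    """Contrast of a black/white board, (Imax - Imin) / (Imax + Imin)."""
    return (white - black) / (white + black)

# Hypothetical 100 m baseline and readings spanning clear to dense haze.
path_length_m = 100.0
for t in (0.99, 0.50, 0.01):
    beta = extinction_from_transmittance(t, path_length_m)
    print(f"T={t:.2f} -> beta={beta:.5f} 1/m")

# Haze raises the dark patch toward airlight, so board contrast falls.
print(michelson_contrast(0.9, 0.1), michelson_contrast(0.9, 0.7))
```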

3. Annotation Schemes and Ground Truth

8KDehaze

Annotation Content: None beyond the haze/clear pairing per scene.
Structure: File-naming is matched by directory (hazy, clear); no depth, segmentation, or bounding box annotations.

A2I2-Haze

Object-Level: 10 annotated classes, 2D bounding boxes, and haze strength per object (each labeled as light/medium/heavy).
Pairing Methodology:
- UAV: “Coarse-to-fine” matching of video frames using LoFTR-based keypoint matching, followed by homography ($H$) deviation minimization and manual verification.
- UGV: Hungarian algorithm over pose-and-orientation space minimizes the pose and orientation error between matched hazy and haze-free frames.
Metadata: Barometric altitude, geolocation, and time-stamps at millisecond level; UGV pose in global map via AMCL.
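
The UGV pairing step amounts to a minimum-cost one-to-one assignment between hazy and haze-free frames. The sketch below uses a brute-force search over permutations as a stand-in for the Hungarian algorithm (it reaches the same optimum but only scales to a handful of frames). The cost function, the `yaw_weight` parameter, and the sample poses are hypothetical; the source does not give the exact cost.

```python
import math
from itertools import permutations

def pose_cost(a, b, yaw_weight=1.0):
    """Position error plus weighted, wrapped yaw error for poses (x, y, yaw_rad)."""
    position_error = math.hypot(a[0] - b[0], a[1] - b[1])
    yaw_error = abs((a[2] - b[2] + math.pi) % (2 * math.pi) - math.pi)
    return position_error + yaw_weight * yaw_error

def match_frames(hazy_poses, clear_poses):
    """Return, for each hazy frame, the index of its matched haze-free frame.

    Brute force over assignments; requires len(clear_poses) >= len(hazy_poses).
    """
    n = len(hazy_poses)

    def total_cost(assignment):
        return sum(pose_cost(hazy_poses[i], clear_poses[assignment[i]])
                   for i in range(n))

    return list(min(permutations(range(len(clear_poses)), n), key=total_cost))

# Hypothetical poses (metres, radians); note the yaw wrap-around in the last pair.
hazy_poses = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.1), (10.0, 0.0, 3.1)]
clear_poses = [(10.2, 0.0, -3.1), (0.1, 0.0, 0.0), (5.1, 0.1, 0.1)]
matches = match_frames(hazy_poses, clear_poses)
print(matches)
```

For realistic frame counts, a polynomial-time assignment solver would replace the permutation search.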

4. Benchmarking Protocols and Reported Baselines

8KDehaze

Intended Use: Training/testing ultra-high-resolution (UHR) dehazing networks; benchmarking global-context fusion vs. local sliding-window approaches.
Metrics: PSNR / SSIM over test patches in $8192 \times 8192$ inference, plus inference time (A100 GPU).
Comparative Benchmarks (selected $8192 \times 8192$ full-image results, listed as PSNR / SSIM / inference time):
- 4KDehazing (slicing): 25.81 dB / 0.9569 / 6.68 s
- 4KDehazing (direct): 20.41 dB / 0.8664 / 1.35 s
- Dehamer: 25.92 dB / 0.9373 / 6.61 s
- C2PNet: 26.17 dB / 0.9669 / 43.27 s
- DehazeFormer-s: 26.68 dB / 0.9729 / 7.47 s
- ConvIR-b: 26.93 dB / 0.9775 / 8.71 s
- DehazeXL (proposed): 32.35 dB / 0.9863 / 4.62 s
Observations: DehazeXL improves PSNR by about 5.4 dB over the best prior method listed (ConvIR-b, 26.93 dB) with efficient memory consumption. Slice-based inference suffers from block artifacts and degraded color consistency.
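
For reference, PSNR and the reported margin can be reproduced with a few lines. This is a minimal sketch of the standard PSNR definition on flat pixel sequences in [0, 1], not the authors' evaluation script; the PSNR values in `reported_psnr` are copied from the list above.

```python
import math

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] scale gives MSE = 0.01, i.e. about 20 dB.
print(round(psnr([0.5] * 4, [0.4] * 4), 2))

# PSNR values (dB) from the comparison above.
reported_psnr = {
    "4KDehazing (slicing)": 25.81,
    "4KDehazing (direct)": 20.41,
    "Dehamer": 25.92,
    "C2PNet": 26.17,
    "DehazeFormer-s": 26.68,
    "ConvIR-b": 26.93,
    "DehazeXL": 32.35,
}
prior = {name: value for name, value in reported_psnr.items() if name != "DehazeXL"}
best_prior = max(prior, key=prior.get)
gain = reported_psnr["DehazeXL"] - prior[best_prior]
print(best_prior, round(gain, 2))
```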

A2I2-Haze

Evaluation Tasks: Object detection (recall, AP@0.5, AP@[0.5:0.95]) and dehazing (as preprocessing for detection, not by aesthetic metrics due to scene misalignment).
Detection Results:
- UAV: CenterNet best (AP@0.5: 69.3%), YOLOv5: 53.8%.
- UGV: CenterNet (69.3%), Faster-RCNN (58.0%).
- Altitude metadata (NDFT): Improves UAV results by ∼2–3 AP points.
Dehazing Preprocessing:
- Non-homogeneous dehazing (Trident, DW-GAN, SRKT) consistently outperforms homogeneous methods for detection AP.
- Cycle-DehazeNet (unpaired, based on CycleGAN) achieves the largest AP gain: YOLOv5 AP@0.5 increases from 53.8% to 68.1% on UAV imagery.
- FFA-Net occasionally reduces detection, highlighting a mismatch between perceptual (aesthetic) quality and downstream detection performance.
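
The "@0.5" in the detection metrics refers to the intersection-over-union (IoU) threshold at which a detection counts as correct. The sketch below shows only that matching step, with greedy one-to-one assignment; it is not a full AP computation (no precision-recall integration), and the boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def count_true_positives(detections, ground_truth, threshold=0.5):
    """Greedy matching: detections are assumed sorted by descending confidence,
    and each ground-truth box can be matched at most once."""
    unmatched = list(ground_truth)
    true_positives = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= threshold:
            unmatched.remove(best)
            true_positives += 1
    return true_positives

# Hypothetical boxes: one good detection, one false alarm, one missed object.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(1, 1, 11, 11), (50, 50, 60, 60)]
tp = count_true_positives(detections, ground_truth)
print(tp, tp / len(ground_truth), tp / len(detections))  # TP, recall, precision
```

A dehazing method can raise PSNR while shifting object edges or textures enough to lower IoU-based scores, which is one way the mismatch noted above can arise.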

5. Access, Licensing, Limitations, and Recommendations

Accessibility

8KDehaze: Repository and dataset download at https://github.com/CastleChen339/DehazeXL (Chen et al., 13 Apr 2025). The publication does not specify a license; MIT or CC-BY is expected but unconfirmed, so check the repository before reuse.
A2I2-Haze: Data and object annotations at https://a2i2-archangel.vision (Narayanan et al., 2022).

Practical Constraints and Integration

8KDehaze:
- Patch-based training (random $2048 \times 2048$ crops) advised for GPU feasibility.
- Models should leverage global context at inference (global-attention or patch-token fusion), not naive tiling.
- Storage/I/O is non-trivial: multi-TB corpus necessitates high-throughput solutions and efficient tile access.
- No real-image haze; bridging synthetic–real gap requires new data or domain adaptation.
- No auxiliary channels (depth, multi-spectral).
A2I2-Haze:
- Real scenes with dense, measured haze; limited in scale and object/scene diversity relative to synthetic corpora.
- Complex synchronization and calibration, but direct relevance for detection and decision-making benchmarks.
- Multi-modal and paired (visible/IR), object-level labeling, and quantifiable haze density.
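
For the patch-based training advice on 8KDehaze, the essential detail is that the clear and hazy images must be cropped with the same window so pixels stay aligned. The following is a minimal sketch on nested lists, assuming images are already decoded in memory; a real loader would read only the needed tile from disk, and the helper names are hypothetical.

```python
import random

def random_crop_box(height, width, crop=2048, rng=random):
    """Pick a random crop window (top, left, bottom, right) inside the image."""
    top = rng.randrange(height - crop + 1)
    left = rng.randrange(width - crop + 1)
    return top, left, top + crop, left + crop

def crop_pair(clear, hazy, box):
    """Apply one box to both images (lists of rows) so the pair stays aligned."""
    top, left, bottom, right = box

    def cut(image):
        return [row[left:right] for row in image[top:bottom]]

    return cut(clear), cut(hazy)

# Tiny stand-in images: 8 x 8 with a 4 x 4 crop instead of 8192 and 2048.
size, crop = 8, 4
clear = [[(r, c) for c in range(size)] for r in range(size)]
hazy = [[("hazy", r, c) for c in range(size)] for r in range(size)]

rng = random.Random(0)
box = random_crop_box(size, size, crop=crop, rng=rng)
clear_patch, hazy_patch = crop_pair(clear, hazy, box)
print(box, len(clear_patch), len(clear_patch[0]))

# Non-overlapping 2048 x 2048 tiles needed to cover one 8192 x 8192 image.
tiles_per_image = (8192 // 2048) ** 2
print(tiles_per_image)
```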

Table: Core Differences and Use Contexts

Dimension	8KDehaze (RS-Haze)	A2I2-Haze (RS-Haze in Title)
Source Type	Synthetic haze on aerial imagery	Real, controlled field
Image Resolution	$8192 \times 8192$	~1.5K to ~2K (varied)
Haze Generation	Simulated via ASM	Physical (fog generators), measured
Annotations	None	Object/scene-level, haze, pairing
Benchmarked Tasks	Dehazing (PSNR/SSIM)	Dehazing for Detection, AP, recall
Accessibility	Public, GitHub	Public, web

6. Contextualization and Related Datasets

Several datasets intersect with or complement RS-Haze, but differ in their balance of realism, annotation, and satellite modality:

RRSHID (Zhu et al., 23 Mar 2025): Real, multi-temporal, 3,053 pairs (hazy/haze-free) of image crops at 1 m/4 m resolution, labeled by dark channel statistics. Primarily for dehazing, not detection; lacks the ultra-high resolution span of 8KDehaze.
General Dehazing Benchmarks: Methods such as DCP, GridDehazeNet, and DehazeFormer are typically evaluated on these, but the benchmarks often lack full-scene context or are synthesized at lower resolution (Zhu et al., 23 Mar 2025; Chen et al., 13 Apr 2025).
Synthetic vs. Real Gap: As noted in (Chen et al., 13 Apr 2025), simulation-based datasets (like 8KDehaze) enable rigorous architectural benchmarking but require translation to or augmentation by real-world imagery for field deployment.

7. Limitations, Extensions, and Prospects

Synthetic–Real Generalization: 8KDehaze is foundational for UHR benchmarking but does not capture atmospheric heterogeneity, camera–sensor artifacts, or complex occlusion present in real scenes. A2I2-Haze directly measures the transmission/extinction, but at a smaller scale and limited geospatial diversity.
Future Directions (stated in Chen et al., 13 Apr 2025 and Narayanan et al., 2022):
- Scale synthetic protocols to multi-sensor, multi-temporal, and multi-band settings.
- Develop efficient memory architectures for global context in UHR imagery (to circumvent $O(N^2)$ attention).
- Augment with real scene depth or atmospheric data to extend utility for physics-driven atmospheric correction.
- Establish new benchmarks for aesthetic vs. task-driven (e.g., detection) impact of dehazing, given evidence of weak correlation between PSNR and AP (Narayanan et al., 2022).
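
The memory pressure behind the global-context item can be made concrete with a back-of-the-envelope count. Full self-attention stores one score per token pair, so its cost grows quadratically with the number of tokens. The sketch below assumes a hypothetical 16-pixel patch size and 2-byte (fp16) scores, neither of which comes from the cited papers.

```python
def attention_footprint(image_side, patch_side=16, bytes_per_score=2):
    """Tokens and bytes for one dense attention score matrix (one head, one layer)."""
    tokens = (image_side // patch_side) ** 2
    score_bytes = tokens * tokens * bytes_per_score
    return tokens, score_bytes

for side in (2048, 8192):
    tokens, score_bytes = attention_footprint(side)
    print(f"{side} px: {tokens} tokens, {score_bytes / 2**30:.1f} GiB per score matrix")
```

Under these assumptions, going from a $2048 \times 2048$ crop to the full $8192 \times 8192$ image multiplies the token count by 16 and the score matrix by 256, which is why dense attention over the whole image is impractical without some form of token reduction or fusion.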

In summary, “RS-Haze dataset” denotes both a class of haze benchmarking corpora and two canonical datasets at opposite ends of the realism-scale–resolution tradeoff: 8KDehaze for ultra-high-resolution, globally contextual deep learning, and A2I2-Haze for physically measured, annotated, and task-driven real scene analysis in degraded visual environments. Their complementary roles have established the empirical and methodological foundation for algorithm development in remote sensing dehazing and robust detection under haze.

References

Chen et al. (2025). Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images.

Narayanan et al. (2022). A Multi-purpose Realistic Haze Benchmark with Quantifiable Haze Levels and Ground Truth.

Zhu et al. (2025). Real-World Remote Sensing Image Dehazing: Benchmark and Baseline.
