xBD-S12 Dataset for Rapid Damage Mapping
- xBD-S12 is a dataset designed for rapid building damage assessment using spatially and temporally aligned Sentinel-1 SAR and Sentinel-2 multispectral imagery.
- It comprises 10,315 bi-temporal patches covering 16 disaster events, enabling large-scale mapping with pixel-level damage annotations.
- The dataset supports benchmarking with various deep learning models, highlighting practical gains in rapid disaster response mapping.
xBD-S12 is a dataset designed for rapid building damage assessment in the aftermath of natural disasters, providing spatially and temporally aligned pre- and post-disaster Sentinel-1 SAR and Sentinel-2 multispectral imagery. Anchored to the very-high-resolution xBD benchmark, xBD-S12 enables the development and evaluation of models using widely available medium-resolution Copernicus imagery for large-scale damage mapping tasks.
1. Dataset Composition and Characteristics
xBD-S12 pairs each 0.5 m pre-/post-disaster VHR image from xBD with co-registered medium-resolution Sentinel-1 and Sentinel-2 observations, resulting in 10,315 bi-temporal patches. The dataset covers 16 disaster events across five hazard types: earthquakes, floods, storms, volcanic activity, and wildfires. Three tornado events were omitted as they predate the launch of the Sentinel satellites.
Sensor breakdown:
- Sentinel-2 optical: 10,315 pre-/post pairs, each with 12 multispectral bands (B1–B12; B10 is excluded, as it is not produced at Level-2A).
- Sentinel-1 SAR: 10,315 pre-/post pairs of dual-polarization (VV, VH) log-amplitude GRD products.
Spatial resolution:
- Sentinel-2: Native L2A bands include 10 m (B2, B3, B4, B8), 20 m (B5, B6, B7, B8A, B11, B12), and 60 m (B1, B9) GSD.
- Sentinel-1: 10 m pixel spacing (ground-range detected).
- All patches are resampled (Lanczos) to 128 × 128 px for deep learning input, yielding an effective GSD of roughly 4 m.
Geographic and temporal alignment:
- Patches correspond to the xBD tiles (1,024 × 1,024 px at ~0.5 m GSD, i.e. roughly 512 m × 512 m on the ground), with affine corrections for georeferencing shifts.
- Pre-/post-event acquisitions minimize cloud cover (Sentinel-2) and maximize damage visibility. Sentinel-1 acquisitions are from the closest temporally and orbitally co-aligned GRD scenes.
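The resampling step above can be sketched as follows. This is a minimal illustration, assuming ~512 m tile extents (following from xBD's 1,024 px × 0.5 m tiles), a 128 × 128 px target matching the patch storage described in Section 3, and Pillow's Lanczos filter as the resampling backend; the dataset's actual pipeline may differ in detail.

```python
import numpy as np
from PIL import Image

def resample_band(band: np.ndarray, size: int = 128) -> np.ndarray:
    """Resample a single float band to size x size px with Lanczos filtering."""
    img = Image.fromarray(band.astype(np.float32), mode="F")  # 32-bit float image
    return np.asarray(img.resize((size, size), Image.LANCZOS))

# A 512 m x 512 m tile sampled at 10 m GSD spans ~51 px; after resampling
# to 128 px across, the effective GSD is 512 / 128 = 4 m.
tile_extent_m = 512
band_10m = np.random.rand(51, 51)        # hypothetical Sentinel-2 10 m band over one tile
patch = resample_band(band_10m)
effective_gsd = tile_extent_m / patch.shape[0]
```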
2. Data Preparation and Processing Pipeline
All Sentinel tiles are acquired using Google Earth Engine (GEE) standard workflows.
Radiometric and atmospheric corrections:
- Sentinel-2: Provided as bottom-of-atmosphere reflectances, processed (orthorectified and cloud-screened) with Sen2Cor.
- Sentinel-1: Includes thermal-noise removal, radiometric calibration (backscatter stored as log-amplitude), and terrain correction via GEE routines.
Geometric alignment:
- Systematic georeferencing errors in xBD are corrected by fitting an affine transformation to control points derived from building polygons.
- Corrected VHR footprints define the Sentinel patch extraction boundaries; both pre- and post-disaster images are resampled and pixel-aligned to a common 128 × 128 px grid.
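A least-squares affine fit of the kind described can be sketched with NumPy; the control points below are hypothetical (a pure shift), purely to illustrate the mechanics.

```python
import numpy as np

def fit_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Fit a 2x3 affine transform mapping src -> dst control points (each N x 2)."""
    n = src.shape[0]
    A = np.hstack([src, np.ones((n, 1))])             # homogeneous source coords (N, 3)
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)  # least-squares solution (3, 2)
    return params.T                                   # 2 x 3 affine matrix

def apply_affine(M: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 2x3 affine matrix to an array of 2-D points."""
    return pts @ M[:, :2].T + M[:, 2]

# Hypothetical control points derived from building polygons: a shift of (+3 m, -2 m)
src = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
dst = src + np.array([3.0, -2.0])
M = fit_affine(src, dst)
corrected = apply_affine(M, src)
```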
Data and annotation formats:
- Sentinel-2: Downloaded as .SAFE directories and per-band GeoTIFFs.
- Sentinel-1: GRD products as 10 m GeoTIFFs with two polarization channels.
- Annotations: Polygon shapefiles (.shp) for post-event building footprints with damage attributes; segmentation masks for loss and evaluation.
- Metadata: GEE conventions (cloud_score, orbit_direction, acquisition_date) and xBD tile-IDs.
3. Annotation Schema and Label Set
xBD-S12 inherits original building footprint and damage labels from xBD but simplifies them for medium-resolution applications:
xBD label structure:
- Four levels: 0 = no damage; 1 = minor damage; 2 = major damage; 3 = destroyed.
- 425,000+ annotated building polygons across 11,034 VHR image pairs.
xBD-S12 schema:
- Damage levels {minor, major, destroyed} are merged into a single “damaged” class.
- Per-pixel mask classes: 0 = background, 1 = intact building, 2 = damaged building.
- Pixels outside VHR coverage or under clouds are labelled as invalid (ignored during training/evaluation).
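The label-merging rule above can be sketched as follows; the `INVALID = 255` sentinel and the input array names are assumptions for illustration, not the dataset's documented conventions.

```python
import numpy as np

INVALID = 255  # assumed sentinel for invalid pixels, ignored by loss/metrics

def to_s12_mask(xbd_damage, footprint, valid):
    """Collapse xBD's 4-level damage labels to the xBD-S12 3-class scheme.

    xbd_damage: xBD damage level (0-3) on building pixels
    footprint:  boolean building-footprint raster
    valid:      boolean raster, False outside VHR coverage or under clouds
    """
    mask = np.zeros_like(xbd_damage, dtype=np.uint8)   # 0 = background
    mask[footprint & (xbd_damage == 0)] = 1            # 1 = intact building
    mask[footprint & (xbd_damage >= 1)] = 2            # 2 = damaged (minor/major/destroyed merged)
    mask[~valid] = INVALID                             # invalid pixels are excluded
    return mask

damage = np.array([[0, 1], [3, 0]])
footprint = np.array([[True, True], [True, False]])
valid = np.array([[True, True], [True, False]])
mask = to_s12_mask(damage, footprint, valid)
```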
Patch storage:
- Four image tensors: S1_pre (2×128×128), S1_post (2×128×128), S2_pre (12×128×128), S2_post (12×128×128)
- One annotation mask and one .shp polygon file with attributes per patch.
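A single stored patch might then look like the following sketch; the tensor shapes come from the list above, while the dictionary layout and field names are illustrative.

```python
import numpy as np

# One xBD-S12 sample as channel-first tensors plus the label mask
patch = {
    "S1_pre":  np.zeros((2, 128, 128), dtype=np.float32),   # VV, VH log-amplitude
    "S1_post": np.zeros((2, 128, 128), dtype=np.float32),
    "S2_pre":  np.zeros((12, 128, 128), dtype=np.float32),  # 12 L2A bands
    "S2_post": np.zeros((12, 128, 128), dtype=np.float32),
    "mask":    np.zeros((128, 128), dtype=np.uint8),        # 0/1/2 plus invalid
}

# Early fusion (one of the U-Net configurations benchmarked later) stacks
# all sensor channels into a single 28-channel input
early = np.concatenate([patch[k] for k in ("S1_pre", "S1_post", "S2_pre", "S2_post")])
```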
4. Evaluation Metrics and Loss Formulation
Damage mapping is formalized as multiclass semantic segmentation. For each pixel i, let the prediction be ŷ_i and the ground truth y_i, with label set {0, 1, 2} (background, intact building, damaged building).
Class-wise metrics: per-class precision P_c, recall R_c, and F1_c = 2 · P_c · R_c / (P_c + R_c), computed over valid pixels.
Aggregate metrics:
- Damage F1 (F1_dmg): Harmonic mean of the F1 scores for the ‘intact’ and ‘damaged’ classes, restricted to building pixels.
- Localization F1 (F1_loc): Binary F1 for building vs. background over all valid pixels; evaluated both with (“B=3”) and without a 3-pixel morphological buffer.
- Competition score: F1_comp = 0.3 · F1_loc + 0.7 · F1_dmg, following the xView2 challenge weighting.
- The loss function is standard pixel-wise cross-entropy, applied only to valid (non-invalid) pixels.
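A minimal NumPy sketch of the score and loss follows. It assumes an `INVALID = 255` ignore label, and, as a simplification, computes the intact/damaged F1 over all valid pixels rather than strictly over building pixels; the benchmark's exact implementation may differ.

```python
import numpy as np

INVALID = 255  # assumed ignore label for pixels outside VHR coverage or under clouds

def f1(pred, gt, cls):
    """Binary F1 for one class, ignoring INVALID pixels."""
    v = gt != INVALID
    tp = np.sum((pred == cls) & (gt == cls) & v)
    fp = np.sum((pred == cls) & (gt != cls) & v)
    fn = np.sum((pred != cls) & (gt == cls) & v)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_comp(pred, gt):
    """xView2-style competition score: 0.3 * F1_loc + 0.7 * F1_dmg."""
    v = gt != INVALID
    p, g = pred[v], gt[v]
    f1_loc = f1(p > 0, g > 0, True)                    # building vs. background
    a, b = f1(p, g, 1), f1(p, g, 2)                    # intact, damaged
    f1_dmg = 2 * a * b / (a + b) if (a + b) else 0.0   # harmonic mean
    return 0.3 * f1_loc + 0.7 * f1_dmg

def masked_ce(logits, gt):
    """Pixel-wise cross-entropy averaged over valid pixels only.

    logits: raw scores (C, H, W); gt: labels (H, W) with INVALID ignored.
    """
    z = logits - logits.max(axis=0, keepdims=True)             # stable log-softmax
    logp = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    ys, xs = np.nonzero(gt != INVALID)
    return float(-logp[gt[ys, xs].astype(int), ys, xs].mean())

# Toy example: one invalid pixel, near-one-hot logits on the valid pixels
gt = np.array([[0, 1], [2, INVALID]])
pred = np.array([[0, 1], [2, 0]])
logits = np.zeros((3, 2, 2))
for (y, x), c in np.ndenumerate(gt):
    if c != INVALID:
        logits[c, y, x] = 100.0
score = f1_comp(pred, gt)    # 1.0: predictions match on every valid pixel
loss = masked_ce(logits, gt)
```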
5. Benchmarking and Experimental Results
A range of models was benchmarked on xBD-S12, including a simple U-Net (in several fusion configurations), four advanced SOTA architectures, and two frozen geospatial foundation models (GeoFMs: Prithvi, DOFA). All models used AdamW with weight decay and a cosine learning-rate schedule, were trained for 40 epochs, and were ensembled over three runs (logit averaging).
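The cosine schedule and the logit-averaging ensemble can be sketched as follows; the base learning rate, step counts, and array shapes are illustrative, not the paper's settings.

```python
import numpy as np

def cosine_lr(step, total_steps, base_lr):
    """Cosine decay from base_lr at step 0 down to 0 at total_steps."""
    return base_lr * 0.5 * (1.0 + np.cos(np.pi * step / total_steps))

def ensemble_predict(logits_per_run):
    """Logit-averaging ensemble: mean the runs' raw logits, then argmax."""
    mean_logits = np.mean(logits_per_run, axis=0)  # (C, H, W)
    return mean_logits.argmax(axis=0)              # per-pixel class map (H, W)

# Three hypothetical runs' logits for a 2 x 2 patch with 3 classes
runs = [np.random.randn(3, 2, 2) for _ in range(3)]
pred = ensemble_predict(runs)
```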
Key results (ensemble of 3, summarized):
| Model | F1_comp, xView2 split (B=3) | F1_comp, event-based split (B=3) |
|---|---|---|
| ChangeMamba | 0.800 | 0.655 |
| U-Net (2-step, late fusion) | 0.753 | 0.709 |
| U-Net (2-step, early fusion) | 0.756 | 0.693 |
| U-Net (joint, early fusion) | 0.764 | 0.687 |
| StrongBaseline | 0.760 | 0.690 |
| DisasterAdaptiveNet | 0.734 | 0.636 |
| ChangeOS | 0.718 | 0.589 |
Additional highlights:
- In the xView2 split (matched-distribution), ChangeMamba outperformed all other models, likely leveraging event-specific statistical cues.
- In the event-based split (holds out entire disasters), the two-step U-Net with late fusion generalized best (F1_comp = 0.709), outperforming heavier architectures by up to 5 percentage points.
- The use of both pre- and post-event images conferred a +0.10 gain in localization F1 on the event-based split.
- Applying a 3-pixel morphological buffer improved the recall of building instances for all models.
- Geospatial foundation models (GeoFMs) underperformed the bespoke U-Net on both splits and struggled except in the wildfires scenario, where spectral features were most distinctive.
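The 3-pixel buffer can be sketched with SciPy's binary dilation; the single-pixel footprint below is a toy example showing why the buffer tolerates small geolocation offsets.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def buffered_footprint(mask: np.ndarray, buffer_px: int = 3) -> np.ndarray:
    """Dilate a binary building footprint by buffer_px pixels ("B=3") before
    computing localization F1, so small geolocation offsets are not penalized."""
    return binary_dilation(mask, iterations=buffer_px)

gt = np.zeros((16, 16), dtype=bool)
gt[8, 8] = True              # a single-pixel building footprint
buf = buffered_footprint(gt)
# A prediction shifted by 2 px misses the raw footprint but hits the buffer
```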
Per-event performance analysis indicated that floods and wildfires reached event-based F1_comp values of at least 0.70, while sparsely damaged earthquake and volcano events proved challenging at the 10 m resolution available.
6. Practical Implications and Availability
xBD-S12 demonstrates that medium-resolution Sentinel-1/2 imagery—readily obtainable through open-access Copernicus channels—enables the creation of coarse-grained but wide-area building damage assessments within hours post-event. While this level of spatial detail does not substitute for VHR-derived products, it serves as a critical supplementary resource, particularly where VHR coverage is unavailable or delayed. A plausible implication is that Sentinel-based methods may dramatically reduce the time-to-map for humanitarian response over large disaster zones.
Release artifacts, including all Sentinel-aligned data, pretrained models, and code, are publicly available at https://github.com/olidietrich/xbd-s12, supporting reproducibility and further research in remote-sensing-based disaster assessment.
7. Relationship to Related Work and Limitations
xBD-S12 directly extends the established xBD benchmark, grounding its relevance in a lineage of disaster mapping tasks predicated on VHR imagery. Its unique contribution is the rigorous spatiotemporal alignment with complementary Sentinel-1 SAR and Sentinel-2 multispectral sources. Experiments reveal that architectural sophistication, including large vision transformers and foundation models, confers little advantage when generalizing to unseen disasters at medium resolution—a finding that prompts reevaluation of model selection for wide-area, rapid-response mapping.
A principal limitation highlighted by experimental results is reduced efficacy in low-resolution contexts when faced with highly localized or visually subtle damage (e.g., earthquake impact, volcano boundaries). This suggests the enduring necessity of VHR imagery for fine-scale damage inference, and that Copernicus data, while valuable for rapid triage, constitute a first-look rather than definitive mapping solution.