Bihar Flooded Croplands Dataset (BFCD-22)
- BFCD-22 is a curated satellite dataset that employs pixel-wise NDVI differencing to classify flood-induced crop damage with high accuracy.
- The dataset integrates rigorous preprocessing steps such as cloud removal, geometric co-registration, and segmented patch extraction to ensure high-quality, analysis-ready imagery.
- It supports advanced techniques such as EDSR super-resolution, which raises the Full Damage F1-score from 0.83 to 0.89, and underpins robust flood damage segmentation pipelines.
The Bihar Flood-Impacted Croplands Dataset (BFCD-22) is a curated, high-fidelity satellite-derived corpus specifically designed to support automated assessment of agricultural flood damage in the context of the October 2022 flood in the Muzaffarpur district, Bihar, India. Created to address the limitations of manual field surveys and coarse-resolution satellite methods, BFCD-22 enables supervised training and quantitative evaluation for fine-grained crop damage classification. The dataset is distinguished by its direct alignment with high-resolution commercial reference imagery, pixel-wise NDVI-based labeling protocol, and comprehensive ancillary metadata, forming the empirical backbone of the FLNet super-resolution and damage segmentation pipeline.
1. Spatial, Temporal, and Sensorial Coverage
BFCD-22 comprises paired pre- and post-flood satellite acquisitions over the Muzaffarpur agricultural region (approximately 26°07′ N, 85°24′ E), capturing the extent of flood-induced cropland change in October 2022. Sentinel-2 Level-2A imagery (Red, band 4, and NIR, band 8, at 10 m spatial resolution) serves as the primary input, while PlanetScope imagery (RGB + NIR, 3 m) serves as high-resolution pseudo-ground truth for both super-resolution targets and reference labeling. All input scenes are orthorectified and co-registered to a unified 3 m spatial grid, producing more than 1,000 aligned 256×256 pixel chips (≈768×768 m each) restricted to cropland areas via a 10 m land-cover mask. Built-in cloud and quality masks (Sentinel-2 L2A QA and PlanetScope UDM2) exclude cloud and shadow artifacts, so that the retained samples are analysis-ready.
2. Annotation Protocol and Class Definitions
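Crop damage quantification in BFCD-22 is derived from the pixel-wise NDVI change between pre- and post-event acquisitions. With NDVI computed from the NIR and Red bands, and with the sign convention implied by the class definitions that follow (a larger positive ΔNDVI means a larger loss of vegetation), the change is:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}, \qquad \Delta\mathrm{NDVI} = \mathrm{NDVI}_{\text{pre}} - \mathrm{NDVI}_{\text{post}}$$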
Three discrete classes (encoded as 0, 1, 2) segment the cropland based on ΔNDVI magnitude:
- 0 (“No Damage”): ΔNDVI < T₁
- 1 (“Partial Damage”): T₁ ≤ ΔNDVI < T₂
- 2 (“Full Damage”): ΔNDVI ≥ T₂
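The three-class rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the `eps` stabilizer, and the threshold values `T1 = 0.1` and `T2 = 0.3` are assumptions made for the example, and ΔNDVI is assumed to be pre-event minus post-event NDVI so that vegetation loss is positive.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index from NIR and Red reflectance."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    # eps (an assumption) avoids division by zero over dark pixels
    return (nir - red) / (nir + red + eps)

def classify_damage(ndvi_pre, ndvi_post, t1, t2):
    """Map the NDVI drop to classes 0 (none), 1 (partial), 2 (full)."""
    # Assumed sign convention: positive delta means vegetation loss
    delta = np.asarray(ndvi_pre) - np.asarray(ndvi_post)
    # digitize gives 0 for delta < t1, 1 for t1 <= delta < t2, 2 for delta >= t2
    return np.digitize(delta, bins=[t1, t2]).astype(np.uint8)

# Hypothetical thresholds and NDVI values, for illustration only
T1, T2 = 0.1, 0.3
pre = np.array([[0.70, 0.70], [0.70, 0.70]])
post = np.array([[0.68, 0.50], [0.20, 0.45]])
labels = classify_damage(pre, post, T1, T2)
print(labels)
```

In practice the thresholds would come from the histogram analysis described in this section rather than from fixed constants.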
Thresholds T₁ and T₂ are derived via histogram analysis of ΔNDVI from co-registered PlanetScope reference, anchoring class boundaries in empirical image statistics. Annotation proceeds through generation of super-resolved NDVI maps, threshold-based initial segmentation, followed by morphological smoothing to eliminate isolated pixel noise. Class proportions exhibit marked imbalance: approximately 50% “No Damage,” 30% “Partial Damage,” and 20% “Full Damage.” No in-field survey was conducted; instead, the high-resolution PlanetScope data is accepted as proxy ground truth for both change detection and labeling validation—subject to manual inspection for co-registration and masking fidelity.
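The source does not say which morphological operator is used for smoothing. As a stand-in, the sketch below applies a 3×3 majority filter, which is one simple way to remove isolated label pixels, and then reports class proportions. The function names and the choice of filter are assumptions for illustration.

```python
import numpy as np

def majority_filter(labels, n_classes=3):
    """Replace each pixel by the most frequent class in its 3x3 neighbourhood.

    Ties are resolved in favour of the lowest class index (argmax behaviour).
    """
    labels = np.asarray(labels)
    padded = np.pad(labels, 1, mode="edge")
    h, w = labels.shape
    votes = np.zeros((n_classes, h, w), dtype=np.int32)
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            for c in range(n_classes):
                votes[c] += (window == c)
    return votes.argmax(axis=0).astype(labels.dtype)

def class_proportions(labels, n_classes=3):
    """Fraction of pixels assigned to each class."""
    counts = np.bincount(np.asarray(labels).ravel(), minlength=n_classes)
    return counts / counts.sum()

# A single isolated "Full Damage" pixel is treated as noise and removed
noisy = np.zeros((5, 5), dtype=np.uint8)
noisy[2, 2] = 2
smoothed = majority_filter(noisy)

# A contiguous damaged region survives the filter unchanged
block = np.zeros((6, 6), dtype=np.uint8)
block[:, 3:] = 2
kept = majority_filter(block)
print(class_proportions(kept))
```

Checking `class_proportions` on the final label maps is a quick way to confirm the imbalance noted above before choosing a loss weighting or sampling scheme.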
3. Data Preprocessing, Patch Extraction, and Dataset Splitting
Dataset preparation integrates several mandatory preprocessing stages. Sentinel-2 and PlanetScope scenes are subject to:
- Cloud and shadow removal using standard QA bands (Sentinel-2 L2A, PlanetScope UDM2)
- Radiometric calibration using reflectance products: Sentinel-2 L2A, and the PlanetScope reflectance product that accompanies the UDM2 mask
- Geodetic orthorectification with sub-pixel tie-point refinement for rigid alignment
- Cropland masking based on 10 m land-cover data
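The masking steps in this list amount to keeping a pixel only if it is clear in both acquisitions and lies on cropland. The sketch below assumes the QA and UDM2 layers have already been decoded into boolean "clear" arrays on the common grid; the function names and the toy arrays are hypothetical.

```python
import numpy as np

def valid_pixel_mask(clear_pre, clear_post, cropland):
    """True where both dates are cloud/shadow free and the pixel is cropland."""
    return (np.asarray(clear_pre, dtype=bool)
            & np.asarray(clear_post, dtype=bool)
            & np.asarray(cropland, dtype=bool))

def apply_mask(delta_ndvi, valid):
    """Set excluded pixels to NaN so they drop out of later statistics."""
    return np.where(valid, np.asarray(delta_ndvi, dtype=np.float64), np.nan)

# Toy 2x2 example with hypothetical values
clear_pre = np.array([[1, 1], [0, 1]])
clear_post = np.array([[1, 0], [1, 1]])
cropland = np.array([[1, 1], [1, 0]])
delta = np.array([[0.25, 0.40], [0.10, 0.05]])

valid = valid_pixel_mask(clear_pre, clear_post, cropland)
masked = apply_mask(delta, valid)
valid_fraction = valid.mean()  # could be used to reject mostly-masked chips
print(masked, valid_fraction)
```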
Patch extraction partitions the study area into non-overlapping 256×256 pixel chips at 3 m, ensuring each chip contains both Sentinel-2-derived NDVI (input, after downsampling for SR pre-training) and directly mapped PlanetScope NDVI (target/reference). Dataset splits are stratified both by spatial location and, for segmentation, by damage class distribution:
- Super-resolution (EDSR) training/validation: 80%/20%
- Damage segmentation (UNet) training/validation/test: 70%/15%/15%
Stratifying by location and by class distribution keeps the training data representative and lets the held-out sets measure generalization across regions and under class imbalance.
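Non-overlapping chip extraction and a 70/15/15 split can be sketched as follows. The helper names are hypothetical, and the split shown is a plain seeded random permutation, which is a simplification of the spatial and class-stratified scheme described above.

```python
import numpy as np

def extract_chips(raster, size=256):
    """Cut a 2-D raster into non-overlapping size x size chips, dropping ragged edges."""
    h, w = raster.shape
    rows, cols = h // size, w // size
    trimmed = raster[:rows * size, :cols * size]
    chips = trimmed.reshape(rows, size, cols, size).swapaxes(1, 2)
    return chips.reshape(rows * cols, size, size)

def split_indices(n, fractions=(0.70, 0.15, 0.15), seed=0):
    """Random train/val/test index split (not stratified; illustration only)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

# A 600 x 520 raster yields a 2 x 2 grid of 256-pixel chips
raster = np.arange(600 * 520, dtype=np.float32).reshape(600, 520)
chips = extract_chips(raster)
footprint_m = 256 * 3  # chip side length in metres on the 3 m grid

train_idx, val_idx, test_idx = split_indices(100)
print(chips.shape, footprint_m, len(train_idx), len(val_idx), len(test_idx))
```

A stratified variant would first group chips by spatial block and dominant damage class, then draw the same fractions within each group.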
4. Super-Resolution Framework and Input Construction
The Enhanced Deep Super-Resolution (EDSR) network is adopted for upscaling Sentinel-2 inputs from 10 m to the 3 m PlanetScope grid. The model is configured with 16 residual blocks and 64-channel feature representations, giving an upscaling factor of 10/3 ≈ 3.33×. Let $f_\theta$ denote the SR mapping with parameters $\theta$; training minimizes a reconstruction loss $\mathcal{L}(f_\theta(x), y)$, where $x$ is the Sentinel-2 NDVI input and $y$ is the aligned PlanetScope NDVI target (the explicit form of the loss did not survive in this text). Training employs the Adam optimizer with a batch size of 8 (the learning-rate value is likewise missing), with early stopping based on validation PSNR. All extractions are non-overlapping, maximizing spatial diversity over agricultural land parcels. A plausible implication is that this SR protocol removes the dependence on commercial data at inference time, since PlanetScope-comparable detail is produced from public imagery.
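Early stopping on validation PSNR can be expressed independently of any deep-learning framework. The sketch below assumes a patience-based rule, which the source does not specify; the function name, the `patience` value, and the PSNR curve are hypothetical.

```python
def early_stopping_on_psnr(val_psnr_per_epoch, patience=3):
    """Return (best_epoch, best_psnr, stopped_at) for a validation PSNR curve.

    Training stops once PSNR has not improved for `patience` consecutive epochs.
    """
    best_psnr = float("-inf")
    best_epoch = -1
    stopped_at = -1
    for epoch, psnr in enumerate(val_psnr_per_epoch):
        stopped_at = epoch
        if psnr > best_psnr:
            best_psnr, best_epoch = psnr, epoch  # checkpoint would be saved here
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_psnr, stopped_at

# Hypothetical validation curve in dB
curve = [18.0, 19.5, 20.4, 20.9, 20.8, 20.7, 20.9, 20.6]
result = early_stopping_on_psnr(curve, patience=3)
print(result)
```

The weights saved at `best_epoch` would then be the ones used to super-resolve the Sentinel-2 NDVI inputs.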
5. Quantitative Performance and Evaluation Metrics
Assessment of super-resolution and damage classification efficacy on BFCD-22 employs established metrics:
- Super-resolution: Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM) [cf. Wang et al. 2004], reporting PSNR (pre-flood: 21.10 dB; post-flood: 20.77 dB) and SSIM (pre-flood: 0.860; post-flood: 0.748) between predicted SR output and PlanetScope ground truth.
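For reference, MSE and PSNR follow directly from their definitions. In the sketch below the `data_range` argument is an assumption: the source does not state which dynamic range was used for the reported values (for NDVI in [-1, 1] it would be 2.0). SSIM is omitted because a faithful implementation needs windowed local statistics.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two arrays of equal shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))

def psnr(pred, target, data_range):
    """Peak signal-to-noise ratio in dB for a given dynamic range."""
    err = mse(pred, target)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / err))

# A constant error of 0.1 on a unit dynamic range gives about 20 dB
target = np.zeros((4, 4))
pred = np.full((4, 4), 0.1)
value_db = psnr(pred, target, data_range=1.0)
print(round(value_db, 2))
```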
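- Classification: Per-class Precision, Recall, and F1-score, computed from the true positives (TP), false positives (FP), and false negatives (FN) of each class:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$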
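Per-class scores can be obtained from a confusion matrix, as in the following sketch. The function name and the toy label vectors are illustrative, and classes with no predictions or no true samples are assigned a score of 0 by convention here.

```python
import numpy as np

def per_class_scores(y_true, y_pred, n_classes=3):
    """Per-class precision, recall and F1 from integer label arrays."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)  # rows: true class, columns: predicted
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        denom = precision + recall
        f1 = np.where(denom > 0, 2 * precision * recall / denom, 0.0)
    return precision, recall, f1

# Toy labels: 0 = No Damage, 1 = Partial Damage, 2 = Full Damage
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
precision, recall, f1 = per_class_scores(y_true, y_pred)
print(np.round(f1, 3))
```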
Damage classification results (Table 2 of Ghosal et al., 7 Jan 2026):
| Source | No Damage F1 | Partial Damage F1 | Full Damage F1 |
|---|---|---|---|
| Sentinel-2 (10 m) | 1.00 | 0.98 | 0.83 |
| PlanetScope (3 m) | 0.98 | 0.90 | 0.89 |
| EDSR Super-Resolved (3 m) | 0.99 | 0.96 | 0.89 |
The “Full Damage” F1-score rises from 0.83 on native 10 m Sentinel-2 imagery to 0.89 after super-resolution, matching the score obtained with 3 m PlanetScope imagery.
6. Access, Applicability, and Limitations
BFCD-22 is distributed by the original authors upon request, with associated masking, co-registration, super-resolution, and segmentation codebase available under an academic license. The dataset’s primary constraint is its geographical and temporal specificity: single district, single flood event, and the absence of direct in-field validation. Labeling by ΔNDVI thresholds can conflate phenological variation (e.g., harvest timing) with true flood damage, and residual artifacts—from unmasked clouds/shadows or minimal misalignments—can induce spatially correlated false positives. Recommended practices for transfer to new events include recomputing ΔNDVI thresholds according to local crop phenology, rigorous manual inspection of scene registration, intensive cloud/shadow removal, and, where optical coverage is degraded, complementary use of Sentinel-1 SAR. Validation against even limited field data is advised for robust deployment.
A plausible implication is that, while BFCD-22 enables scalable analysis within its design envelope, broader generalization—both spatially and across multiple flood seasons—would require augmentation with multi-site, temporally diverse ground truth data, and possibly multi-modal (SAR/optical) fusion pipelines. However, the technique described demonstrates that super-resolution of public Sentinel-2 imagery, when integrated with principled NDVI differencing and robust QC, can achieve damage assessment fidelity previously restricted to costlier commercial resources (Ghosal et al., 7 Jan 2026).