GHS-POP Framework: Historical Population Mapping
- GHS-POP Framework is a global gridded population mapping methodology that allocates population counts using remote sensing-derived built-up masks and detailed census data.
- It employs a GEOBIA segmentation workflow with rigorous preprocessing, including geometric correction and radiometric normalization, to ensure high spatial fidelity.
- The integration of declassified KH-9 imagery via the HexaLCSeg dataset enables high-resolution historical reconstructions for data-scarce peri-urban and rural areas.
The GHS-POP framework refers to a global gridded population mapping methodology, in which the spatial distribution of population counts is estimated and allocated to grid cells using remote sensing-derived built-up masks and auxiliary data. Recent advancements have significantly enhanced the framework’s historical reconstruction capabilities via integration of declassified Hexagon KH-9 reconnaissance imagery and detailed, settlement-level census data. Through the introduction of the HexaLCSeg dataset, GHS-POP workflows can now produce high-resolution, historically accurate population grids for previously data-scarce peri-urban and rural contexts, as demonstrated in northern Istanbul for the period 1975–1990 (Gerrits et al., 14 Dec 2025).
1. Data Sources and Preprocessing
The refinement of the GHS-POP framework heavily relies on the integration of reconnaissance satellite imagery and rigorous preprocessing pipelines.
- Imagery Source: The HexaLCSeg dataset utilizes US National Reconnaissance Office (NRO) KH-9 "Big Bird" panchromatic film-based images acquired in 1977, with a ground sampling distance of approximately 0.6–1.2 m per pixel, covering Arnavutköy and Çekmeköy districts of Istanbul.
- Preprocessing Steps:
  - Geometric Correction: Automated tie-point detection aligns KH-9 frames with modern basemaps, supplemented by manual ground control points (GCPs; e.g., road intersections, river bends) and rubber-sheeting adjustments to minimize local distortion.
  - Radiometric Normalization: Histogram matching is applied between adjacent frames to ensure consistent illumination.
  - Mosaicking and Clipping: Frames are mosaicked into a seamless raster, reprojected to the World Mollweide equal-area projection (EPSG:54009), and clipped to the study area.
These preprocessing protocols ensure spatial and radiometric fidelity, enabling the robust extraction of built-up land cover at sub-meter precision (Gerrits et al., 14 Dec 2025).
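The radiometric normalization step can be illustrated with a dependency-free sketch of CDF-based histogram matching. This is not the authors' pipeline: the function names `cdf` and `match_histogram` and the toy pixel lists are hypothetical, and a production workflow would more likely operate on full raster arrays with an imaging library.

```python
from bisect import bisect_left


def cdf(values, levels=256):
    """Cumulative distribution of integer grey levels in [0, levels)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    out, running = [], 0
    for count in hist:
        running += count
        out.append(running / total)
    return out


def match_histogram(source, reference, levels=256):
    """Remap source grey levels so their CDF follows the reference CDF."""
    src_cdf = cdf(source, levels)
    ref_cdf = cdf(reference, levels)
    # For each source level, pick the lowest reference level whose
    # cumulative frequency reaches the source level's cumulative frequency.
    lookup = [min(bisect_left(ref_cdf, p), levels - 1) for p in src_cdf]
    return [lookup[v] for v in source]


# Toy example (hypothetical pixel values): a darker frame is matched to
# a brighter neighbouring frame with the same relative distribution.
dark_frame = [10, 10, 20, 20, 30, 30, 40, 40]
bright_frame = [110, 110, 120, 120, 130, 130, 140, 140]
matched = match_histogram(dark_frame, bright_frame)
```

In this toy case the two frames share the same shape of distribution, so the matched values coincide with the reference frame; on real overlapping frames the mapping only brings the two grey-level distributions closer together.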
2. Segmentation and Semantic Classification Methodology
The core of HexaLCSeg’s contribution is its GEOBIA (Geographic Object-Based Image Analysis) workflow, implemented in Trimble eCognition, which translates legacy film imagery into meaningful land cover masks suitable for population allocation.
- Multi-resolution Segmentation is executed with parameters (scale=10, shape=0.3, compactness=0.5), generating image objects whose boundaries correspond to spectral and textural patterns.
- Feature Extraction per object includes:
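  - Spectral: Mean brightness of the object's pixels.
  - Textural: Grey Level Co-occurrence Matrix (GLCM) metrics, specifically contrast: $\sum_{i,j} (i - j)^2 \, p(i, j)$, where $p(i, j)$ is the normalized co-occurrence frequency of grey levels $i$ and $j$.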
- Morphological Filtering: Speckle removed via opening/closing operations.
- Rule-based Classification: An object is assigned a built-up label when its features pass three thresholds: one on mean brightness (in DN), one on GLCM contrast, and one on a third object feature.
- Training and Validation: The classifier is trained using approximately 200 manually labeled objects spanning six classes (e.g., built-up, cropland, shrub) with a 70/30 train/test split (Gerrits et al., 14 Dec 2025).
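The feature extraction and rule-based labelling described above can be sketched in plain Python. This is an illustration under stated assumptions, not the eCognition rule set: the threshold values `BRIGHTNESS_MIN` and `CONTRAST_MIN` are hypothetical placeholders, the "greater than or equal" direction of each test is assumed, the third criterion is omitted, and grey levels are quantized to four levels to keep the example small.

```python
def glcm_contrast(patch, levels, dx=1, dy=0):
    """GLCM contrast of a 2-D list of grey levels for one pixel offset."""
    rows, cols = len(patch), len(patch[0])
    counts = [[0] * levels for _ in range(levels)]
    pairs = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = patch[r][c], patch[r2][c2]
                # Symmetric GLCM: count the pair in both directions.
                counts[i][j] += 1
                counts[j][i] += 1
                pairs += 2
    if pairs == 0:
        return 0.0
    # Contrast = sum over (i, j) of (i - j)^2 * p(i, j).
    return sum(
        (i - j) ** 2 * counts[i][j] / pairs
        for i in range(levels)
        for j in range(levels)
    )


def mean_brightness(patch):
    """Mean grey level over all pixels of the object."""
    flat = [v for row in patch for v in row]
    return sum(flat) / len(flat)


# HYPOTHETICAL thresholds, chosen only for this toy example.
BRIGHTNESS_MIN = 1.5
CONTRAST_MIN = 2.0


def is_built_up(patch, levels):
    """Assumed rule: bright AND highly textured objects are built-up."""
    return (
        mean_brightness(patch) >= BRIGHTNESS_MIN
        and glcm_contrast(patch, levels) >= CONTRAST_MIN
    )


field = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]    # uniform texture
rooftops = [[3, 0, 3], [0, 3, 0], [3, 0, 3]]  # alternating bright/dark
labels = [is_built_up(field, 4), is_built_up(rooftops, 4)]
```

The uniform patch has zero contrast and is rejected, while the alternating patch has a horizontal contrast of 9 and passes both tests.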
3. Dataset Characteristics and Output Schema
The HexaLCSeg product provides a semantically segmented, high-resolution built-up mask and associated vector layers designed for seamless integration with GHS-POP’s Pop2Grid workflow.
| Data Layer | Resolution | Schema and Format |
|---|---|---|
| Raster (GeoTIFF) | 100 m × 100 m | Value: 1 (built-up), 0 (non-built-up); UInt8; EPSG:54009 |
| Vector (Shapefile) | sub-meter polygons | Class (str), Confidence (float 0–1), EPSG:54009 |
- The raster mask aligns with GHS-POP Pop2Grid inputs, representing built-up status per cell.
- The vector data preserves sub-meter object boundaries and includes class and confidence attributes (membership score from classification function).
- 'NoData' is assigned outside the delineated study area.
This structured schema facilitates both grid-based and object-based population allocation approaches (Gerrits et al., 14 Dec 2025).
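The source does not state how the sub-meter classification is generalized to the 100 m binary raster. One possible approach, sketched here purely as an assumption, is block aggregation with a minimum built-up share per coarse cell. The `NODATA` value of 255, the `min_share` parameter, and the function name `aggregate_mask` are all hypothetical.

```python
NODATA = 255  # assumed NoData code for a UInt8 raster


def aggregate_mask(fine, factor, min_share=0.5):
    """Collapse a fine binary mask into coarse cells of factor x factor pixels.

    A coarse cell is 1 when the built-up share of its valid pixels is at
    least min_share, 0 otherwise, and NODATA when it has no valid pixels.
    """
    rows, cols = len(fine), len(fine[0])
    coarse = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            block = [
                fine[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
            ]
            valid = [v for v in block if v != NODATA]
            if not valid:
                out_row.append(NODATA)
            else:
                share = sum(valid) / len(valid)
                out_row.append(1 if share >= min_share else 0)
        coarse.append(out_row)
    return coarse


fine_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 255, 255],
    [0, 1, 255, 255],
]
coarse_mask = aggregate_mask(fine_mask, factor=2)
```

Here the upper-left block is 75% built-up and becomes 1, the lower-left block is 25% built-up and becomes 0, and the lower-right block lies entirely outside the study area and stays NoData.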
4. Accuracy Assessment
Quantitative validation is performed via stratified random sampling (500 points) across built-up and non-built-up strata.
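A minimal sketch of stratified random sampling follows. The source does not specify how the 500 points are divided between the two strata, so proportional allocation is assumed here; the stratum sizes and the function name `stratified_sample` are hypothetical.

```python
import random


def stratified_sample(cells_by_stratum, total_points, seed=0):
    """Draw points per stratum in proportion to stratum size.

    Proportional allocation is an assumption; per-stratum counts are
    rounded, so in general they may differ from total_points by a point.
    """
    rng = random.Random(seed)
    population = sum(len(cells) for cells in cells_by_stratum.values())
    sample = {}
    for name, cells in cells_by_stratum.items():
        n = round(total_points * len(cells) / population)
        # Sampling without replacement inside each stratum.
        sample[name] = rng.sample(cells, min(n, len(cells)))
    return sample


# Hypothetical strata: (row, col) indices of classified grid cells.
built_up = [(r, c) for r in range(40) for c in range(50)]           # 2000 cells
non_built_up = [(r, c) for r in range(40, 200) for c in range(50)]  # 8000 cells
sample = stratified_sample(
    {"built_up": built_up, "non_built_up": non_built_up}, total_points=500
)
```

With these stratum sizes the draw yields 100 built-up and 400 non-built-up validation points. Equal allocation per stratum would be an equally defensible design when the built-up class is rare.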
- References for Validation: Manual digitization from 1:25 000 USGS topographic maps (1977) and comparison to high-resolution contemporary orthoimagery.
- Reported Metrics:
  - Precision: 0.89
  - Recall: 0.87
  - F₁-score: 0.88
  - Overall accuracy: 0.90
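The accuracy assessment uses standard formulations, with $TP$, $FP$, $FN$, and $TN$ denoting true positives, false positives, false negatives, and true negatives for the built-up class:

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
$$

$$
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Overall accuracy} = \frac{TP + TN}{TP + FP + FN + TN}
$$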
These metrics indicate robust performance for the task of built-up area segmentation in historical contexts (Gerrits et al., 14 Dec 2025).
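The standard definitions of precision, recall, F₁, and overall accuracy can be checked with a few lines of Python. The confusion-matrix counts below are illustrative only: they are not reported in the source and were chosen so that 500 points reproduce the published figures after rounding.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and overall accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    overall = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, overall


# ILLUSTRATIVE counts for 500 validation points (not from the source).
metrics = accuracy_metrics(tp=183, fp=23, fn=27, tn=267)
rounded = [round(m, 2) for m in metrics]
print(rounded)  # [0.89, 0.87, 0.88, 0.9]
```

The reported F₁ of 0.88 is also consistent with the reported precision and recall on their own, since 2 × 0.89 × 0.87 / (0.89 + 0.87) ≈ 0.88.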
5. Integration with GHS-POP Population Disaggregation
The enhanced GHS-POP workflow incorporates HexaLCSeg in the population allocation (dasymetric mapping) chain:
- Baseline (Standard GHSL): Uses Landsat-derived built-up masks for Pop2Grid.
- Hexagon-enhanced: Replaces the Landsat mask with HexaLCSeg, allocating nonzero weight only to KH-9-derived built-up cells.
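Population allocation per cell is performed via:

$$
P_i = P_z \cdot \frac{b_i}{\sum_{j \in z} b_j}
$$

where $P_i$ is the population assigned to cell $i$, $P_z$ is the zone total, and $b_i$ is 1 if cell $i$ is built-up (0 otherwise).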
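As a minimal sketch of this binary dasymetric rule, the zone total can be split equally among the built-up cells of the zone while all other cells receive zero. The function name `allocate_population` and the example numbers are hypothetical, and raising an error for a zone without built-up cells is a choice made for this sketch, not behaviour documented for Pop2Grid.

```python
def allocate_population(zone_total, built_up):
    """Split a zone's population equally over its built-up cells.

    built_up is a list of 0/1 flags, one per grid cell in the zone.
    """
    n_built = sum(built_up)
    if n_built == 0:
        # How such zones are handled is not stated in the source.
        raise ValueError("zone has no built-up cells")
    return [zone_total * b / n_built for b in built_up]


# Hypothetical census zone of 1200 people covering six grid cells.
cell_population = allocate_population(1200, [1, 0, 1, 1, 0, 0])
```

Each of the three built-up cells receives 400 people, and the cell values sum back to the zone total, which is the mass-preserving property that dasymetric mapping relies on.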
- Fully Integrated Variant (“Hexagon + local census”): Incorporates local settlement-level (LAU-2) census counts, apportioning each settlement's population to the built-up objects within that settlement.
This delivers fine-grained, temporally accurate population grids that more precisely reflect historical rural and peri-urban settlement distributions. A plausible implication is improved modeling accuracy in data-scarce regions and periods where only historical reconnaissance imagery and sparse census records are available (Gerrits et al., 14 Dec 2025).
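The settlement-level variant can be sketched in the same spirit. The weighting used to apportion a settlement's count among its built-up objects is not recoverable from the text, so weighting by object footprint area is assumed here; the settlement names, areas, and the function name `apportion_to_objects` are hypothetical.

```python
def apportion_to_objects(settlement_counts, objects):
    """Share each settlement's census count among its built-up objects.

    ASSUMPTION: shares are proportional to object footprint area.
    objects is a list of dicts with 'settlement' and 'area_m2' keys.
    """
    area_by_settlement = {}
    for obj in objects:
        key = obj["settlement"]
        area_by_settlement[key] = area_by_settlement.get(key, 0.0) + obj["area_m2"]
    return [
        settlement_counts[obj["settlement"]]
        * obj["area_m2"]
        / area_by_settlement[obj["settlement"]]
        for obj in objects
    ]


# Hypothetical settlements and built-up objects.
counts = {"A": 900, "B": 300}
built_objects = [
    {"settlement": "A", "area_m2": 200.0},
    {"settlement": "A", "area_m2": 100.0},
    {"settlement": "B", "area_m2": 50.0},
    {"settlement": "B", "area_m2": 50.0},
]
object_population = apportion_to_objects(counts, built_objects)
```

Because every share is computed within a single settlement, the object-level values sum back to each settlement's census count, so the finer census geography constrains the result more tightly than a single zone total would.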
6. Coverage, Scalability, and Access
- Spatial & Temporal Coverage: Current demonstrations apply to Arnavutköy (western) and Çekmeköy (eastern) districts of Istanbul using 1977 imagery. The KH-9 Declass 3 archive (1971–1986) provides nearly global coverage—excluding parts of Canada, Greenland, Australia, and Antarctica.
- Scalability: The methodology is extensible to other regions and epochs for which KH-9 frames are available, enabling replication at continental or global scales.
- Availability: All HexaLCSeg raster and vector products, alongside preprocessing scripts and documentation, are distributed under CC BY 4.0 via GitHub (https://github.com/pjgerrits/hexagon_grid_historical_pop.git). Original KH-9 frames are accessible without charge through USGS EarthExplorer.
By leveraging rigorous GEOBIA segmentation and dasymetric mapping, the revised GHS-POP framework with HexaLCSeg provides one of the first globally scalable, high-resolution built-up datasets for the 1970s–1980s, substantially advancing the reconstruction of historical population patterns in otherwise data-limited contexts (Gerrits et al., 14 Dec 2025).