ReBO: Repairing Buildings in OSM

Updated 29 September 2025

ReBO is a comprehensive framework that employs statistical, optimization, and deep learning methods to systematically repair building annotations in OSM.
It uses techniques such as Markov Random Field optimization, CNN hypercolumns for probability mapping, and shape priors to address misaligned, withered, and missing annotations.
Empirical benchmarks on rural datasets, such as those from Tanzania and Zimbabwe, demonstrate significant improvements in annotation accuracy, which benefits applications in urban planning and disaster response.

Repairing Buildings in OSM (ReBO) refers to a suite of algorithmic methodologies and benchmarks for the detection, correction, and alignment of building annotations in OpenStreetMap (OSM), particularly those derived from remote sensing imagery. OSM building footprints are frequently affected by misalignments, outdated annotations, and incomplete coverage due to the complexities of georeferencing and human annotation. The ReBO initiative encompasses statistical, optimization-based, and learning-based methods for systematically correcting these discrepancies, most notably Markov Random Field optimization, transformation networks, binary integer programming, and iterative denoising models. Recent literature introduces the alignment token paradigm and new datasets to benchmark and facilitate repair procedures at scale.

1. Taxonomy of Building Annotation Errors in OSM

Three principal categories of defects in OSM building annotations are commonly distinguished:

Geometric misalignment: Building footprints are spatially displaced with respect to updated imagery, often due to inaccuracies in projection or legacy annotation errors.
Withered/misannotations: Footprints in the map that no longer correspond to any structure in the image, typically representing demolished, obfuscated, or erroneously added features.
Missing annotations: Structures present in updated imagery but absent from the map, resulting from construction after the last annotation, missing coverage, or human oversight.

Corrective workflows must address all three, often employing a combination of probabilistic modeling and explicit detection strategies (Vargas-Muñoz et al., 2019).

2. Markov Random Field-Based Alignment

A foundational method for rural building annotation repair leverages a Markov Random Field (MRF) to jointly model the spatial alignment of grouped annotations. Each annotation group (often corresponding to spatially proximal buildings sharing a misalignment vector) is treated as a node, and the MRF energy function balances the evidence (via CNN-derived building probability maps) against spatial smoothness constraints.

The energy minimization task is formulated as:

$\hat{d} = \arg\min_{d \in \mathcal{D}^N} \sum_i \left\{ -\log C(d_i(x_i), y_i) + \beta \sum_{j \in N_i} (1/Z) \|d_i - d_j\|_2 \right\}$

where $C(\cdot, \cdot)$ is the normalized correlation between the shifted annotation and the CNN probability map, $\mathcal{D}$ is the set of discrete candidate shifts, $N_i$ is the set of groups neighboring group $i$, $1/Z$ is a normalization factor, and $\beta$ is the spatial regularization parameter. In practice, the energy is minimized with Iterated Conditional Modes (ICM) (Vargas-Muñoz et al., 2019).
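As a rough illustration of how ICM can minimize an energy of the form above, the following sketch updates one group at a time while holding its neighbors fixed. It is not the authors' implementation: the function name `icm_align`, the precomputed `unary` cost table, the toy costs, and the default values of `beta` and `z` are all assumptions made for this example.

```python
import math


def icm_align(unary, shifts, neighbors, beta=1.0, z=1.0, max_sweeps=20):
    """Approximately minimize the alignment energy with ICM.

    unary[i][k]  : data cost -log C(...) of group i under candidate shift k
    shifts       : list of candidate (dx, dy) shifts, the discrete set D
    neighbors[i] : indices of the groups adjacent to group i (N_i)
    """
    n_groups, n_shifts = len(unary), len(shifts)
    # Initialize every group with its best shift, ignoring the neighbors.
    labels = [min(range(n_shifts), key=unary[i].__getitem__) for i in range(n_groups)]

    for _ in range(max_sweeps):
        changed = False
        for i in range(n_groups):
            def local_energy(k):
                # Smoothness term: distance to the neighbors' current shifts.
                smooth = sum(math.dist(shifts[k], shifts[labels[j]]) for j in neighbors[i])
                return unary[i][k] + beta * smooth / z

            best = min(range(n_shifts), key=local_energy)
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:  # no label moved in a full sweep: local minimum reached
            break
    return [shifts[k] for k in labels]


# Toy example (made-up costs): three groups in a chain; the middle one is ambiguous.
shifts = [(0, 0), (2, 0), (4, 0)]
unary = [[5.0, 0.1, 5.0], [3.0, 1.0, 0.8], [5.0, 0.1, 5.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
print(icm_align(unary, shifts, neighbors, beta=0.5))
```

In this toy case the middle group slightly prefers the shift (4, 0) on its own, but the smoothness term pulls it toward the shift (2, 0) chosen by both neighbors. ICM only guarantees a local minimum, so the result depends on the initialization.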

3. Building Probability Map: CNN Hypercolumns

The approach relies on a fully convolutional neural network (CNN) that uses hypercolumn features to estimate a per-pixel building probability map. Training is conducted on manually corrected OSM annotations; inference yields evidence maps that inform the subsequent alignment, removal, and addition stages. The probabilities within each footprint guide both the removal of withered annotations (when the average confidence falls below a strict threshold) and the shape-prior-based search for candidate additions (Vargas-Muñoz et al., 2019).
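The removal rule described above can be sketched as a mean-probability test per footprint. This is a minimal illustration under assumptions: the probability map is a plain nested list, each footprint is given as a list of (row, column) pixels, and the threshold value of 0.5 is a placeholder, since the text does not state the value used.

```python
def mean_building_probability(prob_map, footprint_pixels):
    """Average CNN building probability over the pixels of one footprint."""
    values = [prob_map[r][c] for r, c in footprint_pixels]
    return sum(values) / len(values) if values else 0.0


def remove_withered(prob_map, footprints, threshold=0.5):
    """Split footprint ids into kept and removed (withered) annotations.

    threshold is a hypothetical value chosen for illustration only.
    """
    kept, removed = [], []
    for fid, pixels in footprints.items():
        score = mean_building_probability(prob_map, pixels)
        (kept if score >= threshold else removed).append(fid)
    return kept, removed


# Toy 3x4 probability map: footprint "a" sits on a building, "b" does not.
prob_map = [
    [0.9, 0.8, 0.1, 0.0],
    [0.9, 0.7, 0.2, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]
footprints = {
    "a": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "b": [(1, 2), (1, 3), (2, 2), (2, 3)],
}
kept, removed = remove_withered(prob_map, footprints)
print(kept, removed)
```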

4. Automated Detection and Addition via Shape Priors

After alignment and removal, the challenge of missing annotations is addressed by CNN-driven detection using predefined geometric shape priors. A set of 18 canonical shapes (circles, squares, rectangles, with rotated and scaled variants) is represented as candidate templates. The detection network outputs score maps for each shape, and an Intersection over Union (IoU) threshold (e.g., IoU > 0.75 for positive assignment) guides candidate acceptance.

Candidates are filtered by high thresholding (e.g., $t = 0.80$ on detection/building probability scores) to reject false positives, with overlapping candidates resolved by selecting the highest scoring footprint (Vargas-Muñoz et al., 2019).
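The filtering and overlap resolution just described resemble a greedy, score-ordered selection. The sketch below illustrates that idea under assumptions: candidates are dictionaries with a `score` and a set of `pixels`, the helper `rect` and the `max_overlap` parameter are hypothetical, and any shared pixel counts as an overlap by default. The IoU function is the same measure used for positive assignment during training, but here it only serves to detect overlaps.

```python
def iou(a, b):
    """Intersection over Union of two footprints given as pixel sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rect(row, col, height, width):
    """Hypothetical helper: pixel set of an axis-aligned rectangle."""
    return {(r, c) for r in range(row, row + height) for c in range(col, col + width)}


def select_candidates(candidates, score_threshold=0.80, max_overlap=0.0):
    """Keep high-scoring candidates; among overlapping ones keep the best."""
    strong = [c for c in candidates if c["score"] >= score_threshold]
    strong.sort(key=lambda c: c["score"], reverse=True)
    kept = []
    for cand in strong:
        # Accept only if it does not overlap an already accepted footprint.
        if all(iou(cand["pixels"], k["pixels"]) <= max_overlap for k in kept):
            kept.append(cand)
    return kept


candidates = [
    {"name": "A", "score": 0.95, "pixels": rect(0, 0, 4, 4)},
    {"name": "B", "score": 0.85, "pixels": rect(2, 2, 4, 4)},   # overlaps A
    {"name": "C", "score": 0.90, "pixels": rect(10, 10, 3, 3)},
    {"name": "D", "score": 0.60, "pixels": rect(20, 20, 3, 3)},  # below threshold
]
print([c["name"] for c in select_candidates(candidates)])
```

Sorting by score before the greedy pass is what makes the highest-scoring footprint win each overlap; a nonzero `max_overlap` would tolerate slightly touching candidates.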

5. Benchmarking and Empirical Results

Empirical validation on Tanzanian and Zimbabwean rural datasets shows a substantial increase in annotation accuracy over OSM baselines and prior registration/segmentation approaches. Pixel-level metrics indicate an F-score improvement from ~0.111 (unaligned) to ~0.657 (aligned and enhanced). MRF group alignment improves performance especially in noise-dominated environments, while the removal and addition stages drive significant recall gains (Vargas-Muñoz et al., 2019).

Dataset	Baseline F-score (unaligned OSM)	F-score after all ReBO stages
Tanzania	~0.111	~0.657
Zimbabwe	Lower	Substantially improved
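For reference, a pixel-level F-score of the kind reported above can be computed from two binary masks as follows. This is a generic sketch that assumes the F-score is the harmonic mean of precision and recall (F1); the exact evaluation protocol of the cited work is not given here, and the masks are toy data.

```python
def pixel_f_score(pred_mask, true_mask):
    """Return (precision, recall, F1) for two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1        # building pixel correctly annotated
            elif p:
                fp += 1        # annotated pixel with no building
            elif t:
                fn += 1        # building pixel missing from the annotation
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


truth = [[1, 1, 0, 0], [1, 1, 0, 0]]
shifted = [[0, 1, 1, 0], [0, 1, 1, 0]]   # same footprint displaced by one pixel
print(pixel_f_score(shifted, truth))
print(pixel_f_score(truth, truth))
```

Even a one-pixel displacement of a small footprint halves the score in this toy case, which is consistent with misaligned rural annotations having very low baseline F-scores.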

6. Integration with Contemporary Repair Algorithms and Benchmarks

Recent research extends OSM annotation repair to larger-scale and urban domains:

DragOSM (Li et al., 22 Sep 2025) introduces an interactive denoising model using alignment tokens, which are latent variables encoding the positional correction vector between historical OSM labels and the ground-truth roof or footprint. Corrections are modeled as Gaussian perturbations; iterative denoising strategies yield high Macro F1 scores (>91%) on the ReBO benchmark, which comprises 179,265 buildings from 5,473 images across 41 cities.
MapRepair (Zorzi et al., 2020) employs deep networks to estimate a similarity transformation per instance, correcting label noise and misalignments. Temporal inconsistencies are handled via segmentation masks for new and obsolete buildings, with regularization to enforce geometric validity. The approach supports automated updates over large areas and can handle severe misalignments (IoU improves from 0.23 to 0.77 in extreme cases).
Interactive Human-in-the-Loop Correction (Vargas-Muñoz et al., 2020) enhances annotation efficiency by guiding OSM volunteers to tiles most likely to be erroneous, according to CNN-derived probability maps and prioritization metrics (e.g., Sum of Absolute Differences). Iterative refinement is supported by retraining after corrections, minimizing manual workload while maximizing data quality.
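The tile prioritization used in the human-in-the-loop workflow can be illustrated with a Sum of Absolute Differences (SAD) ranking. The sketch below assumes that SAD is taken between the rasterized OSM annotation mask and the CNN probability map of each tile, and that tiles with larger SAD are shown to volunteers first; the function names and toy tiles are hypothetical.

```python
def sad(annotation_mask, prob_map):
    """Sum of absolute differences between a binary mask and a probability map."""
    return sum(
        abs(a - p)
        for mask_row, prob_row in zip(annotation_mask, prob_map)
        for a, p in zip(mask_row, prob_row)
    )


def rank_tiles(tiles):
    """Order tile names from most to least likely to contain annotation errors."""
    return sorted(tiles, key=lambda name: sad(*tiles[name]), reverse=True)


# Each tile maps to (annotation mask, CNN probability map).
tiles = {
    "t1": ([[1, 1], [0, 0]], [[0.9, 0.8], [0.1, 0.0]]),  # annotation agrees with CNN
    "t2": ([[1, 1], [0, 0]], [[0.1, 0.2], [0.9, 0.8]]),  # annotation disagrees
}
print(rank_tiles(tiles))
```

After volunteers correct the top-ranked tiles, the network can be retrained and the ranking recomputed, which matches the iterative refinement described above.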

7. Implications and Applications

Accurate repair and updating of OSM building data enable improved demographic analyses, urban planning, disaster response, and humanitarian action. Automatic correction systems reduce the manual effort required from volunteers and make rural and underserved regions more reliably mappable. Benchmarking datasets such as the ReBO set enable robust evaluation of new techniques, fostering further progress. Effective repair strategies tend to combine CNN probability modeling, regularized alignment (MRF or transformer-based), shape-aware detection, and hybrid automation–human verification pipelines.

A plausible implication is that such methodologies are increasingly critical for maintaining map and cadastre fidelity in dynamically changing environments, especially where historical annotations alone fail to capture up-to-date building geometries or states.

In summary, Repairing Buildings in OSM (ReBO) encompasses statistical, deep learning, and shape-based approaches for repairing, aligning, and updating OSM building polygons. By integrating multi-stage repair pipelines, benchmarking against comprehensive datasets (e.g., ReBO), and leveraging both machine automation and human judgment, the field advances toward high-precision, scalable map correction applicable to global urban and rural datasets.