3D Building Dataset Construction
- 3D building dataset construction is the systematic creation of geometric and semantic building representations via automated, semi-automated, and annotated workflows.
- It integrates procedural-synthetic pipelines and real-world multimodal fusion to generate high-fidelity, richly annotated datasets for research and simulation.
- Robust quality assurance, benchmarking protocols, and detailed metadata schemas ensure reproducibility and practical applicability in AI, urban planning, and robotics.
Three-dimensional (3D) building dataset construction encompasses a complex spectrum of automated, semi-automated, and annotated workflows designed to generate large corpora of geometric, semantic, and multimodal representations of buildings for AI research, simulation, urban planning, and robotics. This article systematically reviews the methodological foundations, dataset paradigms, and technical evaluation standards in large-scale 3D building dataset creation, referencing archetypal procedural-synthetic and real-world pipelines, representation conventions (B-Rep, mesh, point cloud, wireframe, voxel, etc.), quality assurance strategies, and current trends in metadata modeling and benchmarking.
1. Procedural and Shape-Grammar Synthesis
Procedural pipelines leveraging shape grammars or parametric generation form the backbone of many scalable 3D building datasets, enabling architectural regularity, variety, and fine-grained annotation at low manual cost.
In "BuildingBRep-11K" (Guo et al., 3 Jun 2025), a shape-grammar-driven system generates 11,978 geometrically exact, multi-storey (2–10 storey) B-Rep building solids. Each floor plan is initialized as a “core-tube” rectangle and iteratively grown by stochastic selection of concave (yin) or convex (yang) expansions at polyline vertices, with parallelogram modules spatially sampled conditional on edge lengths. Stopping criteria include module and collision limits; a fixed random seed guarantees reproducibility. Floor-plan polylines are then offset into 3D solids with precise wall thickness and vertical extrusion (storey height sampled from Uniform[2.7 m, 3.3 m]), followed by architectural rule-based placement of doors (to enforce full room connectivity), windows (on north/east/south/west facades, with daylight maximization for south), and corridor clearance.
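This growth loop can be sketched in a few lines of Python. Only the fixed seed and the Uniform[2.7 m, 3.3 m] storey-height distribution follow the paper; the function name, module-size distribution, and expansion rule below are illustrative assumptions, not the actual grammar.

```python
import random

def grow_floor_plan(seed=42, n_steps=6, core=(8.0, 5.0)):
    """Sketch of core-tube growth: start from a rectangle and append
    axis-aligned rectangular modules at polyline vertices, choosing a
    concave (yin) or convex (yang) expansion stochastically.
    Parameter names and distributions are illustrative assumptions."""
    random.seed(seed)                      # fixed seed for reproducibility
    w, d = core
    # floor-plan polyline as (x, y) vertices, counter-clockwise
    poly = [(0, 0), (w, 0), (w, d), (0, d)]
    modules = []
    for _ in range(n_steps):
        vi = random.randrange(len(poly))   # pick a vertex to expand at
        yang = random.random() < 0.5       # convex (yang) vs concave (yin)
        # module size sampled conditional on the adjacent edge length
        nxt = poly[(vi + 1) % len(poly)]
        side = max(abs(nxt[0] - poly[vi][0]), abs(nxt[1] - poly[vi][1]))
        size = random.uniform(0.3, 0.8) * max(side, 1.0)
        modules.append({"vertex": vi, "convex": yang, "size": round(size, 2)})
    storey_height = random.uniform(2.7, 3.3)   # Uniform[2.7 m, 3.3 m] per the paper
    return poly, modules, storey_height
```

A real pipeline would also re-insert the sampled modules into the polyline and run the collision and module-count stopping checks described above.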
The SYNBUILD-3D dataset (Mayer et al., 28 Aug 2025) automates LoD 4-compliant buildings using a hierarchical pipeline: (a) exterior hulls from Random3DCity; (b) AI-conditional floor plan generator (RPLAN), vectorization, and alignment/min-coverage optimization; (c) extruded interiors with merged boundary nodes and explicit room, window, door subgraphs; and (d) dense roof point clouds. Final building instances enforce uniqueness and semantic consistency via rule-based quality-control: minimum three rooms per floor, complete coverage, verified room-door mapping, structural node merging, and additional manual validation (element-wise accuracy: walls 99.95%, windows 100%). This results in millions of high-fidelity, multi-floor annotated instances.
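The rule-based quality control described above (minimum room counts, verified room-door mapping) can be sketched as a simple predicate. The field names below are illustrative assumptions, not the actual SYNBUILD-3D schema.

```python
def passes_quality_control(building):
    """Sketch of SYNBUILD-3D-style rule-based QC (field names are
    illustrative): every floor needs at least three rooms, and every
    door must connect rooms that actually exist on its floor."""
    for floor in building["floors"]:
        rooms = {r["id"] for r in floor["rooms"]}
        if len(rooms) < 3:
            return False                  # minimum three rooms per floor
        for door in floor["doors"]:
            if not set(door["connects"]) <= rooms:
                return False              # unverified room-door mapping
    return True
```

Coverage and node-merging checks would run as analogous predicates over the wireframe graph.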
These methods enable precise modeling of domain constraints such as spatial scale (room size distribution, corridor width, daylight angle), architectural heuristics (aspect-ratio filters, connectivity graphs), and semantic richness (per-room labeling, hierarchical metadata), ensuring generated datasets meet research, simulation, and ML requirements.
2. Real-World Data Acquisition and Multi-Modal Fusion
Acquiring data from physical buildings at urban or national scales requires multimodal sensor integration, geospatial harmonization, and scalable annotation.
The P dataset (Sulzer et al., 21 May 2025) exemplifies state-of-the-art multimodal acquisition, combining dense airborne LiDAR (6–21 pts/m², vertical accuracy σ_z ≈ 0.1 m) and orthorectified RGB aerial imagery (25 cm GSD) over ∼638 km² across multiple continents. Preprocessing includes outlier/noise filtering, classification of ground vs. object returns, georeferencing, and tile alignment to ensure consistent pixel/point co-location. Building polygons are derived from cadastral maps or extracted in a data-driven manner (segmentation-contour pipelines, α-shape reconstructions), then harmonized into a non-overlapping, hole-aware polygon tiling with boundary splits and sliver dropping.
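The outlier/noise filtering step can be illustrated with a generic statistical outlier filter of the kind commonly applied to airborne LiDAR; this is a brute-force sketch with illustrative parameters, not the P pipeline's actual implementation.

```python
import math
import statistics

def filter_outliers(points, k=8, std_ratio=2.0):
    """Generic statistical outlier removal (sketch): drop points whose
    mean distance to their k nearest neighbours exceeds the global mean
    of that statistic by more than std_ratio standard deviations.
    Brute force O(n^2); production code uses a k-d tree."""
    def mean_knn_dist(p):
        dists = sorted(math.dist(p, q) for q in points if q is not p)
        return sum(dists[:k]) / k
    d = [mean_knn_dist(p) for p in points]
    mu, sigma = statistics.mean(d), statistics.pstdev(d)
    return [p for p, di in zip(points, d) if di <= mu + std_ratio * sigma]
```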
CMAB (Zhang et al., 2024) demonstrates national-scale integration, fusing 20 TB of 0.3–1 m Google Earth tiles, 60 million street view images, administrative vectors, POIs, and impervious-surface time series. The pipeline applies deep rooftop segmentation (HRNet/OCRNet, F1 = 89.93%), area/orientation estimation, and ensemble XGBoost for height, volume, function, quality, and temporal inference, validated against manual audits (F1 > 80% for function, age, quality).
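As a toy stand-in for CMAB's XGBoost height model, the sketch below fuses a shadow-geometry estimate with a street-view floor count; the fusion rule, weights, and function name are illustrative assumptions only.

```python
import math

def estimate_height(shadow_len_m, sun_elev_deg, n_floors_streetview,
                    storey_height_m=3.0, w_shadow=0.5):
    """Toy height estimator (not CMAB's actual model): combine a
    shadow-based estimate (shadow length x tan(sun elevation)) with a
    floor count observed in street view, via a fixed weighted average."""
    h_shadow = shadow_len_m * math.tan(math.radians(sun_elev_deg))
    h_floors = n_floors_streetview * storey_height_m
    return w_shadow * h_shadow + (1 - w_shadow) * h_floors
```

A learned ensemble replaces the fixed weights with per-feature splits and can absorb many more morphological, POI, and temporal features.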
UrbanBIS (Yang et al., 2023) leverages high-overlap aerial photogrammetry (113,346 images, 2.5 billion 3D points, 3,370 buildings), with rigorous flight planning and structure-from-motion/dense multi-view stereo to produce fine-grained, instance-labeled point clouds. Annotation integrates semantic tagging via interactive 3D mesh labeling, dense sampling (80 pts/m²), and multi-level instance/fine-category splits.
These real-world workflows deliver coverage, density, and geographic diversity, but must rigorously address sensor non-uniformity, occlusion, resolution variance, temporal mismatch, and annotation noise via dedicated QC and data harmonization protocols.
3. Representation Types and Metadata Schema
Dataset utility is fundamentally shaped by the choice of geometric and semantic representation, annotation detail, and accompanying metadata.
- Boundary-Representation (B-Rep): Used for precise, CAD-interoperable solids in BuildingBRep-11K (Guo et al., 3 Jun 2025), with detailed hierarchical face modeling (floors, walls, slabs, rule-based openings), watertightness validation, and per-building metadata (NumPy arrays: storey, per-floor room counts/areas; JSON index mapping).
- Wireframe/Semantically Enriched Graphs: SYNBUILD-3D (Mayer et al., 28 Aug 2025) encodes LoD 4 wireframes (nodes/edges, explicit room/door/window sets, per-floor units), with adjacency matrices, unique node mapping, and semantic type dictionaries.
- Meshes and Textured Surfaces: Texture2LoD3 (Tang et al., 7 Apr 2025) constructs LoD 3 meshes by projecting and simplifying LoD 1/2 priors, guided by street-level imagery and facade plane fitting (local PCA), with rectified, transformer-segmented textures.
- Point Clouds: UrbanBIS (Yang et al., 2023), SIP (Kim et al., 9 Dec 2025), and P (Sulzer et al., 21 May 2025) provide dense (often colorized) 3D point sets annotated by semantic class, instance, or building/room affiliation, with standard LAS/LAZ storage.
- Voxel- or patch-based (for deep learning): Some synthetic pipelines (e.g., Synthetic 3D Data Generation (Fedorova et al., 2021)) and multimodal fusion methods structure data at raster or patch level for neural representations.
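The local-PCA facade plane fitting mentioned for Texture2LoD3 above can be sketched as a generic PCA plane fit (not the paper's implementation): the plane normal is the covariance eigenvector with the smallest eigenvalue.

```python
import numpy as np

def fit_facade_plane(points):
    """Generic PCA plane fit (sketch): the normal is the eigenvector of
    the centered point covariance with the smallest eigenvalue; the
    plane passes through the centroid, written as n.x + d = 0."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                   # smallest-variance direction
    d = -normal @ centroid
    return normal, d
```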
File organization aligns geometry with metadata (index mapping, per-building arrays or sidecar JSON) and is designed for efficient batch loading, targeted ML supervision, and custom downstream task definition.
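A minimal sketch of this geometry-plus-sidecar-metadata layout follows; the file names (`index.json`) and the per-building schema are illustrative assumptions, not the actual BuildingBRep-11K organization.

```python
import json
from pathlib import Path

def load_building(root, building_id):
    """Sketch of a geometry + sidecar-metadata loader: a JSON index maps
    building IDs to metadata and geometry file paths (schema is an
    illustrative assumption)."""
    root = Path(root)
    index = json.loads((root / "index.json").read_text())
    meta = index[building_id]         # e.g. {"storeys": 3, "geometry": "b001.step"}
    geometry_path = root / meta["geometry"]
    return geometry_path, meta
```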
4. Quality Assurance, Filtering, and Statistical Sanity
Robust dataset construction incorporates multilayered validation and filtering at both geometric and semantic levels:
- Boolean and topological constraints: BuildingBRep-11K (Guo et al., 3 Jun 2025) employs manifoldness checks (every edge shared by exactly two faces), room-area and aspect-ratio filters (A_j ≥ 8 m², α_j ≤ 4), and discards models failing watertightness or architectural norms.
- Annotation harmonization: P (Sulzer et al., 21 May 2025) applies tile-aligned vector harmonization, polygon splitting, sliver elimination, and explicit handling of holes and boundary conditions. UrbanBIS tracks dual annotator verification and expert reconciliation.
- Error metrics and empirical validation: Datasets report F1, mIoU, RMSE, R², and related domain metrics (e.g., CMAB rooftop F1 89.93% and volume RMSE 7.6 m; CMAB manual-validation agreement >80%; SYNBUILD-3D ground-truth coverage >95%).
- Statistical/distributional sanity: All major efforts maintain dataset-scale summary statistics—histograms of room/floor areas, storey and category distributions, graph/node/edge size, and error quantification (false positive/negative analyses).
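The geometric filters above can be sketched as a single predicate. The thresholds (area ≥ 8 m², aspect ratio ≤ 4, two faces per edge) follow the text; the data layout (rooms as width/depth pairs, faces as vertex-index tuples) is an illustrative assumption.

```python
def passes_geometric_filters(rooms, faces):
    """Sketch of BuildingBRep-11K-style filters: every room must have
    area >= 8 m^2 and aspect ratio <= 4, and the mesh must be 2-manifold
    (every undirected edge shared by exactly two faces)."""
    for w, d in rooms:
        if w * d < 8.0 or max(w, d) / min(w, d) > 4.0:
            return False
    edge_count = {}
    for face in faces:
        for i in range(len(face)):
            e = tuple(sorted((face[i], face[(i + 1) % len(face)])))
            edge_count[e] = edge_count.get(e, 0) + 1
    return all(c == 2 for c in edge_count.values())
```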
These processes are critical for ensuring that synthetic corpora do not degrade model generalizability, and that real-world datasets carry reliable semantics and architectural plausibility.
5. Benchmarking Protocols and Model Integration
Standardized evaluation metrics and neural benchmarking architectures are integral to dataset utility and validation:
- Multi-attribute regression: BuildingBRep-11K (Guo et al., 3 Jun 2025) uses PointNet to simultaneously regress key geometric attributes from point clouds (storey count, room count, room area), with MAE of 0.37 storeys, 5.7 rooms, and 3.2 m² area.
- Defect detection: Class-balanced classifier yielding accuracy, precision, recall, and F1 (e.g., TP = 41, FN = 9 in BuildingBRep-11K).
- Segmentation/instance benchmarks: UrbanBIS and SIP evaluate semantic mIoU, per-class IoU, instance AP@t, and execution time (UrbanBIS B-Seg achieving AP=0.453, AP@50=0.550, mIoU up to 0.988 for residential).
- Polygon and wireframe reconstruction: P and 3D Line Cloud datasets report Chamfer and Hausdorff distances, wireframe edit distance, vertex/edge AP, maximum tangent-angle error.
- Downstream task readiness: Datasets are explicitly structured for attribute regression, semantic/instance segmentation, part labeling, and style transfer, and serve as simulation resources for navigation, energy, and solar modeling.
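One of the reconstruction metrics above, the symmetric Chamfer distance, can be sketched directly; this is the brute-force variant (real benchmarks use accelerated nearest-neighbour search).

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (brute force):
    mean distance from each point to its nearest neighbour in the other
    set, summed over both directions."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)
```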
Evaluation splits are stratified (by city, region, category, graph size), support cross-validation, and define per-task baselines (standard splits: e.g., 80:10:10 in BuildingBRep-11K, 70:15:15 in BuildingWorld).
6. Trends in Scalability, Diversity, and Licensing
Scalability and coverage, achieved through procedural synthesis and crowd-sourced annotation, enable new research and benchmarking regimes:
- Geographic and architectural diversity: BuildingWorld (Huang et al., 9 Nov 2025) integrates 5 million LoD2 models across 44 cities and all inhabited continents, stratified by architectural style and urban form.
- Semantically rich and LoD-compliant models: SYNBUILD-3D raises the bar for LoD 4 compliance (complete exteriors/interiors, explicit room/door/window correspondence, multi-level semantic labeling).
- Open-access distribution and reproducibility: All major datasets are released with open/public (often CC-BY-NC or open-academic) licensing and reference code for parsing, benchmark execution, and simulated data generation.
- Simulation-ready pipelines: Synthetic datasets expose parametric configuration, metadata-driven stratification, and multi-modal output (mesh, wireframe, RGB-D, point cloud), supporting integration across urban simulation, digital twinning, and large vision models.
The landscape thus comprises highly scalable, richly parameterized procedural syntheses and multimodal real-world corpora with rigorous architectural, spatial, and visual diversity—fueling advancement in geometric deep learning, 3D AI, and urban informatics research.