Giblayout and ISPRS Datasets Overview
- Giblayout is a benchmark dataset for indoor floor plan reconstruction from LiDAR point clouds, emphasizing geometric fidelity and complex topology.
- ISPRS datasets provide high-resolution 2D and 3D benchmarks for semantic segmentation and urban analysis using multi-modal imagery and DSM data.
- Both datasets drive innovations in spatial analysis and multimodal fusion, underpinning advances in SLAM, BIM, and autonomous navigation.
Giblayout and ISPRS Datasets provide essential benchmarks for the development, evaluation, and comparison of algorithms addressing geometric and semantic understanding in both indoor and aerial environments. Giblayout is focused on floor plan reconstruction from point cloud data of indoor buildings, while ISPRS datasets encompass both 2D semantic labeling from aerial images and 3D point cloud segmentation for urban scenes. Both play pivotal roles in their respective research domains, enabling advances in spatial analysis, autonomous navigation, and semantic mapping.
1. Dataset Composition and Characteristics
Giblayout
Giblayout comprises LiDAR point cloud scans from 44 diverse house models, exceeding 10,000 m² in total area. The dataset includes complex floor plans such as circular layouts, corridors, and non-Manhattan topologies. Scenes typically provide high-density 3D point clouds suitable for geometric reasoning. In recent studies, a representative subset of 12 scenes has been used for rigorous method evaluation. The dataset targets room-level segmentation and topological recovery, making it suitable for research in indoor mapping, Simultaneous Localization and Mapping (SLAM), and Building Information Modeling (BIM) (Ye et al., 19 Sep 2025).
ISPRS Datasets
The ISPRS datasets are an umbrella for several prominent remote sensing benchmarks:
- ISPRS Vaihingen (2D/3D): Contains high-resolution IRRG (infrared, red, green) orthophotos and airborne laser scanning (ALS) point clouds of Vaihingen, Germany. Imagery has a ground sampling distance (GSD) of 9 cm, covering small towns and urban areas, annotated for 2D semantic segmentation (impervious surfaces, buildings, low vegetation, trees, cars, clutter) and 3D point cloud labeling (nine classes including powerline and façade) (Xu et al., 2020, Lin et al., 2020).
- ISPRS Potsdam (2D): Features very high-resolution RGB and NIR images with a GSD of 5 cm, 6000×6000 pixels per tile, with corresponding DSMs. It is primarily used for dense semantic segmentation of large-scale urban scenes (Wang et al., 21 Apr 2024).
Both datasets provide detailed ground truth for quantitative evaluation, high spatial resolution, and a range of semantic classes relevant to urban analysis.
2. Methodological Frameworks and Benchmark Tasks
Giblayout Applications
The core task for Giblayout is floor plan reconstruction from indoor point clouds. Methods such as FloorSAM leverage grid-based filtering, density map projection, and zero-shot segmentation using the Segment Anything Model (SAM). The process is as follows:
- Project 3D point cloud onto the XOY plane to obtain a bird’s-eye density map—retaining ceiling-adjacent points for spatial feature clarity.
- Generate adaptive prompt points based on enhanced density maps as input for SAM.
- Apply multi-stage filtering to handle composite/incomplete room masks.
- Fuse SAM semantic masks with geometric cues from the original point cloud for robust contour extraction and regularization (Ye et al., 19 Sep 2025).
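The projection and prompting steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the FloorSAM implementation: the grid resolution, height cut, and percentile-based prompt-selection rule are illustrative assumptions.

```python
import numpy as np

def density_map(points, cell=0.05, z_min=None):
    """Project a 3D point cloud onto the XOY plane as a 2D density grid.

    points: (N, 3) array of x, y, z coordinates.
    cell:   grid resolution in metres (an assumed value, not the paper's).
    z_min:  optional height cut to retain only ceiling-adjacent points.
    """
    pts = points if z_min is None else points[points[:, 2] > z_min]
    xy = pts[:, :2]
    idx = np.floor((xy - xy.min(axis=0)) / cell).astype(int)
    h, w = idx.max(axis=0) + 1
    grid = np.zeros((h, w), dtype=np.int32)
    np.add.at(grid, (idx[:, 0], idx[:, 1]), 1)  # count points per cell
    return grid

def prompt_points(grid, percentile=90):
    """Pick high-density cells as candidate prompt points for SAM (illustrative rule)."""
    thresh = np.percentile(grid[grid > 0], percentile)
    rows, cols = np.nonzero(grid >= thresh)
    return np.stack([rows, cols], axis=1)

# Toy scene: a dense ceiling-height cluster plus low sparse clutter.
rng = np.random.default_rng(0)
ceiling = rng.normal(loc=[1.0, 1.0, 2.5], scale=0.05, size=(500, 3))
clutter = rng.uniform(low=0.0, high=2.0, size=(50, 3))
grid = density_map(np.vstack([ceiling, clutter]), z_min=2.0)
prompts = prompt_points(grid)
```

The resulting prompt coordinates (in grid cells) would then be passed to SAM; the multi-stage mask filtering and geometric fusion stages operate on SAM's output masks.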
This approach enables automatic, high-fidelity recovery of room boundaries and room relationships, robust even in non-Manhattan, cluttered, or noisy scenes.
ISPRS Tasks
ISPRS datasets support a rich array of tasks:
- 2D Semantic Segmentation: Mapping every image pixel to a semantic class. Recent methods leverage deep learning architectures, including attention-based pyramids (Xu et al., 2020), transformer-based bilateral fusion (Wang et al., 2021), state space models (Zhu et al., 2 Apr 2024), lightweight multimodal fusion (Wang et al., 21 Apr 2024), and convolutional decoders (Dai et al., 6 Aug 2025).
- 3D Point Cloud Segmentation: Assigning class labels to individual 3D points. Models like LGENet integrate hybrid 2D/3D convolutions and segment-based global context, exploiting the diverse geometry of urban structures (Lin et al., 2020).
- Multimodal Fusion: Incorporating the DSM as an explicit modality for improved segmentation and detection (e.g., via MANet, LMFNet, AMMNet), where modality misalignment and efficient feature fusion are critical research problems (Ma et al., 15 Oct 2024, Ye et al., 22 Jul 2025, Wang et al., 21 Apr 2024).
- Object Detection/Vehicle Detection: Tasks now include OBB-format detection in dense and occluded scenes, using multimodal cross attention and hard/easy sample discrimination (Wu et al., 14 May 2024).
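The core idea behind multimodal fusion can be reduced to a toy sketch: per-pixel features from the RGB and DSM branches are combined with a gating weight. This is a generic illustration, not the architecture of MANet, LMFNet, or AMMNet, and the fixed scalar gate stands in for what those models learn as attention or adapter weights.

```python
import numpy as np

def gated_fusion(rgb_feat, dsm_feat, gate=0.7):
    """Fuse per-pixel feature maps from two modalities with a scalar gate.

    rgb_feat, dsm_feat: (H, W, C) feature maps from the two encoder branches.
    gate: weight on the RGB branch in [0, 1] (a fixed toy value here; real
          fusion modules learn spatially varying weights).
    """
    assert rgb_feat.shape == dsm_feat.shape
    return gate * rgb_feat + (1.0 - gate) * dsm_feat

rgb = np.ones((4, 4, 8))    # stand-in for contextual RGB features
dsm = np.zeros((4, 4, 8))   # stand-in for structural elevation features
fused = gated_fusion(rgb, dsm)
```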
3. Quantitative Evaluation, Metrics, and Performance
| Dataset/Domain | Standard Metrics | Typical Results (Recent Best) |
|---|---|---|
| Giblayout | Room count accuracy, contour edge precision, recall | Precision ~0.90, Recall ~0.94 (Ye et al., 19 Sep 2025) |
| ISPRS Vaihingen, 2D | OA, mIoU, per-class IoU/F1-score | mIoU 85–87%, OA >91% (Ye et al., 22 Jul 2025, Ma et al., 15 Oct 2024) |
| ISPRS Potsdam, 2D | OA, mIoU, per-class IoU | mIoU >87%, OA >92% (Ye et al., 22 Jul 2025, Ma et al., 15 Oct 2024) |
| ISPRS 3D | Overall accuracy (OA), avg F1-score | OA 0.845, F1 0.737 (Lin et al., 2020) |
In FloorSAM, boundary correspondence is evaluated with precision and recall:

$$\text{Precision} = \frac{N_{\text{true}}}{N_{\text{all}}}, \qquad \text{Recall} = \frac{N_{\text{true}}}{N_{\text{gt}}}$$

where $N_{\text{true}}$ is the number of detected boundaries matched to ground truth, $N_{\text{all}}$ is the total number of detected boundaries, and $N_{\text{gt}}$ is the number of ground-truth boundaries.
ISPRS segmentation employs mean IoU:

$$\text{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c}$$

where $TP_c$, $FP_c$, and $FN_c$ are the true positives, false positives, and false negatives for class $c$, and $C$ is the number of classes.
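Both families of metrics are straightforward to compute; a minimal sketch (the toy counts and confusion matrix are illustrative, not benchmark results):

```python
import numpy as np

def precision_recall(n_true, n_all, n_gt):
    """Boundary precision/recall from matched counts, as defined above."""
    return n_true / n_all, n_true / n_gt

def mean_iou(conf):
    """mIoU from a square confusion matrix conf[predicted, ground_truth]."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=1) - tp   # predicted as class c but actually another class
    fn = conf.sum(axis=0) - tp   # class c pixels predicted as something else
    return float(np.mean(tp / (tp + fp + fn)))

# Toy numbers: 27 of 30 detected boundaries match 29 ground-truth boundaries.
p, r = precision_recall(n_true=27, n_all=30, n_gt=29)

# Toy 2-class confusion matrix.
conf = np.array([[50, 2],
                 [3, 45]])
miou = mean_iou(conf)
```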
4. Comparative Analysis and Methodological Advances
- FloorSAM vs. Traditional Methods: On both Giblayout and ISPRS, FloorSAM produces more reliable room segmentation, higher boundary precision/recall, and better robustness to noise than prior rule-based or Mask R-CNN approaches, especially in non-Manhattan and occluded environments (Ye et al., 19 Sep 2025).
- ISPRS Progression: Successive architectures, from FFPNet-style attention pyramids (Xu et al., 2020) and bilateral awareness transformers (Wang et al., 2021) to state space models (Samba) (Zhu et al., 2 Apr 2024) and asymmetric/fusion models (MANet (Ma et al., 15 Oct 2024), AMMNet (Ye et al., 22 Jul 2025)), have pushed 2D mIoU close to 87% and OA above 92%. Multimodal networks excel by resolving class confusions and improving delineation in ambiguous regions.
- Multi-Modal Considerations: Challenges include aligning heterogeneous modality features (contextual in RGB vs. structural in DSM) and controlling computational cost. Designs like AMMNet’s asymmetric encoding and MANet’s multimodal adapter enable efficient fusion while preserving generalization (Ma et al., 15 Oct 2024, Ye et al., 22 Jul 2025).
- Superpixel Fusion (FuSS): As a post-processing step, FuSS fuses different superpixel segmentations, merging small inconsistent regions with Mahalanobis distance to improve open-set segmentation consistency (reducing “salt-and-pepper” noise and increasing AUROC and Cohen’s kappa scores on ISPRS) (Nunes et al., 2022).
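The merging criterion in such superpixel post-processing can be sketched as follows. This is a simplified illustration of Mahalanobis-distance region merging, assuming per-region feature means and covariances; the thresholds, feature choice, and regularisation are assumptions, not the FuSS implementation.

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of feature vector x from a region's feature distribution."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def merge_target(small_region_feat, neighbours):
    """Merge a small inconsistent region into its statistically closest neighbour.

    neighbours: dict mapping label -> (mean, covariance) of that region's
    pixel features (e.g., mean colour per region).
    """
    best_label, best_dist = None, np.inf
    for label, (mean, cov) in neighbours.items():
        cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))  # regularise
        dist = mahalanobis(small_region_feat, mean, cov_inv)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy example: a small green-ish fragment next to "road" and "grass" regions.
neighbours = {
    "road":  (np.array([0.2, 0.2, 0.2]), 0.01 * np.eye(3)),
    "grass": (np.array([0.1, 0.8, 0.1]), 0.01 * np.eye(3)),
}
target = merge_target(np.array([0.15, 0.75, 0.12]), neighbours)
```

Merging each small region into its nearest neighbour in feature space is what suppresses the isolated mislabelled fragments responsible for "salt-and-pepper" noise.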
5. Practical and Research Implications
Indoor Navigation and BIM (Giblayout)
High-precision and high-recall reconstruction from FloorSAM allows for robust indoor navigation, improved SLAM, and accurate BIM integration. The topology preservation and vectorized planar output facilitate downstream structural analysis in robotics and digital twinning.
Urban Mapping and Environmental Planning (ISPRS)
Consistent, high-resolution segmentation underpins urban planning, hazard response, infrastructure asset management, and environmental monitoring. The availability of DSM and multi-modal imagery permits elevation-informed mapping—a critical advantage for distinguishing between flat surfaces, buildings, vegetation, and vehicles.
Generalization and Robustness
Zero-shot inference via SAM-guided techniques (e.g., FloorSAM) demonstrates transfer to novel geometric setups and sensor configurations without substantial retraining or annotation. In remote sensing, knowledge distillation (Lê et al., 24 May 2024) and synthetic label generation (ALPS (Zhang et al., 16 Jun 2024)) enable task generalization under limited annotation regimes.
6. Limitations, Open Challenges, and Future Directions
- Giblayout: The accuracy of reconstructed topologies depends on the quality of input point clouds and the spatial uniformity of LiDAR returns. Handling missing regions (e.g., window boundaries) requires further semantic integration (Ye et al., 19 Sep 2025).
- ISPRS and Multimodal Fusion: Modality alignment and redundancy mitigation are ongoing challenges. Efficient yet expressive designs—such as asymmetric encoding (AMMNet) or lightweight, self-attention-based fusion (LMFNet)—seek to minimize computational and memory load while maximizing performance.
- Transfer to Other Domains: While current benchmarks focus on urban environments, future datasets will likely encompass more diverse scenes, such as complex industrial facilities, merged indoor–outdoor transitions, or disaster zones, requiring further method adaptation (e.g., handling occlusion or dynamic scenes in ISPRS or future Giblayout variants).
- Evaluation Practices: Metrics such as mIoU and boundary recall remain the standard, but the growing prevalence of open-set and few-shot setups demands new protocols for benchmarking segmentation under distribution shift and annotation sparsity.
7. Concluding Synthesis
Giblayout and ISPRS datasets have established themselves as indispensable resources for structural and semantic analysis in both indoor and outdoor spatial environments. Giblayout provides a rigorous benchmark for floorplan reconstruction from indoor point clouds, emphasizing topological fidelity and geometric regularity. ISPRS datasets, with their combination of high-resolution imagery and point clouds, are central to advancing semantic segmentation, object detection, and multimodal fusion in remote sensing. Recent methods, such as FloorSAM on Giblayout (Ye et al., 19 Sep 2025) and AMMNet/MANet/LMFNet on ISPRS (Ye et al., 22 Jul 2025, Ma et al., 15 Oct 2024, Wang et al., 21 Apr 2024), demonstrate the impact of integrating geometric cues, transformer-based representations, foundation models, and robust fusion mechanisms. These benchmarks continue to drive innovation, providing a testbed for methodological advances that translate directly into real-world applications across navigation, BIM, mapping, and environmental management.