
Structured3D: Indoor Scene Dataset

Updated 10 February 2026
  • Structured3D is a comprehensive synthetic dataset providing photo-realistic 3D indoor environments derived from 3,500 CAD-modeled houses with detailed geometry and semantic labels.
  • It supports structured 3D modeling tasks such as floorplan reconstruction and room layout estimation using multi-view images, paired empty/furnished panoramas, and precise camera calibration.
  • The dataset enables benchmarking with standardized evaluation protocols and metrics, driving advancements in deep learning methodologies for indoor scene synthesis and structural recovery.

Structured3D is a large-scale, photo-realistic synthetic dataset providing richly annotated 3D indoor environments, designed to enable and benchmark structured 3D modeling, floorplan reconstruction, room layout estimation, and 360-degree scene synthesis. Originating from a base of professionally designed CAD house models, Structured3D offers comprehensive structural, geometric, and semantic annotations at scale. The dataset has catalyzed advances in a range of indoor scene understanding tasks by facilitating the training and evaluation of deep learning models under standardized, realistic, and richly supervised conditions.

1. Dataset Composition and Annotation Structure

Structured3D is sourced from 3,500 professional CAD-modeled houses comprising 21,835 distinct rooms. Each house's CAD representation contains precise geometry, materials, semantic object labels, and is rendered using industry-grade Monte Carlo ray tracing to achieve high photorealism (Zheng et al., 2019). The dataset provides:

  • Multiple renderings per room: Each room is rendered under diverse lighting and furnishing configurations, supporting both panoramic (equirectangular 512×1024) and pinhole camera placements, yielding 196,515 RGB images.
  • Paired empty/furnished panoramas: Especially relevant for conditional scene synthesis, 21,835 pairs of empty and furnished panoramas allow models to learn object addition/removal with pixel-level alignment (Shum et al., 2023).
  • Primitives and relationships: Annotations include planes $P$, lines $L$, junctions $X$, cuboid groupings (parametrized by axes and half-lengths), and Manhattan world structures. Relationships capture plane–line and line–junction incidence, cuboid symmetries, Manhattan clustering, and semantic groupings (e.g., wall, floor, chair, room).
  • Per-pixel and per-plane geometry: For every image, Structured3D provides depth maps, per-pixel plane membership masks, and per-plane normals and offsets, allowing ground-truth recovery of explicit structural elements (floors, ceilings, walls) as $\mathbf{n}^T \mathbf{X} + d = 0$ (Huang et al., 24 Feb 2025).
  • Camera intrinsics/poses: Provided for every image, facilitating cross-view geometric learning.
  • Ground-truth projections: 2D and 3D coordinates for lines, keypoints, and objects in both camera and world frames.
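The per-plane annotations above are enough to recover dense depth analytically: intersecting each pixel's viewing ray with its plane $\mathbf{n}^T \mathbf{X} + d = 0$ yields the 3D point. A minimal sketch of this computation, with illustrative intrinsics and plane values (not taken from the dataset):

```python
import numpy as np

# Sketch (illustrative values, not from the dataset): recovering depth from
# Structured3D-style plane annotations n^T X + d = 0 given camera intrinsics K.

K = np.array([[500.0,   0.0, 256.0],
              [  0.0, 500.0, 256.0],
              [  0.0,   0.0,   1.0]])

def depth_from_plane(u, v, n, d, K):
    """Depth along the viewing ray through pixel (u, v) to plane n^T X + d = 0."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # z-normalized direction
    denom = n @ ray
    if abs(denom) < 1e-9:
        raise ValueError("ray parallel to plane")
    t = -d / denom       # choose t so that n^T (t * ray) + d = 0
    return (t * ray)[2]  # depth = z-coordinate of the intersection point

# Hypothetical frontal wall at z = 3 m: n = (0, 0, 1), d = -3.
wall_n, wall_d = np.array([0.0, 0.0, 1.0]), -3.0
print(depth_from_plane(256, 256, wall_n, wall_d, K))  # ~3.0 at principal point
```

The same per-plane recovery, applied with the provided plane membership masks, reproduces the dataset's depth maps up to rendering noise.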

2. Benchmark Tasks, Evaluation Protocols, and Metrics

Structured3D supports a wide range of tasks, with well-defined evaluation protocols:

  • Room layout estimation: Benchmark splits typically involve 3,000 train, 250 validation, and 250 test scenes (scene-level split). Models operate on equirectangular panoramas or perspective views to predict floor-wall and ceiling-wall boundaries, semantic room types, and corner positions (Fayyazsanavi et al., 2023, Huang et al., 24 Feb 2025).
  • Floorplan reconstruction: Prediction of semantic room polygons (variable arity, arbitrary geometry), windows, and doors from top-down slices or rasterized density maps. Complex plans with over 150 corners are typical (Phung et al., 9 Feb 2026).
  • Plane and structure recovery: Estimate camera-consistent plane parameters (normal, offset) from single or multi-view images (Huang et al., 24 Feb 2025).
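Layout estimators operating on the 512×1024 equirectangular panoramas typically reason per image column, which presupposes the standard pixel-to-direction mapping. A minimal sketch of that mapping; the axis convention (y up, longitude increasing with u) is an assumption here, not something the dataset mandates:

```python
import numpy as np

# Minimal sketch of the equirectangular pixel-to-direction mapping that
# panoramic layout estimators rely on. The axis convention (y up, longitude
# increasing with u) is an assumption here, not mandated by the dataset.

H, W = 512, 1024  # Structured3D panorama resolution

def pano_pixel_to_direction(u, v, height=H, width=W):
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi   # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi  # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.sin(lon),     # unit viewing direction
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)])

# Pixels on the middle row map to directions near the horizon (y ~ 0),
# which is why column-wise boundary prediction is geometrically convenient.
d = pano_pixel_to_direction(W // 2, H // 2)
print(np.round(d, 3))
```

Under this mapping, each image column corresponds to a fixed longitude, so a wall-floor boundary reduces to one latitude value per column.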

Key metrics include:

| Task | Metric(s) | Description |
|---|---|---|
| Layout estimation | Mean Depth Error ($\mathcal{E}_\text{depth}$) | Error in projected wall/floor boundary depth columns; reported by distance bin |
| Layout estimation | 2D IoU ($\mathrm{IoU}_{2D}$) | Intersection over union of projected layout polygons |
| Layout estimation | 3D IoU, Corner Error, Pixel Error | 3D volumetric overlap, normalized L2 error, per-pixel misclassification rate |
| Floorplan reconstruction | Room/Corner/Angle F1 | F1 scores for instance-matched polygons, corners (10 px tolerance), and edge angles (5° tolerance) |
| Multi-view reconstruction | re-IoU, re-PixelError, re-EdgeError, re-RMSE | 2D reprojected accuracy; root-mean-square error |
| Multi-view reconstruction | 3D plane precision/recall | Planes matched if $\angle(\hat{\mathbf{n}}, \mathbf{n}) < 10^\circ$ and $|\hat{d} - d| < 0.15\,\mathrm{m}$ |
| Plane/pose estimation | RRA/RTA@m, mAA | Relative rotation/translation accuracy; mean area under the accuracy curve (30° threshold) |
| Scene synthesis | FID, KID ($\times 10^3$) | Distributional realism of generated images |
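As a concrete example, the 3D plane precision/recall criterion can be sketched as follows; the greedy one-to-one matching and the variable names are illustrative, not the benchmark's reference implementation:

```python
import numpy as np

# Sketch of the 3D plane precision/recall criterion: a predicted plane matches
# ground truth if normals differ by < 10 degrees and offsets by < 0.15 m.
# The greedy one-to-one matching here is illustrative, not reference code.

def plane_match(pred, gt, ang_tol_deg=10.0, off_tol_m=0.15):
    (n_hat, d_hat), (n, d) = pred, gt
    cos = np.clip(n_hat @ n / (np.linalg.norm(n_hat) * np.linalg.norm(n)),
                  -1.0, 1.0)
    return np.degrees(np.arccos(cos)) < ang_tol_deg and abs(d_hat - d) < off_tol_m

def precision_recall(preds, gts):
    matched, tp = set(), 0
    for p in preds:                                   # greedy matching
        for i, g in enumerate(gts):
            if i not in matched and plane_match(p, g):
                matched.add(i)
                tp += 1
                break
    return tp / max(len(preds), 1), tp / max(len(gts), 1)

preds = [(np.array([0.0, 0.0, 1.0]), -3.05),   # close to the ground truth
         (np.array([1.0, 0.0, 0.0]),  2.00)]   # unmatched false positive
gts = [(np.array([0.0, 0.02, 1.0]), -3.0)]
print(precision_recall(preds, gts))  # -> (0.5, 1.0)
```

One matched plane out of two predictions and one ground-truth plane gives precision 0.5 and recall 1.0, illustrating how over-segmentation hurts precision without affecting recall.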

3. Modeling Approaches Exploiting Structured3D

Structured3D has been pivotal in the advancement of several model classes:

  • Autoregressive polygon decoders: Raster2Seq frames floorplan reconstruction as a sequence-to-sequence task, outputting polygons via an autoregressive transformer. It leverages learnable spatial anchors and deformable attention, yielding highly flexible output that adapts to complex, high-corner-count floorplans (Phung et al., 9 Feb 2026).
  • Uncertainty-aware prediction: U2RLE employs a two-stage CNN with an explicit learned uncertainty per column for wall-floor boundaries, followed by distance-aware refinement. Uncertainty gating allows the method to reliably handle distant walls (Fayyazsanavi et al., 2023).
  • 360° conditional generative models: 360-Aware GANs use Structured3D’s paired empty/furnished panoramas to conditionally synthesize immersive indoor scenes via a hierarchical layout generator (parametrized by ellipses in spherical coordinates) and a StyleGAN2-based decorator. Cycle-consistency is enforced by a pretrained “emptier” (Shum et al., 2023).
  • Multi-view pointmap transformers: Plane-DUSt3R fine-tunes the DUSt3R foundation model on Structured3D by training its decoder to regress structural pointmaps. Post-processing aggregates per-view planes, recovers plane adjacencies, and aligns to camera poses (Huang et al., 24 Feb 2025).

4. Quantitative Benchmarks and Empirical Insights

Models evaluated on Structured3D have established new state-of-the-art results:

Floorplan reconstruction (F1 scores):

| Method | Room | Corner | Angle |
|---|---|---|---|
| HEAT | 94.7 | 84.5 | 79.6 |
| PolyRoom | 98.9 | 96.0 | 91.9 |
| RoomFormer | 95.1 | 91.7 | 83.2 |
| Raster2Seq | 99.6 | 98.3 | 92.7 |

Room layout estimation (mean depth error by distance bin; 2D IoU):

| Model | 1 m | 2 m | ... | 10 m | 2D IoU |
|---|---|---|---|---|---|
| HorizonNet | .028 | .036 | ... | 1.44 | 92.63% |
| U2RLE | .027 | .031 | ... | 0.88 | 93.73% |

Multi-view reconstruction:

| Method | re-IoU | 3D-prec | 3D-rec |
|---|---|---|---|
| Noncuboid+MASt3R | 74.51 | 37.00 | 43.39 |
| Plane-DUSt3R (aligned) | 76.84 | 52.63 | 48.37 |

Scene synthesis:

| Model | FID | KID×10³ |
|---|---|---|
| Pix2PixHD | 73.33 | 20.56 |
| 360-Aware GAN, layout learned (Shum et al., 2023) | 64.55 | 11.61 |

5. Contributions, Strengths, and Limitations

Contributions and Strengths:

  • Comprehensiveness: Structured3D offers fully aligned, multi-modal, and large-scale annotation of interior rooms, including non-cuboid geometries (up to 70+ corners per room).
  • Photorealism: State-of-the-art rendering ensures minimal sim-to-real domain gap relative to previous synthetic datasets (Zheng et al., 2019).
  • Rich structure: Provides not only images and semantics but also geometric primitives (planes, lines, cuboids, Manhattan structure), adjacency relations, and per-view camera calibration, supporting a range of structured modeling tasks.
  • Paired/unaligned variants: Enables exploration of both conditional and unconditional modeling paradigms, including paired training for removal/insertion of scene elements (Shum et al., 2023).
  • Multi-view layout: Multiple viewpoints per room plus accurate pose allows evaluation of novel multi-view layout estimation without error-prone correspondences (Huang et al., 24 Feb 2025).

Limitations:

  • Static geometry/materials: No dynamic objects or videos, constraining tasks like SLAM or novel view synthesis (though future extensions are posited) (Zheng et al., 2019).
  • Furniture-centric structure: While furniture objects are present, detailed wireframe or smooth-surface structure for furniture is not yet systematically annotated.
  • Domain gap for truly realistic imagery: Despite high rendering realism, a residual gap remains relative to the diversity and complexity of real sensor data, especially in lighting variation (Zheng et al., 2019).

6. Impact on Downstream Research and Extensions

Structured3D has become the primary experimental backing for multiple state-of-the-art contributions in floorplan vectorization, panoramic room layout estimation, generative 360° scene imaging, and multi-view 3D structure recovery from images. It enables:

  • Direct comparisons across methods: Standardized splits and annotation formats support apples-to-apples benchmarks (Phung et al., 9 Feb 2026, Fayyazsanavi et al., 2023, Huang et al., 24 Feb 2025).
  • Robustness and generalization studies: Evaluations on datasets such as WAFFLE and Zillow Indoor have shown that models pre-trained or fine-tuned on Structured3D generalize better to diverse in-the-wild data (Phung et al., 9 Feb 2026, Shum et al., 2023).
  • Ablation analysis: Comprehensive paired data allows controlled removal of signals (e.g., removing structure from input or alignment from geometric labels) to quantify their effects, which is infeasible in less richly annotated corpora (Shum et al., 2023).
  • Pipeline innovation: The explicit structure supports novel supervision signals (e.g., polygonal, plane-based, uncertainty-calibrated, or multi-stage) and flexible architecture design exploiting 3D relationships.

Further extensions under discussion include the addition of richer furniture structure annotations, dynamic interactions, and broader parametric primitive sets (e.g., for curved or non-Manhattan geometry) (Zheng et al., 2019). This suggests Structured3D will remain a central resource for advancing holistic, structure-aware indoor scene understanding across modalities and tasks.
