CubiCasa5K Dataset Overview

  • The CubiCasa5K dataset is a large-scale collection of 5,000 floorplan images with dense polygon-based annotations for detailed semantic and geometric parsing.
  • It offers diverse architectural styles with rigorous train/val/test splits, supporting reliable evaluation of multi-task CNN models.
  • The dataset ships with a multi-task CNN baseline whose results improve on prior floorplan-parsing methods in semantic segmentation and structural element extraction.

The CubiCasa5K dataset is a large-scale resource for automatic floorplan image analysis, comprising 5,000 rasterized floorplan images sourced primarily from Finnish real estate marketing material. Each sample includes dense, polygon-based annotations spanning over 80 object categories, enabling both geometric and semantic parsing. The dataset was constructed to address the scarcity of publicly available, representative, and meticulously annotated floorplan image collections, particularly for machine learning and computer vision research on building interiors. It is accompanied by a comprehensive baseline: an improved multi-task convolutional neural network (CNN) for semantic segmentation and structural element extraction (Kalervo et al., 2019).

1. Dataset Structure and Composition

CubiCasa5K offers a diverse, style-aware corpus. From an initial pool of approximately 15,000 images, 5,000 were selected using explicit criteria on clarity, completeness (full single-floor layouts), and the visibility of all key architectural elements (walls, rooms, doors, windows, furniture).

  • Subsets by style:
    • High-quality architectural: 3,732 images
    • High-quality black-and-white CAD-style: 992 images
    • Colorful, hand-drawn/marketing style: 276 images
  • Train/val/test partitioning:
    • Training: 4,200 images
    • Validation: 400 images
    • Test: 400 images

Each split was sampled to preserve style and size variability. Only legible, fully scanned, single-floor images with all necessary architectural elements were retained.
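As a quick orientation, the sketch below shows one way the splits might be consumed in code. The train.txt/val.txt/test.txt file names and the one-sample-path-per-line layout are assumptions made for illustration, not confirmed details of the release.

```python
from pathlib import Path

def load_split(root: str, split: str) -> list[Path]:
    """Read one split file, assumed to list one sample path per line.

    The train.txt / val.txt / test.txt naming is a guess at the
    release layout, used here only for illustration.
    """
    split_file = Path(root) / f"{split}.txt"
    with split_file.open() as f:
        return [Path(root) / line.strip() for line in f if line.strip()]

# Expected sizes per the paper: 4,200 / 400 / 400 samples.
# train_samples = load_split("cubicasa5k", "train")
```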

2. Annotation Protocol and Object Taxonomy

Every semantically meaningful object is annotated as a polygon in SVG format, under a two-stage quality-assurance protocol: initial self-review by the annotator, followed by independent verification by a QA engineer.

  • Annotation workflow:
    1. Draw wall polygons (distinguishing outer and inner boundaries)
    2. Draw room polygons (covering all enclosed cells)
    3. Place icons and mark opening polygons
  • Category distribution (major classes):
    • Rooms (12 baseline classes): Background, Outdoor, Wall, Kitchen, Living Room, Bedroom, Bath, Hallway, Railing, Storage, Garage, Other Rooms
    • Icon/opening (11 baseline classes): Window, Door, Closet, Electrical Appliance, Toilet, Sink, Sauna Bench, Fire Place, Bathtub, Chimney, Empty
  • Objects per image (averages across 5,000 samples):
    • Walls: ~29.4
    • Rooms: ~13.8
    • Icons: ~27.3

The overall taxonomy comprises 83 fine-grained classes; the full list is available in the project's repository.
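Because each annotation is an SVG polygon, per-object geometry can be recovered with a standard XML parser. A minimal sketch follows; the assumption that each object is a <polygon> element whose class attribute names the category is illustrative, not the dataset's verified schema.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def extract_polygons(svg_path: str):
    """Yield (class_label, [(x, y), ...]) pairs from an annotation SVG.

    Assumes each object is a <polygon> element whose 'class' attribute
    names the category -- an illustrative schema, not the verified one.
    """
    root = ET.parse(svg_path).getroot()
    for poly in root.iter(f"{SVG_NS}polygon"):
        label = poly.get("class", "unknown")
        points = [tuple(map(float, pair.split(",")))
                  for pair in poly.get("points", "").split()]
        yield label, points
```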

3. Dataset Statistics and Comparative Overview

CubiCasa5K demonstrates substantial diversity and scale relative to prior datasets. Image widths range from 50 to ~8,000 pixels (median: 1,500 px; modes: ~600 px and 2,000 px). The number of rooms per floorplan peaks at 8–12; walls at 20–30 segments; icons typically 15–30 per image.

Comparative metrics with existing datasets:

| Dataset | #Images | Resolution range (px) | #Object classes | #Rooms |
|---|---|---|---|---|
| R-FP-500 (Dodge et al. ’17) | 500 | 56–1,427 | N/A | N/A |
| CVC-FP (de las Heras et al. ’15) | 122 | 905–7,383 | 50 | 1,320 |
| Liu et al. ’17 | 815 | 96–1,920 | 27 | 7,466 |
| CubiCasa5K | 5,000 | 50–8,000 | 83 | 68,877 |
  • Bedrooms are the most frequent room type (~16% of all room polygons), followed by Kitchens, Living Rooms, and Bathrooms.
  • Doors (~20%), Windows (~18%), Sinks, and Toilets are the most common icons.

4. Mathematical Formalization

Let $I \in \mathbb{R}^{H \times W \times 3}$ denote the input rasterized floorplan image. Ground-truth annotations are given by $G = \{(P_i, c_i)\}_{i=1}^{N}$, where $P_i = (x_{i,1}, \ldots, x_{i,k_i})$ is a $k_i$-vertex polygon and $c_i \in \{1, \ldots, C\}$ is the object label.

The baseline network $f_\theta(I)$ produces:

  • $S_{\text{rooms}} \in \mathbb{R}^{H \times W \times R}$: per-pixel scores for the $R$ room classes
  • $S_{\text{icons}} \in \mathbb{R}^{H \times W \times K}$: per-pixel scores for the $K$ icon classes
  • $\{H_j\}_{j=1}^{M}$: $M$ heatmaps for wall junctions, icon corners, and opening endpoints
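To make the output shapes concrete, here is a toy PyTorch head layout matching the formalization ($R = 12$ room classes, $K = 11$ icon classes, $M = 21$ heatmaps per the baseline). The backbone is stubbed out and the 1×1 convolutions are placeholders, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class FloorplanHeads(nn.Module):
    """Toy output heads matching the formalization: R room-class channels,
    K icon-class channels, and M junction/corner/endpoint heatmaps."""

    def __init__(self, feat_ch: int = 256, R: int = 12, K: int = 11, M: int = 21):
        super().__init__()
        self.rooms = nn.Conv2d(feat_ch, R, kernel_size=1)     # S_rooms
        self.icons = nn.Conv2d(feat_ch, K, kernel_size=1)     # S_icons
        self.heatmaps = nn.Conv2d(feat_ch, M, kernel_size=1)  # {H_j}

    def forward(self, feats: torch.Tensor):
        # feats: (B, feat_ch, H, W) features from some backbone/decoder
        return self.rooms(feats), self.icons(feats), self.heatmaps(feats)

# s_rooms, s_icons, heatmaps = FloorplanHeads()(torch.randn(1, 256, 64, 64))
```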

5. Baseline Multi-Task Convolutional Neural Network

The provided baseline uses a ResNet-152 backbone (ImageNet pretraining followed by MPII pose transfer) and an hourglass-style decoder (10 blocks with skip connections). The architecture emits two semantic-segmentation heads (rooms, icons) and 21 heatmap-regression heads.

  • Training:

    • Loss function follows the multi-task uncertainty weighting of Kendall et al. (2018); a code sketch appears after this list:

      $\mathcal{L}_{\text{tot}} = \mathcal{L}_H + \mathcal{L}_S$

      Heatmap-regression loss with learned uncertainty $\sigma_i$:

      $\mathcal{L}_H = \sum_{i=1}^{M} \left[ \frac{1}{2\sigma_i^2} \left\| H_i^{gt} - H_i^{\theta} \right\|_2^2 + \log(1 + \sigma_i) \right]$

      Cross-entropy segmentation loss per task with uncertainty $\sigma_k$:

      $\mathcal{L}_S = \sum_{k \in \{\text{rooms},\,\text{icons}\}} \left[ \frac{1}{\sigma_k} \Bigl( -\sum_p y_{k,p} \log S_{k,p} \Bigr) + \log \sigma_k \right]$

  • Optimization: Adam (learning rate $10^{-3}$; $\beta_1 = 0.9$, $\beta_2 = 0.999$), batch size 20, up to 400 epochs. Training uses data augmentation (random 90° rotations, color jitter, random crop/scale to 256×256 with zero-padding).
  • Hardware: Single NVIDIA Titan X; full training completes in approximately 3 hours.
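The following is a minimal PyTorch sketch of the uncertainty-weighted total loss exactly as written above. Holding the $\sigma$ values as raw nn.Parameter tensors (with no positivity constraint) is a simplification for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskUncertaintyLoss(nn.Module):
    """Sketch of L_tot = L_H + L_S with learned per-task uncertainties,
    mirroring the formulas above (illustrative, not the authors' code)."""

    def __init__(self, num_heatmaps: int = 21):
        super().__init__()
        # One sigma per heatmap head, one per segmentation task.
        # NOTE: no positivity constraint is enforced here; a real
        # implementation would typically learn log(sigma) instead.
        self.sigma_h = nn.Parameter(torch.ones(num_heatmaps))
        self.sigma_rooms = nn.Parameter(torch.ones(()))
        self.sigma_icons = nn.Parameter(torch.ones(()))

    def forward(self, pred_hm, gt_hm, pred_rooms, gt_rooms, pred_icons, gt_icons):
        # L_H: per-heatmap squared error / (2 sigma_i^2) + log(1 + sigma_i)
        sq_err = ((gt_hm - pred_hm) ** 2).flatten(2).sum(-1).mean(0)  # shape (M,)
        l_h = (sq_err / (2 * self.sigma_h ** 2) + torch.log1p(self.sigma_h)).sum()

        # L_S: per-task cross-entropy / sigma_k + log(sigma_k)
        l_s = (F.cross_entropy(pred_rooms, gt_rooms) / self.sigma_rooms
               + torch.log(self.sigma_rooms)
               + F.cross_entropy(pred_icons, gt_icons) / self.sigma_icons
               + torch.log(self.sigma_icons))

        return l_h + l_s

# Example shapes: pred_hm (B, 21, H, W), pred_rooms (B, 12, H, W),
# gt_rooms (B, H, W) with integer class indices, and likewise for icons.
```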

6. Evaluation Outcomes

6.1. Benchmarking on Liu et al. ’17

The network is benchmarked against Liu et al. ’17, with and without integer-programming (IP) post-processing and test-time augmentation (TTA):

| Method | Junction acc/rec | Opening acc/rec | Icon acc/rec | Room acc/rec |
|---|---|---|---|---|
| Liu et al. ’17 | 70.7 / 95.1 | 67.9 / 91.4 | 22.3 / 77.4 | 80.9 / 78.5 |
| Liu et al. + IP | 94.7 / 91.7 | 91.9 / 90.2 | 84.0 / 74.6 | 84.5 / 88.4 |
| Ours | 82.4 / 92.0 | 82.3 / 93.3 | 34.6 / 88.3 | 90.0 / 87.6 |
| Ours + IP | 94.1 / 89.6 | 93.2 / 92.6 | 92.9 / 87.7 | 91.7 / 90.8 |
| Ours (TTA) + IP | 95.0 / 89.7 | 94.5 / 92.9 | 93.6 / 87.3 | 92.2 / 90.2 |
  • Without post-processing, the baseline outperforms Liu et al. ’17 in accuracy across all four categories (and in recall for all but junctions). Adding IP and TTA pushes accuracy above 92% in every category.

6.2. Semantic Segmentation Results on CubiCasa5K

The primary evaluation treats parsing as a pixel-wise semantic-segmentation task, reporting overall accuracy, mean class accuracy, and mean Intersection-over-Union (IoU):

| Task | Overall acc | Mean acc | Mean IoU |
|---|---|---|---|
| Rooms, val | 84.5% | 72.3% | 61.0% |
| Rooms, test | 82.7% | 69.8% | 57.5% |
| Rooms (Poly), test | 77.3% | 61.6% | 49.3% |
| Icons, val | 97.8% | 62.8% | 56.5% |
| Icons, test | 97.6% | 61.5% | 55.7% |
| Icons (Poly), test | 96.7% | 45.3% | 41.6% |

Converting the raw segmentations into vector polygons ("Poly") lowers every metric, primarily due to errors in junction detection. On the test set, mean room IoU is ~57.5% and mean icon IoU is ~55.7% before polygonization.
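The three reported metrics follow standard segmentation definitions. As a reference point, here is a small sketch of how they can be computed from predicted and ground-truth label maps (a generic formulation, not the authors' evaluation script):

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """Overall accuracy, mean class accuracy, and mean IoU from
    integer label maps of identical shape."""
    # Confusion matrix: rows = ground truth, columns = prediction.
    cm = np.bincount(gt.ravel() * num_classes + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    overall_acc = tp.sum() / cm.sum()
    class_acc = tp / np.maximum(cm.sum(axis=1), 1)  # per-class recall
    iou = tp / np.maximum(cm.sum(axis=1) + cm.sum(axis=0) - tp, 1)
    return overall_acc, class_acc.mean(), iou.mean()
```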

7. Limitations and Prospects

Despite its scale, very rare architectural symbols (frequency < 1%) remain under-represented. Current post-processing relies on heuristics; errors, especially in junction localization, can propagate and undermine polygon extraction. Future work suggested by the authors includes integration of an explicit object-detection head for icons (as in Dodge et al. ’17), exploration of direct polygon regression approaches (cf. Acuna et al. ’18), and extension of both dataset and methodology to multi-floor or 3D structures such as stairs and elevators.

The data, SVG annotations, and baseline implementations are publicly available at https://github.com/CubiCasa/CubiCasa5k, providing a standardized foundation for further work in automatic floorplan parsing, structural scene understanding, and downstream AR/VR applications (Kalervo et al., 2019).

References

Kalervo, A., Ylioinas, J., Häikiö, M., Karhu, A., & Kannala, J. (2019). CubiCasa5K: A Dataset and an Improved Multi-Task Model for Floorplan Image Analysis. In Proceedings of the Scandinavian Conference on Image Analysis (SCIA).
