FurniScene Dataset Overview
- FurniScene is a large-scale 3D room dataset comprising 100K+ artist-designed rooms and nearly 40K high-fidelity CAD models, annotated with 89 semantic object categories.
- It uses a rigorous multi-stage construction pipeline featuring manual segmentation, axis alignment, and detailed labeling with tools like 3DMax and Unreal Engine.
- The Two-Stage Diffusion Scene Model (TSDSM) advances realistic scene synthesis by first generating a furniture list and then an optimized layout, achieving superior FID, KID, and category-KL scores.
FurniScene is a large-scale 3D room dataset constructed to address limitations of prior indoor scene generation resources by providing high-density, professional-grade furnishing scenes, including both large-scale furniture and intricate decorative details. Designed specifically for advancing research in 3D scene synthesis, semantic segmentation, and virtual environment generation, it combines over 100,000 artist-designed rooms and nearly 40,000 unique high-fidelity CAD meshes, annotated at fine semantic granularity across a wide diversity of real-world layouts.
1. Construction and Annotation Pipeline
FurniScene was assembled through a multi-stage process leveraging professional interior design assets. Fully furnished SketchUp models were licensed from expert artists and subsequently hand-cleaned in 3DMax. Semantic labeling was performed at the per-object level in Unreal Engine. To enhance diversity, each template was augmented systematically.
The full annotation workflow involved the following steps:
- Manual segmentation: Each room's objects were segmented individually (≈1 hour per room).
- Axis alignment and labeling: Rigorous spatial normalization and assignment of class labels (1–2 hours per room).
- Point-cloud sampling: 30,000 spatial points were sampled per mesh, with per-point semantic assignments.
Annotation involved twenty trained students operating within coordinated 3DMax and Unreal Engine pipelines. This resulted in detailed, clean, and semantically rich 3D environments.
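The per-mesh point sampling in the pipeline above can be sketched as area-weighted surface sampling. This is a minimal illustration, not FurniScene's released tooling; the function name and mesh-array layout are assumptions:

```python
import numpy as np

def sample_points(vertices: np.ndarray, faces: np.ndarray,
                  n_points: int = 30_000, seed: int = 0) -> np.ndarray:
    """Uniformly sample points on a triangle mesh surface.

    vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices.
    """
    rng = np.random.default_rng(seed)
    tris = vertices[faces]                              # (F, 3, 3)
    # Triangle areas via the cross product, used as sampling weights.
    cross = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Uniform barycentric coordinates inside each chosen triangle.
    u, v = rng.random(n_points), rng.random(n_points)
    flip = u + v > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    t = tris[idx]
    return t[:, 0] + u[:, None] * (t[:, 1] - t[:, 0]) \
                   + v[:, None] * (t[:, 2] - t[:, 0])
```

Per-point semantic labels then simply inherit the label of the object whose mesh the point was drawn from.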
2. Dataset Scale and Granularity
FurniScene encompasses:
- 111,698 fully furnished rooms, spanning 15 distinct room types (bedrooms, living rooms, kitchens, etc.).
- 39,691 unique CAD models, each with high-resolution textures and precise geometry.
- 89 semantic object categories, including primary furniture types (sofas, beds, wardrobes), mid-sized objects (lamps, laptops), and decor-level detail (vases, books, cups, photo frames).
The dataset introduces a density and compositional diversity not present in preceding resources. For example, living rooms in FurniScene can reach up to 119 objects per instance, compared to a maximum of 25 in 3D-FRONT. The long-tailed frequency distribution of small decor items marks a critical advance over existing 3D scene datasets.
A room in FurniScene is formally specified as a set of objects

$$X = \{o_1, \dots, o_N\}, \qquad o_i = (s_i, c_i, l_i, \theta_i),$$

where:
- $s_i \in \mathbb{R}^3$ denotes normalized object size,
- $c_i \in \{0,1\}^{89}$ is a one-hot class vector (89 categories),
- $l_i \in \mathbb{R}^3$ is the object center location (normalized),
- $\theta_i$ encodes heading.
3. Quantitative Analysis and Distributional Properties
FurniScene's design far surpasses prior datasets in both object count and type diversity. Empirical comparisons to 3D-FRONT demonstrate:
- Object types per room: Kitchens contain ≈20, while living rooms approach 60.
- Objects per room: 14.4 average, max 119 (living rooms), in contrast to 6.9 average, max 25 (3D-FRONT).
- Category distribution: The frequency histogram of the top 50 object categories reflects a substantial representation of small, previously underrepresented decor elements.
Object placement diversity is tracked via per-category room frequencies and average/max object counts per room (“#NOPM” in publication tables). Object collisions are penalized during layout generation according to the loss

$$\mathcal{L}_{\text{layout}} = \mathcal{L}_{\text{noise}} + \mathcal{L}_{\text{col}},$$

where $\mathcal{L}_{\text{col}}$ penalizes bounding box overlaps at each denoising timestep.
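A bounding-box overlap penalty of this kind might be sketched as below. This is an assumption-laden illustration (axis-aligned boxes, total pairwise overlap volume); the paper's exact collision term is not reproduced here:

```python
import numpy as np

def collision_penalty(centers: np.ndarray, sizes: np.ndarray) -> float:
    """Sum of pairwise axis-aligned bounding-box overlap volumes.

    centers: (N, 3) box centers; sizes: (N, 3) full extents.
    """
    lo = centers - sizes / 2.0
    hi = centers + sizes / 2.0
    total = 0.0
    n = len(centers)
    for i in range(n):
        for j in range(i + 1, n):
            # Per-axis overlap length, clamped at zero for disjoint boxes.
            overlap = np.minimum(hi[i], hi[j]) - np.maximum(lo[i], lo[j])
            overlap = np.clip(overlap, 0.0, None)
            total += float(overlap.prod())
    return total
```

A differentiable variant of such a penalty can be added to the denoising loss at each timestep to push predicted layouts apart.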
4. Dataset Partitioning and Evaluation Benchmarks
Rooms are stratified by type and divided into train/validation/test splits in an 80/10/10 ratio. This stratification ensures consistent distributions of both large and small object counts across splits.
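A per-type stratified 80/10/10 split can be sketched as follows (illustrative only; the dataset presumably ships its own split files):

```python
import random
from collections import defaultdict

def stratified_split(room_ids, room_types, seed: int = 0):
    """Split rooms 80/10/10 within each room type, so every split
    mirrors the overall type distribution."""
    by_type = defaultdict(list)
    for rid, rtype in zip(room_ids, room_types):
        by_type[rtype].append(rid)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for ids in by_type.values():
        rng.shuffle(ids)
        n_train, n_val = int(0.8 * len(ids)), int(0.1 * len(ids))
        train += ids[:n_train]
        val += ids[n_train:n_train + n_val]
        test += ids[n_train + n_val:]
    return train, val, test
```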
The dataset defines a suite of metrics for evaluating scene generation:
| Metric | Description |
|---|---|
| FID (Fréchet Inception Distance) | Realism of rendered top-down layouts |
| KID (Kernel Inception Distance) | Complementary measure of layout realism |
| SCA (Scene Classification Accuracy) | Indistinguishability (ideally 50% for randomness) |
| CKL (Category KL divergence) | Histographic fidelity to ground-truth categories |
Lower FID, KID, and CKL indicate greater realism and fidelity to true category distributions; SCA approaching 50% denotes indistinguishability from real data.
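The category-KL term, for instance, compares normalized category histograms of generated versus ground-truth scenes. A minimal sketch (the smoothing constant `eps` and the divergence direction are implementation choices assumed here, not taken from the paper):

```python
import numpy as np

def category_kl(gen_counts: np.ndarray, real_counts: np.ndarray,
                eps: float = 1e-8) -> float:
    """KL(real || generated) between per-category frequency distributions.

    Counts are per-category object totals over a scene set; eps avoids
    log(0) for categories absent from one side.
    """
    p = (real_counts + eps) / (real_counts + eps).sum()
    q = (gen_counts + eps) / (gen_counts + eps).sum()
    return float(np.sum(p * np.log(p / q)))
```

A value near zero indicates the generator reproduces the ground-truth category mix, including the long tail of small decor items.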
5. Two-Stage Diffusion Scene Model (TSDSM)
To address the combinatorial complexity of populating richly decorated rooms, FurniScene introduces the Two-Stage Diffusion Scene Model (TSDSM):
- Stage I: Furniture List Generation Model (FLGM)
  - A denoising diffusion process on object size $s$ and class $c$.
  - At timestep $t$, the model $\epsilon_\theta(x_t, t)$ predicts the noise added during the forward process, with loss:
    $$\mathcal{L}_{\text{FLGM}} = \mathbb{E}_{x_0,\, \epsilon,\, t}\left[\,\|\epsilon - \epsilon_\theta(x_t, t)\|^2\,\right].$$
  - Architecture: Transformer encoder with cross-attention to text conditioning.
- Stage II: Layout Generation Model (LGM)
  - Starting from the fixed list $(s, c)$, retrieves CAD models and denoises location $l$ and rotation $\theta$, with an L2 noise loss plus the collision term $\mathcal{L}_{\text{col}}$:
    $$\mathcal{L}_{\text{LGM}} = \mathbb{E}_{x_0,\, \epsilon,\, t}\left[\,\|\epsilon - \epsilon_\theta(x_t, t)\|^2\,\right] + \mathcal{L}_{\text{col}}.$$
  - Architecture: 1D-UNet over object tokens.
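The Stage I denoising objective can be illustrated with a standard DDPM-style training step. This is a generic sketch, not TSDSM's released code; the transformer denoiser is replaced by an arbitrary callable, and the beta schedule is the common linear one:

```python
import numpy as np

def ddpm_loss_step(x0: np.ndarray, t: int, alphas_cumprod: np.ndarray,
                   denoiser, rng) -> float:
    """One DDPM training-loss evaluation: noise x0 to timestep t and
    measure the L2 error of the predicted noise, ||eps - eps_theta||^2.

    x0: (N, D) clean object attributes (e.g. concatenated size + class).
    denoiser: callable (x_t, t) -> predicted noise, same shape as x0.
    """
    eps = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps  # forward noising
    eps_pred = denoiser(x_t, t)
    return float(np.mean((eps - eps_pred) ** 2))

# Linear beta schedule over 1000 timesteps, as in standard DDPM.
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
```

Stage II follows the same recipe over location and rotation, with the collision penalty added to the per-step loss.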
Empirically, TSDSM achieves superior realism (lowest FID/KID) and category fidelity compared to baseline models, including SceneFormer, ATISS, and DiffuScene.
6. Potential Applications and Distribution
FurniScene is positioned as a comprehensive resource for:
- Game and virtual reality environment generation: Supports synthesis of settings with detailed object placement and decor.
- 3D semantic segmentation and point-cloud learning: Each object provides a pre-sampled set of 30,000 points with semantic labels.
- Interior design automation: Enables generation of plausible furniture layouts directly from user-specified text prompts.
The dataset and code, including pretrained TSDSM checkpoints per room type, are scheduled for public release with distribution channels and usage documentation specified in the project's online repository. This resource is intended to serve as a standardized, richly detailed foundation for research and application in realistic 3D scene understanding and synthesis.