ATISS: Autoregressive Transformers for Indoor Scene Synthesis (2110.03675v1)

Published 7 Oct 2021 in cs.CV

Abstract: The ability to synthesize realistic and diverse indoor furniture layouts automatically or based on partial input, unlocks many applications, from better interactive 3D tools to data synthesis for training and simulation. In this paper, we present ATISS, a novel autoregressive transformer architecture for creating diverse and plausible synthetic indoor environments, given only the room type and its floor plan. In contrast to prior work, which poses scene synthesis as sequence generation, our model generates rooms as unordered sets of objects. We argue that this formulation is more natural, as it makes ATISS generally useful beyond fully automatic room layout synthesis. For example, the same trained model can be used in interactive applications for general scene completion, partial room re-arrangement with any objects specified by the user, as well as object suggestions for any partial room. To enable this, our model leverages the permutation equivariance of the transformer when conditioning on the partial scene, and is trained to be permutation-invariant across object orderings. Our model is trained end-to-end as an autoregressive generative model using only labeled 3D bounding boxes as supervision. Evaluations on four room types in the 3D-FRONT dataset demonstrate that our model consistently generates plausible room layouts that are more realistic than existing methods. In addition, it has fewer parameters, is simpler to implement and train and runs up to 8 times faster than existing methods.

The paper "ATISS: Autoregressive Transformers for Indoor Scene Synthesis" presents a novel approach for generating realistic and diverse indoor furniture layouts using an autoregressive transformer architecture. Unlike previous methods that pose scene synthesis as sequence generation, ATISS treats room layout generation as unordered set generation. The authors argue that this formulation is more natural and more broadly useful, for example for user-interactive scene completion and object suggestion.

Key Contributions and Methodology

Unordered Set Generation: The authors reformulate scene synthesis as an unordered set generation task. The approach exploits the permutation equivariance of the transformer, so the model can condition on the objects already in a room without imposing an order on them. This enables applications such as partial room re-arrangement and user-defined constraints.
Architecture: ATISS conditions a transformer on the room type and floor plan to predict which objects a room contains and how they are arranged. A layout encoder extracts features from the floor plan, and a structure encoder embeds the attributes of each object already in the scene. The model generates the room autoregressively, one object at a time, predicting each new object's category and placement, and it is trained so that this prediction is invariant to the order of the objects already placed.
Training and Evaluation: The model is trained end-to-end as an autoregressive generative model, using only labeled 3D bounding boxes from the 3D-FRONT dataset as supervision. Evaluations on four room types from 3D-FRONT show that ATISS generates more realistic layouts than existing methods while being more computationally efficient.
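The order-independence described above can be illustrated with a toy, pure-Python sketch. It assumes that a single attention layer without learned projections stands in for ATISS's transformer encoder, and that the floor-plan feature enters as one extra token; `next_object_context`, the query vector, and all dimensions are illustrative assumptions, not the paper's code. Self-attention without positional encodings is permutation equivariant (shuffling the inputs shuffles the outputs the same way), and reading the encoded set out with a single query vector makes the result permutation invariant.

```python
import math
import random


def attend(query, tokens):
    """Scaled dot-product attention of one query over a set of tokens.

    Tokens act as both keys and values (no learned projections) to keep the
    sketch short. No positional encoding is used anywhere.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, tok)) for tok in tokens]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) / total
            for d in range(len(tokens[0]))]


def self_attention(tokens):
    """Every token attends to the whole set; output i belongs to input i."""
    return [attend(tok, tokens) for tok in tokens]


def next_object_context(layout_feature, object_embeddings, query):
    """Hypothetical conditioning step: encode the set (floor plan + objects),
    then summarize it with one query vector for predicting the next object."""
    tokens = [layout_feature] + list(object_embeddings)
    return attend(query, self_attention(tokens))


def close(a, b, tol=1e-9):
    """Compare vectors up to floating-point summation-order noise."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    objects = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
    layout, query = [0.5, -0.2, 0.1, 0.3], [0.3, 0.1, -0.4, 0.2]
    shuffled = objects[:]
    rng.shuffle(shuffled)
    print(close(next_object_context(layout, objects, query),
                next_object_context(layout, shuffled, query)))  # True
```

Results differ only by floating-point rounding, which is why the comparison uses a tolerance rather than exact equality.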
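The training setup above (labeled boxes only, with no fixed object order) can be sketched as a data-sampling step. This is a minimal sketch, assuming each training example is built by shuffling a room's boxes, keeping a random-length prefix as the partial scene, and using the next box, or an end-of-scene marker, as the prediction target; a model would then be trained to maximize the likelihood of that target. The `Box` fields, `END_OF_SCENE`, and `make_training_example` are hypothetical names, not taken from the paper's code.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """A labeled 3D bounding box (hypothetical field layout)."""
    category: str
    position: Tuple[float, float, float]  # box center (x, y, z)
    size: Tuple[float, float, float]      # extents along each axis
    angle: float                          # rotation about the vertical axis


END_OF_SCENE = None  # target meaning "no more objects in this room"


def make_training_example(scene: Sequence[Box], rng: random.Random
                          ) -> Tuple[List[Box], Optional[Box]]:
    """Sample one (context, target) pair from a fully annotated room.

    Shuffling means no canonical object order is ever learned. The first m
    shuffled boxes form the conditioning set; the next box (or the end-of-scene
    marker when m equals the scene size) is the target.
    """
    order = list(scene)
    rng.shuffle(order)
    m = rng.randint(0, len(order))  # randint includes both endpoints
    context = order[:m]
    target = order[m] if m < len(order) else END_OF_SCENE
    return context, target


if __name__ == "__main__":
    room = [
        Box("double_bed", (2.0, 0.5, 2.0), (1.6, 0.5, 2.0), 0.0),
        Box("nightstand", (0.8, 0.3, 2.6), (0.4, 0.6, 0.4), 0.0),
        Box("wardrobe", (3.5, 1.0, 0.3), (1.2, 2.0, 0.6), 3.14159),
    ]
    rng = random.Random(0)
    for _ in range(3):
        context, target = make_training_example(room, rng)
        print([b.category for b in context], "->",
              target.category if target is not None else "<end>")
```

Sampling a fresh permutation for every example exposes the model to many orderings of the same room, which is one way to encourage the order invariance the summary describes.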

Experimental Results

Comparison: ATISS outperforms state-of-the-art models such as FastSynth and SceneFormer, achieving lower FID scores and higher realism in user studies. It also has fewer parameters and runs up to 8 times faster, which makes it more practical for interactive tools and large-scale data synthesis.
Qualitative Assessment: Generated scenes show functional arrangements without the ordering biases that a fixed object ordering can introduce in prior autoregressive models. The model also handles diverse room configurations and interactive scenarios without the scene graphs or procedural rules that some other methods rely on.

Applications and Interactive Scenarios

Scene Completion and Object Suggestion: Given a partially furnished room, ATISS can add objects to complete the scene, re-arrange a user-selected subset of objects, or suggest plausible objects for the partial room, optionally subject to user-defined constraints. The same trained model handles all of these scenarios, which makes it well suited to interactive design applications.
Error Detection and Correction: Because the model assigns a likelihood to each object given the rest of the scene, it can flag objects in unlikely configurations and reposition them to restore a spatially coherent, functional layout.
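The likelihood-based check described above can be sketched generically: score every object conditioned on the rest of the scene, flag the ones below a threshold, and replace each flagged object with its best-scoring alternative placement. This is a minimal sketch under loud assumptions: `toy_log_prob` is a hand-written stand-in for a trained model's conditional log-likelihood, scenes are reduced to one-dimensional positions along a wall, and the threshold and proposal grid are arbitrary. None of these names come from the paper; in ATISS itself, both the likelihoods and the replacement placements would come from the trained network.

```python
from typing import Callable, List, Sequence, Tuple

Obj = Tuple[str, float]  # toy object: (category, position along a wall in m)
LogProb = Callable[[Obj, Sequence[Obj]], float]
Propose = Callable[[Obj], List[Obj]]


def leave_one_out_scores(scene: Sequence[Obj], log_prob: LogProb) -> List[float]:
    """Score each object conditioned on every other object in the scene."""
    return [log_prob(obj, list(scene[:i]) + list(scene[i + 1:]))
            for i, obj in enumerate(scene)]


def flag_unlikely(scene: Sequence[Obj], log_prob: LogProb,
                  threshold: float) -> List[int]:
    """Indices of objects whose conditional log-likelihood is below threshold."""
    return [i for i, s in enumerate(leave_one_out_scores(scene, log_prob))
            if s < threshold]


def repair(scene: Sequence[Obj], log_prob: LogProb, threshold: float,
           propose: Propose) -> List[Obj]:
    """Replace each flagged object with its most likely proposed alternative."""
    fixed = list(scene)
    for i in flag_unlikely(scene, log_prob, threshold):
        rest = fixed[:i] + fixed[i + 1:]
        fixed[i] = max(propose(fixed[i]), key=lambda c: log_prob(c, rest))
    return fixed


ROOM_WIDTH = 5.0  # hypothetical wall length in meters


def toy_log_prob(obj: Obj, others: Sequence[Obj]) -> float:
    """Hypothetical scorer: penalize leaving the room or crowding others."""
    _, x = obj
    score = 0.0
    if not 0.0 <= x <= ROOM_WIDTH:
        score -= 10.0
    score -= 5.0 * sum(1 for _, y in others if abs(x - y) < 1.0)
    return score


def toy_propose(obj: Obj) -> List[Obj]:
    """Keep the category and try positions on a 0.5 m grid along the wall."""
    name, _ = obj
    return [(name, 0.5 * k) for k in range(int(ROOM_WIDTH / 0.5) + 1)]


if __name__ == "__main__":
    scene = [("bed", 1.0), ("nightstand", 2.5), ("wardrobe", 9.0)]
    print(flag_unlikely(scene, toy_log_prob, threshold=-1.0))  # [2]
    print(repair(scene, toy_log_prob, -1.0, toy_propose))
    # [('bed', 1.0), ('nightstand', 2.5), ('wardrobe', 0.0)]
```

Scoring each object against all the others, rather than against a prefix, relies on the set formulation: the model can condition on any subset of the room, so no object has to come "first".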

Limitations

While ATISS makes significant strides in indoor scene synthesis, the paper notes room for improvement in generating detailed style attributes and in extending order invariance to object attributes. Handling cultural and stylistic variations in scene generation also remains an open challenge for future research.

In summary, ATISS reframes indoor scene synthesis as unordered set generation, resulting in a model that is efficient, adaptable, and effective in both automatic and interactive layout generation.

Authors (6)

Despoina Paschalidou (20 papers)
Amlan Kar (19 papers)
Maria Shugrina (8 papers)
Karsten Kreis (50 papers)
Andreas Geiger (136 papers)
Sanja Fidler (184 papers)

Citations (114)
