- The paper introduces RoomFormer, a Transformer-based model that uses two-level queries to jointly predict room polygons and vertex coordinates in one stage.
- It integrates a CNN backbone with deformable attention to efficiently process spatial features and enable parallel prediction of complex room configurations.
- Experiments on Structured3D and SceneCAD show that RoomFormer outperforms prior methods, reaching a room F1 score of 97.3 with significantly faster inference.
Overview of "Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries"
The paper "Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries" presents RoomFormer, a Transformer-based model for reconstructing 2D floorplans from 3D scans. The method addresses the inherent complexity of floorplan reconstruction by jointly identifying room polygons and their associated vertex coordinates. The significance of this work lies in performing both tasks in a single stage, without the heuristics or intermediate steps common in prior approaches.
Methodology and Innovations
RoomFormer is built around a Transformer architecture with a two-level query mechanism, which enables it to predict multiple rooms simultaneously and holistically. Each room is represented as a variable-length sequence of vertices, i.e., a polygon defined by its corner coordinates. The two query levels correspond to room polygons and their corners, letting the model reason at both granularities and process complex room configurations in parallel.
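Concretely, the room/corner query structure can be pictured as one learnable embedding per (room, vertex) slot. The sketch below is illustrative only (shapes, names, and the additive composition are assumptions, not the paper's exact implementation); it shows how a room-level component shared across a polygon's slots can be combined with per-vertex components so that all rooms are decoded in parallel.

```python
import numpy as np

def make_two_level_queries(num_rooms: int, num_verts: int, dim: int, seed: int = 0):
    """Illustrative two-level query layout: one embedding per (room, vertex) slot.

    The room-level component is shared by every vertex slot of that room,
    while the vertex-level component distinguishes positions within the
    polygon. In the actual model these would be learnable parameters; here
    they are random for demonstration.
    """
    rng = np.random.default_rng(seed)
    room_embed = rng.normal(size=(num_rooms, 1, dim))   # per-room component
    vert_embed = rng.normal(size=(1, num_verts, dim))   # per-vertex component
    # Broadcasting yields a distinct query for every (room, vertex) slot.
    return room_embed + vert_embed  # shape: (num_rooms, num_verts, dim)

queries = make_two_level_queries(num_rooms=8, num_verts=32, dim=256)
print(queries.shape)  # (8, 32, 256)
```

Because each vertex slot also carries a validity score in the model, polygons can have variable length even though the query tensor has a fixed number of slots: invalid slots are simply dropped at decoding time.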
The model pairs a convolutional neural network (CNN) backbone for feature extraction with a Transformer encoder-decoder. Its deformable attention mechanism keeps computation tractable by restricting attention to a small set of relevant spatial locations. A polygon matching strategy aligns predictions with ground truth, making the model end-to-end trainable.
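Set-prediction models of this kind typically match predicted polygons to ground-truth polygons with the Hungarian algorithm before computing the loss. The sketch below is a generic bipartite-matching example using a mean L1 vertex cost; the cost terms and polygon handling are assumptions for illustration, not the paper's exact matching formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_polygons(pred: np.ndarray, gt: np.ndarray):
    """Bipartite matching between predicted and ground-truth polygons.

    pred: (P, V, 2) predicted vertex coordinates (fixed V slots per polygon)
    gt:   (G, V, 2) ground-truth polygons, padded to the same V for simplicity
    Returns (pred_idx, gt_idx) index arrays minimizing total vertex L1 cost.
    Illustrative sketch, not the paper's exact cost function.
    """
    P, G = len(pred), len(gt)
    cost = np.zeros((P, G))
    for i in range(P):
        for j in range(G):
            # Mean absolute distance between corresponding vertices.
            cost[i, j] = np.abs(pred[i] - gt[j]).mean()
    return linear_sum_assignment(cost)

# Usage: two unit squares, with ground truth listed in the opposite order.
sq = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
pred = np.stack([sq, sq + 10])
gt = np.stack([sq + 10, sq])
rows, cols = match_polygons(pred, gt)
print(rows, cols)  # [0 1] [1 0]
```

The matched pairs then receive the regression and classification losses, which is what allows supervision without imposing any fixed ordering on rooms.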
Strong Numerical Results and Claims
The model's efficacy is substantiated through experiments on the Structured3D and SceneCAD datasets. On Structured3D, RoomFormer outperforms existing methods such as HEAT and MonteFloor, yielding F1 scores of 97.3 for rooms, 87.2 for corners, and 81.2 for angles, alongside faster inference times. It likewise performs strongly on SceneCAD, indicating that the approach generalizes across datasets. These results highlight not only the precision of RoomFormer but also its computational efficiency and adaptability to varying data characteristics.
Implications for Practice and Theory
Practically, RoomFormer sets a new benchmark for applications in robotics, interior design, and augmented/virtual reality, due to its rapid and accurate floorplan reconstruction from point clouds. Theoretically, the paper opens discussions on leveraging Transformer models for structured prediction tasks beyond conventional sequence-to-sequence applications. The two-level query approach may inspire future research in related domains involving hierarchical data structuring.
Future Directions
Looking ahead, the research suggests several pathways for advancement. One direction is richer semantic output, extending the model to additional architectural elements and room types. Another is exploring how well the Transformer design adapts to other structured prediction tasks within computer vision and beyond.
In summary, "Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries" presents a novel Transformer-based method that advances the state of the art in floorplan reconstruction. By combining two-level queries within a single Transformer architecture, this research both improves on current methodologies and lays the groundwork for future exploration in related fields.