Unified Vector Floorplan Generation via Markup Representation

Published 6 Apr 2026 in cs.CV | (2604.04859v1)

Abstract: Automatic residential floorplan generation has long been a central challenge bridging architecture and computer graphics, aiming to make spatial design more efficient and accessible. While early methods based on constraint satisfaction or combinatorial optimization ensure feasibility, they lack diversity and flexibility. Recent generative models achieve promising results but struggle to generalize across heterogeneous conditional tasks, such as generation from site boundaries, room adjacency graphs, or partial layouts, due to their suboptimal representations. To address this gap, we introduce Floorplan Markup Language (FML), a general representation that encodes floorplan information within a single structured grammar, which casts the entire floorplan generation problem into a next token prediction task. Leveraging FML, we develop a transformer-based generative model, FMLM, capable of producing high-fidelity and functional floorplans under diverse conditions. Comprehensive experiments on the RPLAN dataset demonstrate that FMLM, despite being a single model, surpasses the previous task-specific state-of-the-art methods.

Authors (2)

Summary

The paper's main contribution is Floorplan Markup Language (FML), a structured grammar that encodes architectural elements so that vector floorplan generation tasks share a single representation.
It employs an autoregressive transformer with constrained decoding to ensure high-fidelity synthesis, and it achieves lower FID scores than prior task-specific methods.
The study demonstrates interactive editing and completion capabilities, and it outlines future directions for multi-story and language-driven floorplan generation.

Introduction and Motivation

"Unified Vector Floorplan Generation via Markup Representation" (2604.04859) proposes a novel paradigm for automated residential floorplan synthesis, addressing the lack of generalizable, high-fidelity generation from heterogeneous conditional inputs such as site boundaries, room adjacency graphs, and partial layouts. Previous methods, either optimization-based or generative, suffered from limited flexibility, fidelity-diversity tradeoffs, and redundancy due to non-optimal representations (e.g., raster-to-vector conversion, multi-stage pipelines). The core innovation is Floorplan Markup Language (FML), a structured grammar akin to HTML, unifying all vector floorplan generation tasks as a sequence prediction problem.

Figure 1: Floorplan Markup Language (FML) encodes floorplans, boundaries, and graphs as markup sequences, unifying multiple generation paradigms.

Floorplan Markup Language (FML)

FML formalizes floorplan information—rooms, doors, boundaries, adjacency graphs—into a sequence of tokens adhering to strict syntactic rules. Each element (room polygon, interior/front door, boundary polygon, adjacency matrix) is represented by tags, indices, types, and coordinates. The tag-based structure enforces architectural validity and compositionality, guiding generative models towards coherent layouts.

Grammar and Sequence Encoding

Tags for sequence boundaries, floorplan, room, door, front door, boundary, and graph
Ordered encoding for rooms, doors, and conditions (room indices, vertices, types)
Conditions represented as prepended markup (boundary vertices, adjacency graph)

FML enables seamless formulation of unconditional, boundary-conditioned, graph-conditioned, number-conditioned, and completion tasks as next-token prediction, eliminating task-specific conversion and preconditioning requirements.
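
As an illustration of the encoding described above, the following minimal Python sketch serializes a toy floorplan into a flat token list, with a boundary condition prepended. The tag names (`<sos>`, `<boundary>`, `<room>`, and so on), the room-type strings, the coordinate token format, and the helper `to_fml` are all hypothetical and chosen for readability; the paper's actual vocabulary and token order may differ.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[int, int]


def polygon_tokens(vertices: Sequence[Point]) -> List[str]:
    """Flatten polygon vertices into alternating x/y coordinate tokens."""
    tokens: List[str] = []
    for x, y in vertices:
        tokens += [f"x{x}", f"y{y}"]
    return tokens


def to_fml(rooms: List[Dict], doors: List[Dict],
           boundary: Optional[Sequence[Point]] = None) -> List[str]:
    """Serialize a floorplan into a markup-like token sequence (assumed layout)."""
    seq = ["<sos>"]
    if boundary is not None:
        # Conditions are prepended, so the floorplan is generated after them.
        seq += ["<boundary>", *polygon_tokens(boundary), "</boundary>"]
    seq.append("<floorplan>")
    for idx, room in enumerate(rooms):
        seq += ["<room>", f"idx{idx}", f"type:{room['type']}",
                *polygon_tokens(room["vertices"]), "</room>"]
    for door in doors:
        # A door is a segment, so it carries exactly two vertices.
        seq += ["<door>", *polygon_tokens(door["vertices"]), "</door>"]
    seq += ["</floorplan>", "<eos>"]
    return seq


rooms = [
    {"type": "living", "vertices": [(0, 0), (8, 0), (8, 6), (0, 6)]},
    {"type": "bedroom", "vertices": [(8, 0), (12, 0), (12, 6), (8, 6)]},
]
doors = [{"vertices": [(8, 2), (8, 3)]}]
boundary = [(0, 0), (12, 0), (12, 6), (0, 6)]

tokens = to_fml(rooms, doors, boundary)
print(" ".join(tokens))
```

Under this layout, an unconditional task would simply omit the boundary block, and a completion task would start from a truncated version of the same sequence.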

Autoregressive Transformer Architecture

The generative model employs an autoregressive transformer, inspired by LLMs (e.g., LLaMA-3), trained to generate FML sequences via next-token prediction. Tags, indices, types, and coordinates are embedded and projected to the output space for decoding. The architecture supports variable-length generation, permutation-equivariant room encoding, and constrained decoding to enforce grammar consistency.

Figure 2: The model architecture mirrors LLMs, employing transformers for sequential next-token prediction and autoregressive inference.

Constrained Decoding

At inference time, grammar constraints are enforced by setting the probabilities of invalid tokens to zero before sampling. For example, a door must have exactly two vertices, and the vertices of a new room must lie outside existing rooms. This supports high-validity synthesis.
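
The masking step can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the token names, the toy rule in `allowed_after_door`, and the dictionary-based logits are assumptions made for the example, and only the syntactic door rule is covered, not the geometric check on room vertices.

```python
import math
from typing import Dict, Set


def constrained_probs(logits: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Softmax over the allowed tokens only; disallowed tokens get probability 0."""
    valid = {t: v for t, v in logits.items() if t in allowed}
    if not valid:
        raise ValueError("grammar allows no token in the vocabulary")
    peak = max(valid.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - peak) for t, v in valid.items()}
    total = sum(exps.values())
    return {t: exps.get(t, 0.0) / total for t in logits}


def allowed_after_door(num_vertices_so_far: int) -> Set[str]:
    """Toy grammar rule (assumed): a door must contain exactly two vertices."""
    if num_vertices_so_far < 2:
        return {"<vertex>"}   # the door is incomplete, so a vertex must follow
    return {"</door>"}        # two vertices emitted, so the door must close


# The unconstrained model would prefer "</door>", but only one vertex exists so far.
logits = {"<vertex>": 1.0, "</door>": 3.0, "<room>": 2.0}
probs = constrained_probs(logits, allowed_after_door(num_vertices_so_far=1))
print(probs)
```

Renormalizing over the allowed set keeps the result a proper distribution, so sampling or greedy decoding can proceed unchanged after the mask is applied.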

Completion and Editing

The autoregressive approach enables completion of partial sequences and interactive editing: arbitrary fragments can be completed or modified within the markup structure.
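
Completion from a partial sequence reduces to an ordinary autoregressive loop over a prefix. The sketch below assumes hypothetical tag names and uses `toy_model`, a stand-in that only closes open tags, in place of a trained network; it shows the control flow, not the paper's model.

```python
from typing import Callable, List

NextToken = Callable[[List[str]], str]


def complete(prefix: List[str], next_token: NextToken,
             eos: str = "<eos>", max_new_tokens: int = 256) -> List[str]:
    """Extend a partial sequence token by token until EOS or the budget is hit."""
    seq = list(prefix)  # copy so the caller's prefix is left untouched
    for _ in range(max_new_tokens):
        tok = next_token(seq)
        seq.append(tok)
        if tok == eos:
            break
    return seq


def toy_model(seq: List[str]) -> str:
    """Stand-in for a trained model: closes any open tag, then ends the sequence."""
    open_tags: List[str] = []
    for tok in seq:
        if tok.startswith("</"):
            open_tags.pop()
        elif tok.startswith("<") and tok not in ("<sos>", "<eos>"):
            open_tags.append(tok)
    if open_tags:
        return "</" + open_tags[-1][1:]
    return "<eos>"


partial = ["<sos>", "<floorplan>", "<room>", "idx0", "type:living", "x0", "y0"]
result = complete(partial, toy_model)
print(result)
```

One plausible way to realize editing with the same loop, stated here as an assumption, is to modify tokens in the prefix and regenerate the remainder.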

Figure 3: Completion and editing workflows for floorplans, leveraging updateable FML sequences for user interaction.

Experimental Evaluation

Comprehensive benchmarking is conducted on the RPLAN dataset, assessing model performance across:

Unconditional generation (FID: 7.22 vs. GSDiff's 15.02)
Boundary-conditioned generation (FID: 6.51, IoU: 97.86 vs. Graph2Plan's FID: 34.20, IoU: 95.87)
Graph-conditioned generation (FID: 3.41 vs. HouseGAN++'s 48.44; GED: 1.21 vs. HouseDiffusion's 1.55)
Multi-conditional generation (Boundary + Graph: FID: 14.17; IoU: 97.59 vs. Graph2Plan's FID: 22.87, IoU: 92.96)

A single model outperforms the listed task-specific baselines on each reported metric (lower FID and GED, higher IoU), demonstrating robust generalization to diverse conditional inputs.
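
For readers unfamiliar with the IoU figures above, the following sketch computes intersection over union between a target boundary and a generated footprint on a coarse grid. It assumes that IoU compares the input boundary with the union of the generated rooms, and it uses axis-aligned rectangles for simplicity; the paper's exact evaluation protocol is not described in this summary.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]


def rect_cells(x0: int, y0: int, x1: int, y1: int) -> Set[Cell]:
    """Grid cells covered by the axis-aligned rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def iou(a: Set[Cell], b: Set[Cell]) -> float:
    """Intersection over union of two rasterized regions."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Input boundary: a 10 x 10 square (100 cells).
target = rect_cells(0, 0, 10, 10)
# Generated rooms: two rectangles that leave a 1 x 4 strip uncovered (96 cells).
generated = rect_cells(0, 0, 10, 6) | rect_cells(0, 6, 9, 10)

print(f"IoU = {100 * iou(target, generated):.2f}")  # prints "IoU = 96.00"
```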

Figure 4: Qualitative comparison—consistent, realistic samples from various conditions versus prior models, highlighting coverage and fidelity.

Ablation Studies and Analysis

Ablations support the importance of permutation-equivariant room encoding and descending-order indexing: removing these augmentations substantially worsens (raises) FID. Constrained decoding mitigates geometric and topological errors (e.g., overlapping rooms, misaligned doors), enhancing architectural validity.
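
A room-order augmentation of this kind could look like the sketch below. It is an interpretation, not the paper's procedure: the function `permute_rooms` is hypothetical, and "descending-order indexing" is read here as assigning index n-1 to the first emitted room and 0 to the last, which this summary does not confirm. The point of the example is that any reordering must also remap the room indices used by the adjacency graph.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def permute_rooms(rooms: List[Dict], edges: List[Edge],
                  rng: random.Random) -> Tuple[List[Dict], List[Edge]]:
    """Shuffle room order and remap the room indices used by adjacency edges."""
    order = list(range(len(rooms)))
    rng.shuffle(order)  # order[new_position] = old_index
    n = len(rooms)
    # Assumed reading of descending-order indexing: first emitted room gets n - 1.
    new_index = {old: n - 1 - pos for pos, old in enumerate(order)}
    shuffled = [{**rooms[old], "index": new_index[old]} for old in order]
    remapped = [(new_index[a], new_index[b]) for a, b in edges]
    return shuffled, remapped


rooms = [{"type": "living"}, {"type": "bedroom"}, {"type": "kitchen"}]
edges = [(0, 1), (0, 2)]  # living-bedroom and living-kitchen
shuffled, remapped = permute_rooms(rooms, edges, random.Random(0))
print(shuffled, remapped)
```

Whatever order the shuffle produces, the adjacency relation between room types is preserved, which is the property the augmentation relies on.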

Figure 5: Impact of constrained decoding—eliminates geometric and topological inconsistencies compared to unconstrained generation.

User studies support the functionality and naturalness of the generated floorplans: the method achieves a win rate of 51%, versus 24% and 32% for HouseGAN++ and HouseDiffusion, respectively.

Limitations and Future Directions

Despite these advances, the paper notes one clear limitation and one direction for future work:

Current scope is restricted to single-story floorplans; extension to multi-story layouts requires additional vertical hierarchy tagging (e.g., <story>)
FML integration with LLMs for natural language-driven generation is a promising avenue for enhanced accessibility and flexibility

Practical and Theoretical Implications

FML and its transformer-based generative model establish a scalable, unified framework for vector floorplan generation, eliminating task fragmentation and enabling interactive, controllable, and structurally valid synthesis. Practically, this advances automated, accessible architectural design, supporting rapid iteration and editing. Theoretically, FML’s strict markup grammar demonstrates the efficacy of symbolic-structural sequence representation in geometric synthesis tasks and suggests broader applicability to other domain-specific generative problems.

Conclusion

The introduction of Floorplan Markup Language and its autoregressive transformer-based generator rigorously unifies vector floorplan synthesis across a wide array of conditional tasks. Empirical evaluation demonstrates superior fidelity, diversity, and generalization relative to prior task-specific approaches. The approach’s extensibility points towards integration with LLMs and multi-story designs, offering both practical value for architectural applications and theoretical insight into structured sequence modeling for spatial domain generation.
