- The paper's main contribution is introducing Floorplan Markup Language (FML) that encodes architectural elements for unified vector floorplan generation.
- It employs an autoregressive transformer with constrained decoding to ensure high-fidelity synthesis, achieving lower FID scores than prior methods.
- The study demonstrates interactive editing and completion capabilities, and it outlines future directions for multi-story and language-driven floorplan generation.
Unified Vector Floorplan Generation via Markup Representation
Introduction and Motivation
"Unified Vector Floorplan Generation via Markup Representation" (2604.04859) proposes a novel paradigm for automated residential floorplan synthesis, addressing the lack of generalizable, high-fidelity generation from heterogeneous conditional inputs such as site boundaries, room adjacency graphs, and partial layouts. Previous methods, either optimization-based or generative, suffered from limited flexibility, fidelity-diversity tradeoffs, and redundancy due to non-optimal representations (e.g., raster-to-vector conversion, multi-stage pipelines). The core innovation is Floorplan Markup Language (FML), a structured grammar akin to HTML, unifying all vector floorplan generation tasks as a sequence prediction problem.
Figure 1: Floorplan Markup Language (FML) encodes floorplans, boundaries, and graphs as markup sequences, unifying multiple generation paradigms.
Floorplan Markup Language (FML)
FML formalizes floorplan information—rooms, doors, boundaries, adjacency graphs—into a sequence of tokens adhering to strict syntactic rules. Each element (room polygon, interior/front door, boundary polygon, adjacency matrix) is represented by tags, indices, types, and coordinates. The tag-based structure enforces architectural validity and compositionality, guiding generative models towards coherent layouts.
Grammar and Sequence Encoding
- Tags for sequence boundaries, floorplan, room, door, front door, boundary, and graph
- Ordered encoding for rooms, doors, and conditions (room indices, vertices, types)
- Conditions represented as prepended markup (boundary vertices, adjacency graph)
FML enables seamless formulation of unconditional, boundary-conditioned, graph-conditioned, number-conditioned, and completion tasks as next-token prediction, eliminating task-specific conversion and preconditioning requirements.
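To make the sequence view concrete, here is a minimal sketch of how a floorplan and an optional boundary condition might be flattened into a markup token sequence, and how the same sequence splits into a conditioning prompt and a prediction target. The tag spellings (`<bos>`, `<boundary>`, `<room>`), the token names (`idx_*`, `type_*`, `x_*`, `y_*`), and the helper functions are illustrative assumptions, not the paper's actual vocabulary.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]
Room = Tuple[str, List[Point]]  # (room type, polygon vertices)


def encode_polygon(tag: str, vertices: List[Point],
                   index: Optional[int] = None,
                   kind: Optional[str] = None) -> List[str]:
    """Wrap a polygon in an opening/closing tag pair (hypothetical token names)."""
    tokens = [f"<{tag}>"]
    if index is not None:
        tokens.append(f"idx_{index}")
    if kind is not None:
        tokens.append(f"type_{kind}")
    for x, y in vertices:
        tokens += [f"x_{x}", f"y_{y}"]
    tokens.append(f"</{tag}>")
    return tokens


def encode_floorplan(rooms: List[Room],
                     boundary: Optional[List[Point]] = None) -> List[str]:
    """Conditions are prepended; the floorplan body follows."""
    tokens = ["<bos>"]
    if boundary is not None:
        tokens += encode_polygon("boundary", boundary)
    tokens.append("<floorplan>")
    for i, (kind, verts) in enumerate(rooms):
        tokens += encode_polygon("room", verts, index=i, kind=kind)
    tokens += ["</floorplan>", "<eos>"]
    return tokens


def split_prompt_target(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Everything up to and including <floorplan> is the prompt; the rest is predicted."""
    cut = tokens.index("<floorplan>") + 1
    return tokens[:cut], tokens[cut:]


rooms = [("living", [(0, 0), (4, 0), (4, 3), (0, 3)])]
boundary = [(0, 0), (4, 0), (4, 3), (0, 3)]
prompt, target = split_prompt_target(encode_floorplan(rooms, boundary))
```

Under this sketch, unconditional generation is the case where the prompt holds only the start and floorplan tags, and boundary-conditioned generation differs only in what is prepended. No task-specific model input is needed.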
The generative model is an autoregressive transformer, inspired by LLMs (e.g., LLaMA-3), trained to generate FML sequences via next-token prediction. Tags, indices, types, and coordinates are embedded and projected to the output space for decoding. The architecture supports variable-length generation, permutation-equivariant room encoding, and constrained decoding to enforce grammar consistency.
Figure 2: The model architecture mirrors LLMs, employing transformers for sequential next-token prediction and autoregressive inference.
Constrained Decoding
At inference time, grammar constraints are enforced by setting the probabilities of invalid token classes to zero before sampling. For example, a door must have two vertices, and a new room's vertices must lie outside existing rooms. This supports high-validity synthesis.
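The masking step can be sketched as follows for the door rule alone. The toy grammar (`<door> x y x y </door>`), the token names, and the dictionary of logits are assumptions chosen for illustration. The geometric rule about room vertices would need a separate spatial check that this sketch does not attempt.

```python
import math
from typing import Dict, List


def token_class(token: str) -> str:
    """Map a token to a coarse class (hypothetical vocabulary)."""
    if token == "</door>":
        return "close"
    if token.startswith("x_"):
        return "x"
    if token.startswith("y_"):
        return "y"
    return "other"


def allowed_class(prefix: List[str]) -> str:
    """Toy door grammar: <door> x y x y </door>, i.e. two vertices."""
    start = len(prefix) - 1 - prefix[::-1].index("<door>")
    n_inside = len(prefix) - start - 1  # tokens emitted since <door>
    if n_inside >= 4:
        return "close"
    return "x" if n_inside % 2 == 0 else "y"


def constrained_distribution(logits: Dict[str, float],
                             prefix: List[str]) -> Dict[str, float]:
    """Softmax over grammar-valid tokens only; invalid tokens get probability 0."""
    wanted = allowed_class(prefix)
    kept = {t: v for t, v in logits.items() if token_class(t) == wanted}
    if not kept:
        raise ValueError("no grammar-valid token available")
    peak = max(kept.values())
    weights = {t: math.exp(v - peak) for t, v in kept.items()}
    total = sum(weights.values())
    return {t: weights.get(t, 0.0) / total for t in logits}


# After two vertices, only the closing tag may follow, whatever the raw logits say.
probs = constrained_distribution(
    {"x_9": 5.0, "</door>": 0.1, "y_3": 2.0},
    ["<door>", "x_1", "y_2", "x_1", "y_4"],
)
```

Renormalizing after masking keeps the result a valid distribution, so sampling proceeds unchanged apart from the excluded tokens.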
Completion and Editing
The autoregressive approach enables completion of partial sequences and interactive editing: arbitrary fragments can be completed or modified within the markup structure.
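One way to picture an edit is as a rewrite of the token sequence followed by re-completion. The sketch below removes one room element and reopens the sequence so that a generator could continue from the remaining prefix. The token names and both helper functions are hypothetical, and the generator itself is left out.

```python
from typing import List


def remove_room(tokens: List[str], index: int) -> List[str]:
    """Drop the <room> ... </room> span whose index token matches."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        is_target = (tokens[i] == "<room>"
                     and i + 1 < len(tokens)
                     and tokens[i + 1] == f"idx_{index}")
        if is_target:
            i = tokens.index("</room>", i) + 1  # skip past the closing tag
            continue
        out.append(tokens[i])
        i += 1
    return out


def completion_prefix(tokens: List[str]) -> List[str]:
    """Strip the closing tags so the sequence is open for further generation."""
    if "</floorplan>" in tokens:
        return tokens[:tokens.index("</floorplan>")]
    return list(tokens)


plan = ["<bos>", "<floorplan>",
        "<room>", "idx_0", "type_living", "x_0", "y_0", "</room>",
        "<room>", "idx_1", "type_bed", "x_4", "y_0", "</room>",
        "</floorplan>", "<eos>"]
prefix = completion_prefix(remove_room(plan, 1))
# `prefix` would be handed to the model as the context for regenerating the room.
```

This treats editing as ordinary prefix continuation, which is the property the autoregressive formulation provides. Whether the actual system edits sequences this way is an assumption.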
Figure 3: Completion and editing workflows for floorplans, leveraging updateable FML sequences for user interaction.
Experimental Evaluation
Comprehensive benchmarking is conducted on the RPLAN dataset, assessing model performance across:
- Unconditional generation (FID: 7.22 vs. GSDiff's 15.02)
- Boundary-conditioned generation (FID: 6.51, IoU: 97.86 vs. Graph2Plan's FID: 34.20, IoU: 95.87)
- Graph-conditioned generation (FID: 3.41 vs. HouseGAN++'s 48.44; GED: 1.21 vs. HouseDiffusion's 1.55)
- Multi-conditional generation (Boundary + Graph: FID: 14.17; IoU: 97.59 vs. Graph2Plan's FID: 22.87, IoU: 92.96)
The model consistently outperforms prior task-specific approaches on the reported FID, IoU, and GED metrics, demonstrating robust generalization to diverse conditional inputs.
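For readers unfamiliar with the IoU figures above, the following is a minimal sketch of intersection-over-union between two polygons, such as a generated footprint and an input boundary, computed by rasterizing both onto a grid. How the paper actually computes IoU is not stated here, so the rasterization approach, the grid size, and the function names are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(px: float, py: float, poly: List[Point]) -> bool:
    """Even-odd ray casting test."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def raster_iou(a: List[Point], b: List[Point], size: int) -> float:
    """IoU over the centers of a size x size grid of unit cells."""
    inter = union = 0
    for gy in range(size):
        for gx in range(size):
            cx, cy = gx + 0.5, gy + 0.5
            in_a = point_in_polygon(cx, cy, a)
            in_b = point_in_polygon(cx, cy, b)
            inter += int(in_a and in_b)
            union += int(in_a or in_b)
    return inter / union if union else 0.0


square_a = [(0, 0), (4, 0), (4, 4), (0, 4)]
square_b = [(2, 0), (6, 0), (6, 4), (2, 4)]
iou = raster_iou(square_a, square_b, size=8)
```

The two squares overlap on half of each, so the intersection is one third of the union. A finer grid gives a closer approximation for non-axis-aligned shapes.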


Figure 4: Qualitative comparison of consistent, realistic samples generated under various conditions against prior models, highlighting coverage and fidelity.
Ablation Studies and Analysis
The importance of permutation-equivariant room encoding and descending-order indexing is substantiated by ablations: FID degrades substantially when these augmentations are removed. Constrained decoding mitigates topological errors (e.g., overlapping rooms, misaligned doors), enhancing architectural validity.
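As one possible reading of these two augmentations, the sketch below shuffles the room order for each training sample and then assigns indices counting down from the number of rooms minus one. This interpretation, the function name, and the tuple layout are assumptions. The summary does not spell out the paper's exact scheme.

```python
import random
from typing import List, Tuple

Point = Tuple[int, int]
Room = Tuple[str, List[Point]]               # (room type, polygon vertices)
IndexedRoom = Tuple[int, str, List[Point]]   # (index, room type, polygon vertices)


def augment_room_order(rooms: List[Room], rng: random.Random) -> List[IndexedRoom]:
    """Shuffle room order, then index rooms in descending order (assumed scheme)."""
    shuffled = list(rooms)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    return [(n - 1 - pos, kind, verts) for pos, (kind, verts) in enumerate(shuffled)]


rooms = [
    ("living", [(0, 0), (4, 0), (4, 3), (0, 3)]),
    ("bedroom", [(4, 0), (7, 0), (7, 3), (4, 3)]),
    ("kitchen", [(0, 3), (4, 3), (4, 5), (0, 5)]),
]
sample = augment_room_order(rooms, random.Random(0))
```

Under this reading, the first index emitted reveals how many rooms remain, while shuffling prevents the model from tying room identity to a fixed position in the sequence.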
Figure 5: Impact of constrained decoding, which eliminates geometric and topological inconsistencies seen in unconstrained generation.
User studies support the functionality and naturalness of the generated floorplans; the method achieves a win rate of 51%, versus 24% and 32% for HouseGAN++ and HouseDiffusion, respectively.
Limitations and Future Directions
Despite these advances, limitations and open directions remain:
- Current scope is restricted to single-story floorplans; extension to multi-story layouts requires additional vertical hierarchy tagging (e.g., <story>)
- FML integration with LLMs for natural language-driven generation is a promising avenue for enhanced accessibility and flexibility
Practical and Theoretical Implications
FML and its transformer-based generative model establish a scalable, unified framework for vector floorplan generation, eliminating task fragmentation and enabling interactive, controllable, and structurally valid synthesis. Practically, this advances automated, accessible architectural design, supporting rapid iteration and editing. Theoretically, FML’s strict markup grammar demonstrates the efficacy of symbolic-structural sequence representation in geometric synthesis tasks and suggests broader applicability to other domain-specific generative problems.
Conclusion
The introduction of Floorplan Markup Language and its autoregressive transformer-based generator rigorously unifies vector floorplan synthesis across a wide array of conditional tasks. Empirical evaluation demonstrates superior fidelity, diversity, and generalization relative to prior task-specific approaches. The approach’s extensibility points towards integration with LLMs and multi-story designs, offering both practical value for architectural applications and theoretical insight into structured sequence modeling for spatial domain generation.