LOCOFY Large Design Models -- Design to code conversion solution (2507.16208v1)

Published 22 Jul 2025 in cs.SE and cs.AI

Abstract: Despite rapid advances in LLMs and Multimodal LLMs, numerous challenges related to interpretability, scalability, resource requirements and repeatability remain, related to their application in the design-to-code space. To address this, we introduce the Large Design Models (LDMs) paradigm specifically trained on designs and webpages to enable seamless conversion from design-to-code. We have developed a training and inference pipeline by incorporating data engineering and appropriate model architecture modification. The training pipeline consists of the following: 1) Design Optimiser: developed using a proprietary ground truth dataset and addresses sub-optimal designs; 2) Tagging and feature detection: using pre-trained and fine-tuned models, this enables the accurate detection and classification of UI elements; and 3) Auto Components: extracts repeated UI structures into reusable components to enable creation of modular code, thus reducing redundancy while enhancing code reusability. In this manner, each model addresses distinct but key issues for design-to-code conversion. Separately, our inference pipeline processes real-world designs to produce precise and interpretable instructions for code generation and ensures reliability. Additionally, our models illustrated exceptional end-to-end design-to-code conversion accuracy using a novel preview match score metric. Comparative experiments indicated superior performance of LDMs against LLMs on accuracy of node positioning, responsiveness and reproducibility. Moreover, our custom-trained tagging and feature detection model demonstrated high precision and consistency in identifying UI elements across a wide sample of test designs. Thus, our proposed LDMs are a reliable and superior solution to understanding designs that subsequently enable the generation of efficient and reliable production-ready code.

Summary

  • The paper introduces domain-specific LDMs that deterministically convert design files to code, achieving high-fidelity outputs with reproducible results.
  • It leverages a specialized architecture including a Design Optimizer, custom YOLO-based feature detection, and Auto Components for modular, responsive UI code generation.
  • Empirical results show superior performance with 89.6% of screens achieving a Preview Match Score above 95%, outperforming conventional LLMs in hierarchical design handling.

Large Design Models (LDMs) for Deterministic Design-to-Code Conversion

Motivation and Problem Statement

The paper introduces Large Design Models (LDMs), a paradigm specifically tailored for design-to-code conversion, addressing the limitations of general-purpose LLMs and large multimodal models (LMMs) in this domain. While LLMs and LMMs have demonstrated progress in multimodal understanding, their architectures and training corpora are not optimized for the structural, hierarchical, and semantic nuances of design files. This results in suboptimal code generation, particularly in terms of visual fidelity, responsiveness, and maintainability. The authors identify three key challenges: (1) lack of interpretability and reproducibility in generative code outputs, (2) insufficient semantic understanding of UI elements, and (3) inability to abstract repeated patterns into reusable components.

LDM Architecture and Training Pipeline

LDMs are trained exclusively on a large corpus of design files and web pages, leveraging multimodal data (design metadata, images, and text) to build rich, multi-view representations of UI structures. The architecture comprises four core models:

  • Design Optimizer: Utilizes an XGBoost-based framework trained on a proprietary, expert-annotated dataset to transform unstructured or suboptimal designs into best-practice, semantically grouped, and responsive layouts (a minimal sketch of this stage appears at the end of this subsection).
  • Tagging and Feature Detection: Employs a custom YOLO backbone (pretrained via the Jasmine strategy on over 1M UI-specific nodes) to detect and classify UI elements, capturing both atomic and composite structures with high precision.
  • Auto Components: Identifies and abstracts repeated UI patterns into reusable components, automatically inferring component properties and facilitating modular code generation.
  • Responsiveness: Ensures that generated code maintains layout integrity and interactivity across device form factors.

The training pipeline is highly data-centric, with extensive curation and annotation to ensure that the models internalize the visual and structural patterns unique to digital interfaces.
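
To make the Design Optimizer stage concrete, the sketch below shows how an XGBoost classifier could map per-group layout features extracted from design metadata to a best-practice layout decision. The feature set, labels, and training rows are hypothetical placeholders; the paper's proprietary dataset and exact formulation are not public.

```python
# Hypothetical sketch of the Design Optimizer stage: an XGBoost classifier that
# predicts a layout decision for each candidate node group in a raw design.
# Feature names, labels, and data are illustrative, not the paper's actual schema.
import numpy as np
import xgboost as xgb

# Per-group features (all assumed):
# [child_count, bbox_width, bbox_height, horizontal_gap_var, vertical_gap_var, alignment_score]
X_train = np.array([
    [3, 320, 48, 0.0, 0.0, 1.0],    # evenly spaced row   -> horizontal auto-layout
    [5, 280, 600, 0.0, 2.5, 0.9],   # stacked items       -> vertical auto-layout
    [2, 900, 400, 40.0, 55.0, 0.1], # loosely placed nodes -> leave absolute
])
y_train = np.array([0, 1, 2])  # 0 = row container, 1 = column container, 2 = absolute

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

new_group = np.array([[4, 300, 52, 0.1, 0.0, 0.95]])
decision = int(model.predict(new_group)[0])
print({0: "row container", 1: "column container", 2: "absolute"}[decision])
```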

Deterministic Inference Pipeline

A key innovation is the deterministic inference pipeline, which decouples model outputs from direct code synthesis. Instead, LDMs generate structured, interpretable instructions that are consumed by a proprietary code generation engine. This guarantees that identical designs always yield identical code, supporting version control and professional development workflows. The pipeline processes real-world design files, applies the Design Optimizer, tags and groups UI elements, extracts components, and finally emits code generation instructions.
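
How such an instruction-based pipeline could be organized is outlined below. The stage implementations are stubs and the instruction schema is an assumption for illustration, since the actual LOCOFY code generation engine and its interfaces are proprietary.

```python
# Illustrative sketch of a deterministic design-to-instructions pipeline.
# Stage names and the instruction schema are hypothetical.
import json
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass
class Node:
    id: str
    bbox: tuple            # (x, y, width, height) in design coordinates
    children: List["Node"]


def optimise_design(root: Node) -> Node:
    """Design Optimizer stage: normalise grouping and layout (stub)."""
    return root


def tag_nodes(root: Node) -> Dict[str, str]:
    """Tagging stage: assign a UI element class to each node (stub)."""
    return {root.id: "container"}


def extract_components(root: Node) -> List[Dict]:
    """Auto Components stage: detect repeated subtrees (stub)."""
    return []


def build_instructions(design: Node) -> str:
    """Emit structured, interpretable instructions as canonical JSON.
    Because every stage is a pure function of the input design, the same
    design always yields byte-identical instructions."""
    optimised = optimise_design(design)
    instructions = {
        "tags": tag_nodes(optimised),
        "components": extract_components(optimised),
        "layout": asdict(optimised),
    }
    return json.dumps(instructions, sort_keys=True)  # canonical ordering => determinism


design = Node(id="screen-1", bbox=(0, 0, 1440, 900), children=[])
print(build_instructions(design))
```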

Evaluation Metrics and Empirical Results

The authors introduce the Preview Match Score (PMS), a node-level metric quantifying the spatial and dimensional fidelity between the original design and the rendered output. A node is considered a match if its position and size deviate by less than 3% from the original. Across 1,000 real-world designs, 89.6% of screens achieved a PMS exceeding 95%, indicating high-fidelity code generation.
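
The exact PMS formulation is not reproduced here; a plausible per-screen computation consistent with the description (a node matches if its position and size each deviate by less than 3%) might look like the following. The normalization convention is an assumption.

```python
# Hedged sketch of a Preview Match Score (PMS) computation. Only the
# "node matches if position and size deviate by < 3%" rule is from the paper;
# the normalisation choices below are assumptions.

def node_matches(design_box, rendered_box, tol=0.03):
    """design_box / rendered_box: (x, y, width, height) in the same units."""
    dx, dy, dw, dh = design_box
    rx, ry, rw, rh = rendered_box
    # Normalise positional deviation by the design dimensions and size
    # deviation by the original width/height (one reasonable convention).
    pos_ok = abs(rx - dx) / max(dw, 1e-6) < tol and abs(ry - dy) / max(dh, 1e-6) < tol
    size_ok = abs(rw - dw) / max(dw, 1e-6) < tol and abs(rh - dh) / max(dh, 1e-6) < tol
    return pos_ok and size_ok

def preview_match_score(design_nodes, rendered_nodes):
    """Fraction of nodes whose rendered geometry matches the design, in percent."""
    matches = sum(node_matches(d, r) for d, r in zip(design_nodes, rendered_nodes))
    return 100.0 * matches / max(len(design_nodes), 1)

design   = [(0, 0, 400, 80), (0, 100, 400, 300)]
rendered = [(1, 0, 401, 80), (0, 103, 400, 298)]
print(f"PMS = {preview_match_score(design, rendered):.1f}%")
```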

Comparative experiments against state-of-the-art LLMs (Meta-LLaMA, Google T5, Gemini 1.5 Pro, ChatGPT-4o) reveal that LDMs maintain structural accuracy and hierarchy even in deeply nested designs, where LLMs frequently fail due to token limitations and misgrouping. Fine-tuning LLMs with multimodal data marginally improves performance but incurs prohibitive costs and still underperforms compared to LDMs.

In tagging and feature detection, the Jasmine-pretrained YOLO model achieves macro-average F1-scores of 86.07% (small tags) and 77.22% (large tags), substantially outperforming baseline ResNet and vanilla YOLO models. The Auto Components module demonstrates robust abstraction of repeated UI patterns, promoting code modularity and maintainability.
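
The paper's Jasmine-pretrained backbone and its UI-specific classes are not publicly released; as a rough stand-in, the snippet below uses the open-source ultralytics YOLO API to show the shape of the tagging step, i.e. turning a design screenshot into labelled UI element boxes. The weights and class names here are placeholders.

```python
# Stand-in for the tagging and feature detection stage using the public
# ultralytics YOLO API; not the paper's custom, Jasmine-pretrained model.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # placeholder weights, not the paper's model
results = model("screenshot.png")   # run detection on a design screenshot

for box in results[0].boxes:
    cls_id = int(box.cls[0])
    label = results[0].names[cls_id]        # e.g. "button", "input" in a UI-trained model
    confidence = float(box.conf[0])
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # element bounding box in pixels
    print(f"{label:>10s}  conf={confidence:.2f}  bbox=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
```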

Implementation Considerations

Data Engineering and Annotation

  • High-quality, large-scale, and UI-specific datasets are critical. The proprietary annotation process, involving expert designers, is essential for training the Design Optimizer and Tagging models.
  • Pretraining on generic data is insufficient; domain-specific pretraining (as in Jasmine) is necessary for high precision in UI element detection.

Model Architecture

  • The use of XGBoost for design optimization is notable for its interpretability and efficiency in mapping unoptimized to optimized design states.
  • The custom YOLO backbone, with domain-specific pretraining and fine-tuning, is effective for spatially-aware feature detection.
  • Modularization via Auto Components requires robust clustering and similarity detection algorithms to identify repeated patterns across screens; one possible approach is sketched below.
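
As one concrete but hypothetical realization of that clustering step, repeated UI structures could be grouped by hashing a normalized serialization of each subtree, as in the sketch below. The paper does not disclose its actual similarity algorithm.

```python
# Hypothetical sketch of repeated-pattern detection for Auto Components:
# subtrees with the same structural signature (tag hierarchy, ignoring text
# and absolute coordinates) are grouped into a reusable component candidate.
import hashlib
from collections import defaultdict

def signature(node) -> str:
    """Structural signature of a subtree: tag plus child signatures."""
    child_sigs = ",".join(signature(c) for c in node.get("children", []))
    raw = f'{node["tag"]}({child_sigs})'
    return hashlib.sha1(raw.encode()).hexdigest()

def group_repeated_subtrees(root):
    """Collect subtrees that share a signature and occur more than once.
    In practice, trivial single-leaf groups would be filtered out."""
    groups = defaultdict(list)

    def walk(node):
        groups[signature(node)].append(node)
        for child in node.get("children", []):
            walk(child)

    walk(root)
    return {sig: nodes for sig, nodes in groups.items() if len(nodes) > 1}

screen = {"tag": "screen", "children": [
    {"tag": "card", "children": [{"tag": "image", "children": []},
                                 {"tag": "text", "children": []}]},
    {"tag": "card", "children": [{"tag": "image", "children": []},
                                 {"tag": "text", "children": []}]},
]}
for sig, nodes in group_repeated_subtrees(screen).items():
    print(f"component candidate ({len(nodes)} instances): {sig[:8]}")
```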

Inference and Code Generation

  • The separation of model inference from code generation ensures determinism and reproducibility, which are critical for professional development environments.
  • The system is designed to be extensible, allowing for integration with various frontend frameworks and design tools.
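
Because the instructions are a pure function of the input design, the determinism claim in the first bullet can be verified mechanically by re-running the pipeline and comparing digests. A minimal sketch, assuming a canonical-JSON instruction emitter like the one outlined in the inference pipeline section (the function below is a stand-in, not the actual engine):

```python
# Minimal reproducibility check: the same design must always hash to the
# same instruction payload; a mismatch would indicate a non-deterministic stage.
import hashlib
import json

def build_instructions(design: dict) -> str:
    """Stand-in for the instruction emitter: canonical JSON of the design."""
    return json.dumps(design, sort_keys=True)

def instructions_digest(design: dict) -> str:
    return hashlib.sha256(build_instructions(design).encode()).hexdigest()

design = {"id": "screen-1", "children": []}
assert instructions_digest(design) == instructions_digest(design)
print(instructions_digest(design)[:16])
```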

Resource and Scaling Considerations

  • LDMs contain hundreds of millions of parameters, which, while not at the scale of the largest LLMs, is sufficient for the domain-specific task.
  • The deterministic pipeline and efficient model architectures enable rapid inference (seconds per design), supporting real-time or near-real-time workflows.

Limitations and Future Directions

  • LDMs require well-structured input designs; free-form or AI-generated designs may yield suboptimal results and require manual intervention.
  • The current model size, while effective, does not reach the scale of the largest foundation models; ongoing work aims to scale both data and model size.
  • Further improvements are targeted for the Auto Components and Responsiveness modules, as well as broader support for additional design tools and frameworks.
  • Post-conversion refinement using code-specialized LLMs is proposed as a future enhancement.

Implications and Prospects

The LDM paradigm demonstrates that domain-specific, multimodal models can outperform general-purpose LLMs and LMMs in structured tasks such as design-to-code conversion. The deterministic, instruction-based inference pipeline addresses reproducibility and maintainability, which are often overlooked in generative approaches. The high fidelity and modularity of the generated code have direct implications for accelerating UI development, reducing manual engineering effort, and improving design consistency.

Theoretically, the work underscores the importance of domain adaptation and data-centric model development in applied AI. Practically, it sets a new standard for automated design-to-code systems, with potential extensions to other structured, multimodal translation tasks.

Conclusion

LDMs represent a significant advancement in automated design-to-code conversion, leveraging domain-specific training, multimodal data, and deterministic inference to achieve high-fidelity, modular, and reproducible code generation. The empirical results substantiate the superiority of LDMs over general-purpose LLMs and LMMs in this context. Future work will focus on scaling, broader tool integration, and further automation of the design-to-code pipeline, with the potential to generalize the approach to other structured multimodal domains.
