
LaTCoder: Converting Webpage Design to Code with Layout-as-Thought (2508.03560v1)

Published 5 Aug 2025 in cs.SE

Abstract: Converting webpage designs into code (design-to-code) plays a vital role in User Interface (UI) development for front-end developers, bridging the gap between visual design and functional implementation. While recent Multimodal LLMs (MLLMs) have shown significant potential in design-to-code tasks, they often fail to accurately preserve the layout during code generation. To this end, we draw inspiration from the Chain-of-Thought (CoT) reasoning in human cognition and propose LaTCoder, a novel approach that enhances layout preservation in webpage design during code generation with Layout-as-Thought (LaT). Specifically, we first introduce a simple yet efficient algorithm to divide the webpage design into image blocks. Next, we prompt MLLMs using a CoT-based approach to generate code for each block. Finally, we apply two assembly strategies (absolute positioning and an MLLM-based method), followed by dynamic selection to determine the optimal output. We evaluate the effectiveness of LaTCoder using multiple backbone MLLMs (i.e., DeepSeek-VL2, Gemini, and GPT-4o) on both a public benchmark and a newly introduced, more challenging benchmark (CC-HARD) that features complex layouts. The experimental results on automatic metrics demonstrate significant improvements. Specifically, TreeBLEU scores increased by 66.67% and MAE decreased by 38% when using DeepSeek-VL2, compared to direct prompting. Moreover, the human preference evaluation results indicate that annotators favor the webpages generated by LaTCoder in over 60% of cases, providing strong evidence of the effectiveness of our method.


Summary

  • The paper presents LaTCoder, which decomposes webpage designs into manageable blocks and generates HTML/CSS code for each block via chain-of-thought prompting, improving layout fidelity.
  • Using a custom division algorithm, LaTCoder segments designs along solid-color lines to mimic the CSS box model structure. With DeepSeek-VL2 as the backbone, TreeBLEU scores improve by 66.67% over direct prompting.
  • On automatic metrics, MAE drops by 38% with DeepSeek-VL2 compared to direct prompting, and in the human preference evaluation annotators favor LaTCoder's webpages in over 60% of cases.

LaTCoder: Converting Webpage Design to Code with Layout-as-Thought

The paper "LaTCoder: Converting Webpage Design to Code with Layout-as-Thought" introduces a novel method for translating webpage designs into HTML/CSS code while preserving layout integrity. This approach exploits the Chain-of-Thought (CoT) reasoning concept, applying it to layout understanding to ensure that Multimodal LLMs (MLLMs) maintain fidelity to the design's original layout.

Motivation and Approach

Existing MLLMs often fail to preserve layout when translating visual designs into code. The authors identify that MLLMs tend to lose layout information, especially in complex, real-world webpage designs. Through Layout-as-Thought (LaT), LaTCoder aims to mitigate these deficiencies by decomposing the page into manageable segments and using CoT-based prompts to guide MLLMs toward accurate code generation.

Figure 1: A real-world failure case from the well-known screenshot-to-code project.

LaTCoder's workflow consists of dividing the webpage design into separate image blocks, generating code for each block using CoT strategies, and then assembling these blocks into a complete HTML/CSS webpage.

Figure 2: The workflow of LaTCoder.

Design-to-Code Task and Solution

The task aims to map a webpage design into HTML/CSS with minimal visual discrepancy. The LaTCoder approach is broken down into:

  1. Layout-Aware Division: Using a custom algorithm to detect dividing lines in the design to split it into simpler subregions. This method respects the CSS box model for structured layout handling.
  2. Block-Wise Code Synthesis: Each block is processed individually, employing MLLMs under a CoT-based prompting structure to maintain layout and style consistency.
  3. Layout-Preserved Assembly: Two strategies, absolute positioning and MLLM-based assembly, each produce a candidate page, and dynamic selection picks the better output so that the complete webpage stays faithful to the original design.

Figure 3: A toy example of dividing line detection.

The division method identifies solid-colored lines and segments the design along them, which keeps the computation cheap while respecting the structure of the original layout.
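The idea of splitting along solid-colored lines can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes an image is a plain list of rows of RGB tuples, treats any row whose pixels all share one color as a dividing line, and only looks for horizontal lines. The names `is_solid_row` and `split_rows` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # image[y][x], assumed non-empty and rectangular


def is_solid_row(image: Image, y: int) -> bool:
    """Return True when every pixel in row y has the same color."""
    first = image[y][0]
    return all(pixel == first for pixel in image[y])


def split_rows(image: Image) -> List[Tuple[int, int]]:
    """Return half-open (top, bottom) row ranges of the content bands.

    Assumption for this sketch: solid-color rows act as separators and
    are not part of any band.
    """
    bands: List[Tuple[int, int]] = []
    start = None
    for y in range(len(image)):
        if is_solid_row(image, y):
            if start is not None:
                bands.append((start, y))  # close the band above this line
                start = None
        elif start is None:
            start = y  # first non-solid row opens a new band
    if start is not None:
        bands.append((start, len(image)))
    return bands


W, K = (255, 255, 255), (0, 0, 0)
toy = [
    [W, W, W, W],  # solid: dividing line
    [W, K, K, W],
    [W, K, W, W],
    [W, W, W, W],  # solid: dividing line
    [K, W, W, K],
    [W, W, W, W],  # solid: dividing line
]
print(split_rows(toy))  # [(1, 3), (4, 5)]
```

Applying the same function to the transposed image (`[list(col) for col in zip(*image)]`) would find vertical dividing lines, and recursing on each band would give a nested, box-like structure.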

Figure 4: The simplified prompt for generating image block code (the full version appears in the paper's appendix).
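The absolute-positioning strategy from step 3 of the list above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: each block is assumed to carry its bounding box in pixels within the original design plus the HTML generated for it, and the `Block` class and `assemble_absolute` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    x: int        # left edge in the original design, in pixels
    y: int        # top edge in the original design, in pixels
    width: int
    height: int
    html: str     # code generated for this block (assumed already valid HTML)


def assemble_absolute(blocks: List[Block], page_width: int, page_height: int) -> str:
    """Place each block's HTML at its original coordinates inside one container."""
    parts = []
    for b in blocks:
        style = (
            f"position:absolute;left:{b.x}px;top:{b.y}px;"
            f"width:{b.width}px;height:{b.height}px;overflow:hidden;"
        )
        parts.append(f'<div style="{style}">{b.html}</div>')
    body = "\n".join(parts)
    # A relatively positioned wrapper makes the absolute offsets resolve
    # against the page area rather than the browser viewport.
    return (
        '<!DOCTYPE html>\n<html>\n<body style="margin:0;">\n'
        f'<div style="position:relative;width:{page_width}px;height:{page_height}px;">\n'
        f"{body}\n</div>\n</body>\n</html>"
    )


page = assemble_absolute(
    [Block(0, 0, 800, 100, "<h1>Title</h1>"), Block(0, 100, 800, 300, "<p>Body</p>")],
    page_width=800,
    page_height=400,
)
print(page)
```

Pinning every block to fixed coordinates keeps each one where it sat in the design, but the resulting page is rigid, which is one plausible reason to also try an MLLM-based assembly and select between the two results.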

Evaluation and Results

The evaluation uses a public benchmark and a newly introduced, more challenging benchmark, CC-HARD, which features complex layouts. LaTCoder improves on baseline methods on metrics such as TreeBLEU and MAE, indicating closer structural and visual similarity to the target design.

With DeepSeek-VL2 as the backbone, LaTCoder raises TreeBLEU scores by 66.67% and lowers MAE by 38% compared to direct prompting, a substantial gain in layout preservation during code generation.
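To make the reported percentages concrete, the sketch below shows how a pixel-level MAE and a relative change could be computed. It assumes MAE means the mean absolute difference between the design image and a screenshot of the rendered page, both given as equally sized grayscale grids; the paper's exact preprocessing is not described here. The function names and all numeric values are illustrative and are not taken from the paper.

```python
from typing import List

Gray = List[List[int]]  # gray[y][x], intensities in 0..255


def mae(a: Gray, b: Gray) -> float:
    """Mean absolute per-pixel difference between two same-shaped images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def relative_change(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline` (negative means a decrease)."""
    return (new - baseline) / baseline * 100.0


design = [[0, 0], [255, 255]]
render = [[0, 10], [255, 245]]
print(mae(design, render))  # 5.0

# Illustrative scores only: moving from 0.3 to 0.5 is roughly a 66.67% increase,
# and moving from 50 to 31 is roughly a 38% decrease.
print(round(relative_change(0.3, 0.5), 2))
print(round(relative_change(50.0, 31.0), 2))
```

Lower MAE means the rendered page is visually closer to the design, so a decrease counts as an improvement, whereas for TreeBLEU a higher score is better.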

Human Evaluation

A pairwise human preference evaluation further supports LaTCoder's effectiveness: annotators favor its webpages over those of other methods in over 60% of cases. This preference is largely attributed to its superior ability to maintain design layout integrity and detail accuracy.

Figure 5: Pairwise human preference evaluation of baseline methods relative to LaTCoder.

Figure 6: Case study of samples generated by LaTCoder and other baseline methods with GPT-4o as the backbone MLLM: LaTCoder significantly outperforms the others, particularly in preserving the layout of the design.

Conclusion

LaTCoder improves the design-to-code task by dividing a webpage design into blocks and guiding MLLMs under the Layout-as-Thought paradigm. The method better preserves the original design's structure, offering a viable solution for automated UI synthesis tasks. Future research could explore further optimizations of the division algorithm and assembly strategies, enhancing MLLM capabilities for broader real-world application scenarios.
