- The paper presents LaTCoder, a method that decomposes webpage designs into manageable image blocks and generates HTML/CSS code for each one, using chain-of-thought reasoning over the layout to improve layout fidelity.
- A custom division algorithm segments designs along solid-color lines, mirroring the nested structure of the CSS box model; LaTCoder achieves up to a 66.67% improvement in TreeBLEU.
- LaTCoder reduces MAE by 38% on complex real-world benchmarks, and human evaluations show that annotators prefer its outputs over existing methods because they preserve detailed design layouts.
LaTCoder: Converting Webpage Design to Code with Layout-as-Thought
The paper "LaTCoder: Converting Webpage Design to Code with Layout-as-Thought" introduces a method for translating webpage designs into HTML/CSS code while preserving layout integrity. It adapts Chain-of-Thought (CoT) reasoning to layout understanding so that multimodal large language models (MLLMs) stay faithful to the design's original layout.
Motivation and Approach
Existing MLLMs often fail to preserve layout when translating visual designs into code, and the authors find that this loss of layout information is most pronounced in complex, real-world webpage designs. Through Layout-as-Thought (LaT), LaTCoder mitigates this by decomposing the page into manageable segments and using CoT-based prompts to guide the MLLM toward accurate code.
Figure 1: A real-world failure case from the popular open-source project screenshot-to-code.
LaTCoder's workflow has three stages: divide the webpage design into separate image blocks, generate code for each block with CoT-based prompting, and assemble the blocks into a complete HTML/CSS webpage.
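The three-stage workflow can be sketched as a small driver that wires the stages together. This is a hypothetical skeleton, not the authors' code: the names `design_to_code`, `GeneratedBlock`, and the `(top, left, bottom, right)` box convention are assumptions made for illustration, and the real division, MLLM call, and assembly logic are passed in as callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical box convention: (top, left, bottom, right) in design pixels,
# with bottom and right exclusive.
Box = Tuple[int, int, int, int]


@dataclass
class GeneratedBlock:
    box: Box   # where the block sits in the original design
    code: str  # HTML/CSS produced for that block


def design_to_code(
    design: object,
    divide: Callable[[object], List[Box]],
    generate: Callable[[object, Box], str],
    assemble: Callable[[List[GeneratedBlock]], str],
) -> str:
    """Run division, per-block generation, and assembly in order."""
    boxes = divide(design)  # 1. layout-aware division into image blocks
    # 2. one independent code-generation call per block (e.g. an MLLM prompt)
    blocks = [GeneratedBlock(box, generate(design, box)) for box in boxes]
    return assemble(blocks)  # 3. layout-preserving assembly
```

Because each `generate` call in this sketch depends only on its own block, the calls could be issued concurrently; whether LaTCoder does so is not stated here.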
Figure 2: The workflow of LaTCoder.
Design-to-Code Task and Solution
The task is to map a webpage design image to HTML/CSS code whose rendering differs from the design as little as possible. LaTCoder breaks this into three steps:
- Layout-Aware Division: A custom algorithm detects dividing lines in the design and splits it into simpler subregions, following the nested-box structure of the CSS box model.
- Block-Wise Code Synthesis: Each block is processed individually by an MLLM under CoT-based prompting, which keeps layout and style consistent.
- Layout-Preserved Assembly: Two strategies, absolute positioning and MLLM-based assembly, combine the block code into a complete webpage that stays faithful to the original design.
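The absolute-positioning strategy can be illustrated with a minimal sketch, assuming each block's generated code is body-only markup and its position in the design is known. The `Block` class and `assemble_absolute` function are hypothetical names, and the `overflow:hidden` clipping is a choice of this sketch rather than something the summary specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    x: int       # left offset of the block in the design, in pixels
    y: int       # top offset of the block in the design, in pixels
    width: int
    height: int
    html: str    # body markup generated for this block


def assemble_absolute(blocks: List[Block], page_width: int, page_height: int) -> str:
    """Pin each block's markup at its original coordinates inside a relative container."""
    placed = [
        f'<div style="position:absolute; left:{b.x}px; top:{b.y}px; '
        f'width:{b.width}px; height:{b.height}px; overflow:hidden;">{b.html}</div>'
        for b in blocks
    ]
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head>\n'
        '<body style="margin:0">\n'
        f'<div style="position:relative; width:{page_width}px; height:{page_height}px;">\n'
        + "\n".join(placed)
        + "\n</div>\n</body></html>"
    )
```

For example, `assemble_absolute([Block(0, 0, 800, 60, "<nav>Menu</nav>"), Block(0, 60, 800, 400, "<main>Hero</main>")], 800, 460)` places a navigation bar above a hero section at their design coordinates. Absolute positioning guarantees placement but yields a rigid, non-responsive page; if block outputs are full documents with their own `<style>` sections, those styles would need to be merged or scoped first.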
Figure 3: A toy example of dividing line detection.
The division method looks for lines of a single solid color to segment the design; this keeps computation cheap while following the structure of the original layout.
Figure 4: The simplified prompt for generating image block code (the full version is shown in the Appendix).
Evaluation and Results
The experiments introduce CC-HARD, a new benchmark of webpages with more complex layouts that is designed to be harder for current models. On metrics such as TreeBLEU (structural similarity of the DOM tree) and MAE (pixel-level visual difference), LaTCoder's outputs are closer to the original designs than those of the baseline methods.
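As a rough illustration of the structural metric, the sketch below assumes the common set-based definition of TreeBLEU: the fraction of height-1 subtrees (a parent tag together with its ordered child tags) of the reference DOM tree that also occur in the generated tree. The tree encoding as `(tag, children)` tuples and the function names are hypothetical, and the paper's exact implementation (for example its treatment of text nodes or attributes) may differ.

```python
from typing import List, Set, Tuple

# A DOM node as (tag, children); text nodes and attributes are ignored here.
Node = Tuple[str, List["Node"]]


def height1_subtrees(node: Node) -> Set[Tuple[str, Tuple[str, ...]]]:
    """Collect every (parent tag, ordered child tags) pair in the tree."""
    tag, children = node
    found: Set[Tuple[str, Tuple[str, ...]]] = set()
    if children:
        found.add((tag, tuple(child[0] for child in children)))
    for child in children:
        found |= height1_subtrees(child)
    return found


def tree_bleu(generated: Node, reference: Node) -> float:
    """Share of the reference's height-1 subtrees recovered by the generated tree."""
    ref = height1_subtrees(reference)
    if not ref:
        raise ValueError("reference tree has no internal nodes")
    return len(height1_subtrees(generated) & ref) / len(ref)
```

For instance, if the reference is a `body` holding a `div` with `h1` and `p` plus a second `div` with an `img`, and the generated page keeps only the first `div`, the score is 1/3: only the `div -> (h1, p)` subtree matches, while `body -> (div, div)` and `div -> (img)` are missing.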
LaTCoder improves TreeBLEU by up to 66.67% and reduces MAE by 38%, a substantial gain over existing methods in preserving layout during code generation.
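The pixel-level metric and the "reduces by X%" phrasing can be made concrete with a small sketch. It assumes MAE is the mean absolute difference of channel values between the design and a screenshot of the rendered code, with both images already the same size, and that the reported percentages are relative changes against a baseline. How the paper resizes, crops, or normalizes screenshots is not stated in this summary, and the numbers below are illustrative only, not results from the paper.

```python
from typing import Sequence

# An RGB image as rows of (r, g, b) pixels with channel values in 0..255.
Image = Sequence[Sequence[Sequence[int]]]


def mae(design: Image, rendered: Image) -> float:
    """Mean absolute difference over every pixel channel of two same-size images."""
    if len(design) != len(rendered):
        raise ValueError("images must have the same height")
    total, count = 0, 0
    for row_a, row_b in zip(design, rendered):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same width")
        for px_a, px_b in zip(row_a, row_b):
            if len(px_a) != len(px_b):
                raise ValueError("pixels must have the same number of channels")
            for a, b in zip(px_a, px_b):
                total += abs(a - b)
                count += 1
    if count == 0:
        raise ValueError("images are empty")
    return total / count


def relative_change(baseline: float, new: float) -> float:
    """Signed relative change; -0.38 means a 38% reduction from the baseline."""
    return (new - baseline) / baseline


design = [[(0, 0, 0), (255, 255, 255)]]
rendered = [[(10, 10, 10), (255, 255, 255)]]
print(mae(design, rendered))  # 5.0: six channels, total absolute error 30
```

Under this reading, a hypothetical baseline MAE of 0.50 falling to 0.31 gives `relative_change(0.50, 0.31)` of about -0.38, i.e. a 38% reduction.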
Human Evaluation
Pairwise comparisons and human preference evaluations further support LaTCoder's effectiveness: annotators clearly preferred its outputs over those of other methods, largely because they better preserve the design's layout and details.
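A pairwise evaluation like the one summarized in Figure 5 typically reduces to win/tie/loss proportions per baseline. The sketch below shows only that final tally; the label names and the function `preference_rates` are hypothetical, and how the paper aggregates individual annotators (for example by majority vote) is not described in this summary.

```python
from collections import Counter
from typing import Dict, Iterable

OUTCOMES = ("win", "tie", "loss")


def preference_rates(judgments: Iterable[str]) -> Dict[str, float]:
    """Turn per-sample pairwise judgments into win/tie/loss proportions.

    Each judgment records how a baseline fared against LaTCoder on one sample.
    """
    counts = Counter(judgments)
    unexpected = set(counts) - set(OUTCOMES)
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments given")
    return {outcome: counts[outcome] / total for outcome in OUTCOMES}
```

For example, `preference_rates(["loss", "loss", "tie", "win"])` yields `{'win': 0.25, 'tie': 0.25, 'loss': 0.5}`, meaning the baseline lost to LaTCoder on half of these hypothetical samples.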
Figure 5: Pairwise human preference evaluation of baseline methods relative to LaTCoder.
Figure 6: Case study of samples generated by LaTCoder and baseline methods with GPT-4o as the backbone MLLM: LaTCoder clearly outperforms the others, particularly in preserving the layout of the design.
Conclusion
LaTCoder improves the design-to-code task by decomposing webpage designs into blocks and guiding MLLMs with the Layout-as-Thought paradigm. The method better preserves the original design's structure and offers a practical route to automated UI code synthesis. Future work could refine the division algorithm and assembly strategies and extend MLLM capabilities to a broader range of real-world scenarios.