PSD2Code: Automated PSD-to-Code Conversion
- PSD2Code is an automated system that converts detailed PSD design files into high-fidelity, production-ready React and SCSS code using structured parsing and constraint-based alignment.
- It employs a multi-phase Parse–Align–Generate pipeline to accurately translate hierarchical design semantics and asset information into modular, engineering-ready code.
- By leveraging prompt-based multimodal LLMs, PSD2Code significantly improves visual quality and structural fidelity compared to traditional screenshot-based code generation methods.
PSD2Code is an automated system for converting Adobe Photoshop (PSD) design files into production-ready front-end code, with a particular focus on high-fidelity React and SCSS output. It leverages structured parsing of design files, hard constraint-based asset alignment, and prompt-based multimodal LLMs to address structural fidelity, visual quality, and engineering-readiness in generated code. PSD2Code represents a significant advancement over earlier screenshot-based approaches by integrating hierarchical design semantics and element-asset consistency directly into the code generation process (Chen et al., 6 Nov 2025).
1. Formal Problem Definition and Motivation
In the design-to-code generation paradigm, the input is a complex layered design artifact, typically a PSD file, represented as a tuple $D = (S, A)$, where $S$ encodes the hierarchical layer structure with geometric and style metadata, and $A$ is the directory of raster/vector assets. The conversion objective is to produce a text string $C$, generally modularized into React JSX and SCSS, that, when rendered in a browser, visually and structurally matches the reference design. This can be formalized as seeking a mapping $f: D \to C$, optimized so that the rendering engine's output minimizes the visual discrepancy to the design screenshot under metrics such as SSIM, PSNR, or structural block matching (Chen et al., 6 Nov 2025, Si et al., 5 Mar 2024).
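The visual-discrepancy objective can be made concrete with any pixel-level metric; the following is a minimal sketch, assuming same-shape 8-bit RGB arrays for the rendered page and the reference screenshot, computing the PSNR term:

```python
import numpy as np

def psnr(rendered: np.ndarray, reference: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio (dB) between two same-shape images."""
    mse = np.mean((rendered.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A mapping $f$ is then preferred when it maximizes such metrics (or their combination with SSIM and block-level matching) over the benchmark set.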
Previous GUI-to-code systems (notably pix2code (Beltramelli, 2017)) operated on bitmap screenshots and a synthetic GUI DSL, oversimplifying hierarchy and omitting semantic grouping, asset referencing, or engineering constraints. PSD2Code addresses these shortcomings by direct PSD parsing and hard constraint injection into the generation pipeline, ensuring accurate spatial, hierarchical, and asset-level correspondence.
2. Parse–Align–Generate Pipeline
The core PSD2Code workflow follows a closed-loop Parse–Align–Generate (–Validate) pipeline:
2.1 Parse
- Input: PSD file $D$.
- Procedures:
  1. Walk the layer tree and classify each layer:
     - Containers: group layers meeting union coverage ratio and pixel-layer candidate thresholds are merged into single elements; otherwise they are retained as containers.
     - Text: short text layers (below a length threshold), matched by name.
     - Images: all remaining pixel layers.
  2. Prune empty containers and cap nesting at a fixed maximum depth.
  3. Export a structured design.json recording all hierarchy, typed elements, bounding boxes, and asset links.
- Spatial relationship representations, such as alignment variance over siblings, are used for grid/list structure inference.
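The parsing rules above can be sketched as follows. This is a simplified model, not the paper's implementation: layers are assumed to be dicts with `kind`, `name`, `bbox`, `text`, and `children` fields, and the depth/length thresholds are illustrative placeholders.

```python
MAX_DEPTH = 6       # illustrative depth cap (paper's value not specified here)
MAX_TEXT_LEN = 50   # illustrative "short text" threshold

def classify(layer, depth=0):
    """Recursively classify a PSD layer into container / text / image nodes."""
    if depth >= MAX_DEPTH:
        return None
    if layer["kind"] == "group":
        children = [c for c in (classify(ch, depth + 1) for ch in layer["children"]) if c]
        if not children:
            return None  # prune empty containers
        return {"type": "container", "name": layer["name"],
                "bbox": layer["bbox"], "children": children}
    if layer["kind"] == "text" and len(layer.get("text", "")) <= MAX_TEXT_LEN:
        return {"type": "text", "name": layer["name"],
                "bbox": layer["bbox"], "text": layer["text"]}
    return {"type": "image", "name": layer["name"], "bbox": layer["bbox"]}

def alignment_variance(siblings, axis=0):
    """Variance of sibling edge coordinates; low variance suggests grid/list layout."""
    coords = [s["bbox"][axis] for s in siblings]
    mean = sum(coords) / len(coords)
    return sum((c - mean) ** 2 for c in coords) / len(coords)
```

A recursive walk like this yields the typed tree that design.json serializes, with alignment variance computed over each container's children to flag grid/list candidates.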
2.2 Align
- Input: Parsed element set and asset directory $A$.
- Procedures:
  - Extract the actual pixel dimensions $(w_a, h_a)$ for each asset $a \in A$.
  - Match image-type elements to assets by name and by minimal size discrepancy.
  - Impose hard size equalities: each element's dimensions are set equal to its matched asset's true dimensions ($w_e = w_a$, $h_e = h_a$).
  - Validate and update asset references in the structure, guaranteeing a one-to-one mapping.
Note: All constraints are enforced directly, with no iterative optimization, since full metadata is available.
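A minimal sketch of this alignment step, assuming elements carry parsed `w`/`h` fields and assets map names to true pixel sizes (field names are illustrative, not from the paper):

```python
def align(elements, assets):
    """Match image elements to assets (name first, then minimal size discrepancy)
    and impose the asset's true width/height as a hard equality."""
    used = set()
    for el in (e for e in elements if e["type"] == "image"):
        if el["name"] in assets and el["name"] not in used:
            chosen = el["name"]  # exact name match wins
        else:
            free = [a for a in assets if a not in used]
            chosen = min(free, key=lambda a: abs(assets[a][0] - el["w"])
                                           + abs(assets[a][1] - el["h"]))
        used.add(chosen)           # one-to-one mapping
        el["asset"] = chosen
        el["w"], el["h"] = assets[chosen]  # hard size equality: element == asset
    return elements
```

Because the PSD metadata supplies every true dimension, the assignment is a direct lookup plus nearest-size tiebreak rather than an optimization problem.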
2.3 Generate
- Input: Aligned design.json, asset list, and optional design screenshot.
- Procedures:
  1. Compose a structured prompt:
     - System instruction to produce only JSX and SCSS dual blocks, enforce naming, and avoid commentary.
     - Design-to-code mapping example (for in-context learning).
     - User message embedding the exact element structure, full asset paths/sizes, and an explicit list of engineering hard constraints (absolute positioning, asset URLs, class naming).
  2. Inject hard constraints to prevent layout/scaling hallucination by the LLM.
  3. Invoke the LLM (e.g., GPT-4o) with fixed temperature and max-token settings.
  4. Post-process the generated code:
     - Verify JSX and SCSS syntax.
     - Match asset imports and dimensions.
     - Optionally, render the output headlessly for visual validation.
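The post-processing checks above can be sketched as lightweight static validation. This is a simplified illustration; a real pipeline would use proper JSX/SCSS parsers rather than brace counting and a regex.

```python
import re

def validate_output(jsx, scss, assets):
    """Collect post-processing problems: unbalanced braces and unknown asset refs."""
    problems = []
    for label, code in (("JSX", jsx), ("SCSS", scss)):
        if code.count("{") != code.count("}"):
            problems.append(f"{label}: unbalanced braces")
    # Every src="..." in the JSX must point at a known asset file.
    for ref in re.findall(r'src="([^"]+)"', jsx):
        if ref.split("/")[-1] not in assets:
            problems.append(f"unknown asset: {ref}")
    return problems
```

An empty problem list is the gate for accepting a generation; otherwise the output is rejected or sent back for regeneration.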
3. Constraint-Based Alignment and Hierarchical Semantics
PSD2Code expresses all spatial, asset, and containment constraints as explicit linear equalities and inequalities, ensuring:
- Asset dimensions in code exactly match ground truth: $w_i^{\text{code}} = w_i^{\text{gt}}$, $h_i^{\text{code}} = h_i^{\text{gt}}$.
- All coordinates satisfy page bounds: $0 \le x_i$, $0 \le y_i$, $x_i + w_i \le W$, $y_i + h_i \le H$ (with $W, H$ the page dimensions).
- Nested (parent–child) containment is guaranteed: each child's bounding box lies within its parent's bounding box.
All these constraints are checked and injected prior to LLM inference, so the generation phase is strictly constrained to the parsed design.
Hierarchical grouping and semantic class assignment flow directly from PSD’s layer/grouping structure and are preserved in the code output. Containers are mapped to React components, with children formed via component composition. Asset references, text extraction, and z-index ordering all follow the PSD semantics explicitly.
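The page-bound and containment constraints can be verified mechanically before prompting; a sketch assuming each element is a dict with `x`, `y`, `w`, `h` and optional `children`:

```python
def check_constraints(el, page_w, page_h, errors=None):
    """Verify page-bound and parent-child containment constraints on a layout tree."""
    if errors is None:
        errors = []
    x, y, w, h = el["x"], el["y"], el["w"], el["h"]
    if not (0 <= x and 0 <= y and x + w <= page_w and y + h <= page_h):
        errors.append(f"{el['name']}: outside page bounds")
    for child in el.get("children", []):
        cx, cy = child["x"], child["y"]
        if not (x <= cx and y <= cy
                and cx + child["w"] <= x + w and cy + child["h"] <= y + h):
            errors.append(f"{child['name']}: escapes parent {el['name']}")
        check_constraints(child, page_w, page_h, errors)
    return errors
```

Since every bounding box is known from parsing, violations indicate parser or alignment bugs and can be fixed before the LLM ever sees the design.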
4. Structured Prompt Engineering for LLMs
PSD2Code leverages structured prompts to control LLM outputs deterministically. The standardized prompt structure consists of:
- System Instruction: Instructs the model to output only JSX and SCSS, using specified naming conventions and strictly following provided design.json and asset metadata.
- Example Block: Provides a minimal design-to-code example, bootstrapping the LLM with a canonical mapping.
- User Message: Includes the full design.json, explicit engineering constraints (absolute size/position, hierarchical order, asset URLs), and in some cases explicit file/folder targets (e.g., “Generate in src/components/…”).
- Constraint Labels: For each element, the prompt binds code to the parsed coordinates and enforces absolute/flex positioning with no free-form adaptation.
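A sketch of assembling such a prompt in chat-message form (the message layout, class-naming convention, and constraint wording here are illustrative; the source does not publish the exact prompt text):

```python
import json

def build_messages(design, assets, example):
    """Assemble system / in-context example / user messages for constrained generation."""
    system = ("Output exactly two blocks: JSX then SCSS. Use the specified class "
              "naming convention, follow the provided design.json verbatim, and "
              "add no commentary.")
    example_in, example_out = example
    constraints = [
        "Use absolute positioning with the exact coordinates given.",
        "Reference each asset by its exact path and declared width/height.",
        "Preserve the element hierarchy and z-order from design.json.",
    ]
    user = (f"design.json:\n{json.dumps(design)}\n"
            f"assets:\n{json.dumps(assets)}\n"
            "Hard constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": example_in},        # example design input
        {"role": "assistant", "content": example_out},  # canonical code output
        {"role": "user", "content": user},
    ]
```

Binding the parsed coordinates and asset metadata into the user message, rather than relying on the screenshot alone, is what makes the generation deterministic with respect to layout.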
This methodology is critical: ablation studies show that removing the structured parsing and prompting components reduces SSIM by up to 51.8% and CodeBLEU by up to 9.7% (Chen et al., 6 Nov 2025).
5. Quantitative Evaluation and Model Generalization
Dataset and Baselines
- 100 real-world PSD files (event pages, landing, popups), each paired with screenshot, asset directory, ground-truth React+SCSS, and parsed design.json (70/15/15 train/val/test split).
- Compared against:
- CodeFun commercial Figma plugin (React+SCSS)
- Screenshot-to-code (GPT-4V, naive prompt)
- pix2code (CNN-LSTM) (Beltramelli, 2017)
Evaluation Metrics
| Metric | Range | Description |
|---|---|---|
| CodeBLEU | [0, 1] | Reference-aware code similarity |
| CodeBERT Cosine Sim. | [-1, 1] | Embedding-based code similarity |
| SSIM | [0, 1] | Structural Similarity Index Measure |
| PSNR | dB | Peak Signal-to-Noise Ratio |
| Executability | [0, 1] | Fraction producing syntactically correct, renderable code |
| Resource Integration | [0, 1] | Fraction with correct asset linking |
| Layout consistency | mAP/APs/APm/APl | Bounding-box detection metrics |
| Statistical sig. | p-values, Cohen's d | Pairwise tests, effect sizes, CIs |
Results (Test Set Averages)
| Method | CodeBLEU | CodeBERT | SSIM | PSNR (dB) | Gen. SR |
|---|---|---|---|---|---|
| CodeFun | 0.691 | 0.691 | 0.691 | 30.2 | 100% |
| Screenshot-to-code (GPT-4V) | 0.623 | 0.641 | 0.623 | 28.4 | 99.2% |
| pix2code | 0.587 | 0.598 | 0.587 | 26.8 | 69.3% |
| PSD2Code | 0.683 | 0.982 | 0.878 | 33.75 | 100% |
- PSD2Code achieves +53.2% CodeBERT and +40.9% SSIM over screenshot-to-code (p < 0.01 for PSNR).
- Model-agnosticism: Across GPT-4o, Qwen-VL-Max, DeepSeek-VL, Gemini-2.5-Pro, variation in CodeBLEU and SSIM is within 5.2% and 8.9% respectively; all models achieve 100% executability (Chen et al., 6 Nov 2025).
Ablation Study Findings:
- Removing PSD parsing causes –9.7% CodeBLEU and –51.8% SSIM.
- Removing multimodal fusion or prompt engineering severely degrades visual and structural alignment.
6. Comparison with Related Systems and Benchmarks
pix2code (Beltramelli, 2017) introduced an end-to-end CNN-LSTM architecture for screenshot-to-code generation with a synthetic DSL. However, it does not address PSD semantics, asset extraction, or hierarchical structure, and does not implement hard engineering constraints. The paper notes that adapting pix2code to PSD would require substantial changes in input preprocessing, CNN capacity, DSL expressiveness, and possibly attention mechanisms.
Prototype2Code (Xiao et al., 8 May 2024) applies design linting, hierarchical structure optimization via GNNs, and responsive code via flexbox layouts, yielding higher visual similarity and maintainability scores than commercial tools or vision-only LLM prompting. PSD2Code shares key design elements, such as hierarchical parsing, deterministic layout construction, and LLM-driven style completion, but uniquely enforces hard alignment between layers and code via constraint injection.
Design2Code (Si et al., 5 Mar 2024) provides a large-scale real-world benchmarking of screenshot-to-code. Key takeaways relevant for PSD2Code include:
- Extraction of text and bounding boxes prior to generation significantly boosts recall.
- Iterative “self-revision” (render → critique → regenerate) further improves layout and block-level correctness.
- Automated visual/block/text metrics, combined with human-in-the-loop review, provide a robust evaluation pipeline.
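Such a self-revision loop could be wired around PSD2Code as follows. This is a hypothetical sketch: `render`, `critique`, and `regenerate` are placeholder callables standing in for a headless renderer, a visual metric (e.g., SSIM against the reference), and an LLM revision call.

```python
def self_revise(code, render, critique, regenerate, target=0.9, max_rounds=3):
    """Render -> critique -> regenerate until the visual score reaches target."""
    score = critique(render(code))
    for _ in range(max_rounds):
        if score >= target:
            break
        code = regenerate(code, score)       # ask the LLM to fix flagged issues
        score = critique(render(code))       # re-score the revised output
    return code, score
```

The loop terminates either at the quality target or after a fixed revision budget, mirroring the render-critique-regenerate cycle reported to improve block-level correctness in Design2Code.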
A plausible implication is that future extensions of PSD2Code may benefit from integrating iterative feedback loops, mixed synthetic and real data for fine-tuning, and richer vector layer (shape, gradient) processing for color/style fidelity.
7. Limitations and Directions for Further Research
Limitations
- Dataset scope is restricted to 100 PSD files focused on mobile/PC pages, leaving generalization to tablet/desktop variants and highly customized designs unverified.
- The PSD parser may not robustly process smart objects, deeply nested masks, or exotic layer effects.
- Generated code is static; there is no support for responsive/adaptive layouts (media queries, Flexbox, CSS Grid) or dynamic behaviors (hover, animation, event handlers).
- Evaluation omits accessibility metrics, performance profiling, and cross-browser compatibility.
Future Directions
- Parsing for additional design formats (Figma, Sketch, Adobe XD) and supporting responsive layout synthesis.
- Automated generation of dynamic interaction logic and integration of accessibility checks (e.g., WCAG compliance).
- Expanding the dataset for broader architectural diversity and conducting supervised/fine-tuned LLM training on PSD–code pairs.
- Developing a “Validate” loop with vision-based code critics for auto-correcting misalignments via differentiable rendering.
- Two-stage skeleton–style generation: first emitting clean layout scaffolding, then LLM-driven style augmentation, as suggested by “Design2Code” (Si et al., 5 Mar 2024).
Summary Table: Key Technical Dimensions
| Dimension | PSD2Code | pix2code | Prototype2Code |
|---|---|---|---|
| Input | PSD w/ layer tree/metadata | RGB Screenshot | Figma, Sketch prototype |
| Output | React + SCSS | Custom DSL | HTML + CSS |
| Architecture | Parse–Align–Generate + LLM | CNN-LSTM | GNN + object detectors + LLM |
| Constraint Type | Hard equalities/inequalities | None | Deterministic tree/Flexbox |
| Evaluation | CodeBLEU, SSIM, PSNR, etc. | Token error rate | SSIM, PSNR, user studies |
PSD2Code sets a new standard in design-to-code generation by explicitly aligning PSD structure and assets to code via formal constraints, leveraging structured LLM prompting for deterministic, high-fidelity, modular front-end output, and demonstrating model-agnostic performance over multiple state-of-the-art MLLMs (Chen et al., 6 Nov 2025).