
PSD2Code: Automated PSD-to-Code Conversion

Updated 13 November 2025
  • PSD2Code is an automated system that converts detailed PSD design files into high-fidelity, production-ready React and SCSS code using structured parsing and constraint-based alignment.
  • It employs a multi-phase Parse–Align–Generate pipeline to accurately translate hierarchical design semantics and asset information into modular, engineering-ready code.
  • By leveraging prompt-based multimodal LLMs, PSD2Code significantly improves visual quality and structural fidelity compared to traditional screenshot-based code generation methods.

PSD2Code is an automated system for converting Adobe Photoshop (PSD) design files into production-ready front-end code, with a particular focus on high-fidelity React and SCSS output. It leverages structured parsing of design files, hard constraint-based asset alignment, and prompt-based multimodal LLMs to address structural fidelity, visual quality, and engineering-readiness in generated code. PSD2Code represents a significant advancement over earlier screenshot-based approaches by integrating hierarchical design semantics and element-asset consistency directly into the code generation process (Chen et al., 6 Nov 2025).

1. Formal Problem Definition and Motivation

In the design-to-code generation paradigm, the input is a complex layered design artifact, typically a PSD file, represented as a tuple $(P, A)$: $P$ encodes the hierarchical layer structure together with geometric and style metadata, and $A$ is the directory of raster/vector assets. The conversion objective is to produce a text string $C$, generally modularized into React JSX and SCSS, that, when rendered in a browser, visually and structurally matches the reference design. This can be formalized as seeking a mapping $f_\theta: (P, A) \rightarrow C$, optimized so that the rendering engine's output $\hat{I}$ minimizes the visual discrepancy to the design screenshot $I$ under metrics such as SSIM, PSNR, or structural block matching (Chen et al., 6 Nov 2025, Si et al., 5 Mar 2024).

Previous GUI-to-code systems (notably pix2code (Beltramelli, 2017)) operated on bitmap screenshots and a synthetic GUI DSL, oversimplifying hierarchy and omitting semantic grouping, asset referencing, or engineering constraints. PSD2Code addresses these shortcomings by direct PSD parsing and hard constraint injection into the generation pipeline, ensuring accurate spatial, hierarchical, and asset-level correspondence.

2. Parse–Align–Generate Pipeline

The core PSD2Code workflow follows a closed-loop Parse–Align–Generate (–Validate) pipeline:

2.1 Parse

  • Input: PSD file PP.
  • Procedures:
    1. Classify parsed layers into element types:
       • Containers: a group is flattened into a single image when the union coverage ratio of its pixel children, $r = \mathrm{Area}_{\mathrm{union}} / \mathrm{Area}_{\mathrm{group}}$, satisfies $r \geq 0.85$ and the group has $\leq 2$ pixel candidates; otherwise the group is retained as a container.
       • Text: short text layers (length $\leq 10$) matched by name.
       • Images: all remaining pixel layers.
    2. Prune empty containers and cap the hierarchy depth at $\mathrm{MAX\_DEPTH} = 6$.
    3. Export a structured design.json recording the full hierarchy, typed elements, bounding boxes, and asset links.
  • Spatial relationship representations:

$$\mathrm{IoU}(i, j) = \frac{\mathrm{Area}(BB_i \cap BB_j)}{\mathrm{Area}(BB_i \cup BB_j)}$$

$\sigma_x, \sigma_y$ are used for grid/list structure inference, measuring alignment variance over $N$ siblings.
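The Parse-phase grouping and spatial rules above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the box representation, helper names, and the assumption that the union area of a group's pixel children is precomputed are all choices made here.

```python
# Illustrative sketch of the Parse-phase classification rules, not the
# paper's implementation. Boxes are (x, y, w, h) tuples; the union area
# of a group's pixel children is assumed to be precomputed.

def area(bb):
    """Area of an (x, y, w, h) bounding box."""
    return max(0, bb[2]) * max(0, bb[3])

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes, matching the
    spatial-relationship formula above."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2 = min(a[0] + a[2], b[0] + b[2])
    iy2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def classify_group(group_bb, union_area, n_pixel_children,
                   r_thresh=0.85, max_pixel_children=2):
    """Flatten a group to a single image when its pixel children cover
    most of the group box; otherwise keep it as a container."""
    r = union_area / area(group_bb) if area(group_bb) else 0.0
    if r >= r_thresh and n_pixel_children <= max_pixel_children:
        return "image"
    return "container"
```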

2.2 Align

  • Input: Parsed element set $E$ and asset directory $A$.
  • Procedures:

    1. Extract the actual dimensions $(w_a, h_a)$ of each asset $a \in A$.
    2. Match each image-type element $e$ to an asset $a$ by name and minimal $|w_e - w_a| + |h_e - h_a|$.
    3. Impose hard size equalities: $\forall e: e.\mathrm{type} = \mathrm{image} \implies (w_e = w_a) \wedge (h_e = h_a)$.
    4. Validate and update asset references in the structure, guaranteeing one-to-one mapping.
  • Note: All constraints are enforced directly, with no iterative optimization, due to full metadata availability.
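A minimal sketch of this matching and constraint-enforcement logic, assuming elements and assets are plain dicts (the field names `type`, `name`, `path`, `w`, `h` are hypothetical, not the paper's schema):

```python
# Sketch of the Align phase under simplified assumptions. Matching
# prefers a same-named asset, falls back to minimal
# |w_e - w_a| + |h_e - h_a|, and then copies the asset's exact size
# onto the element (the hard size-equality constraint w_e = w_a, h_e = h_a).

def align_assets(elements, assets):
    used = set()  # enforce a one-to-one element-to-asset mapping
    for e in elements:
        if e["type"] != "image":
            continue
        candidates = [a for a in assets if a["path"] not in used]
        if not candidates:
            continue
        named = [a for a in candidates if a["name"] == e["name"]]
        pool = named or candidates
        best = min(pool,
                   key=lambda a: abs(e["w"] - a["w"]) + abs(e["h"] - a["h"]))
        used.add(best["path"])
        e["asset"] = best["path"]
        e["w"], e["h"] = best["w"], best["h"]  # hard size equality
    return elements
```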

2.3 Generate

  • Input: Aligned design.json, asset list, and optional design screenshot.
  • Procedures:
    1. Assemble a structured prompt consisting of:
       • A system instruction to produce only JSX and SCSS dual blocks, enforce naming conventions, and avoid commentary.
       • A design-to-code mapping example (for in-context learning).
       • A user message embedding the exact element structure, full asset paths/sizes, and an explicit list of engineering hard constraints (absolute positioning, asset URLs, class naming).
    2. Inject hard constraints to prevent layout/scaling hallucination by the LLM.
    3. Invoke the LLM (e.g., GPT-4o) with temperature $= 0.7$ and max tokens $= 4000$.
    4. Post-process the generated code:
       • Verify JSX and SCSS syntax.
       • Match asset imports and dimensions.
       • Optionally, render the output headlessly for visual validation.
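The post-processing step can be sketched as below. The fence-extraction logic and function names are assumptions, since the paper does not specify its implementation; the model is assumed to answer with one fenced JSX block and one fenced SCSS block.

```python
import re

# Hypothetical post-processing sketch: extract the JSX/SCSS dual blocks
# from the model output and verify that aligned asset paths appear.

FENCE = "`" * 3  # literal triple-backtick fence marker
BLOCK = re.compile(FENCE + r"(jsx|scss)\n(.*?)" + FENCE, re.S)

def split_output(llm_text):
    """Extract the JSX/SCSS dual blocks; fail loudly if one is missing."""
    blocks = {lang: body for lang, body in BLOCK.findall(llm_text)}
    if "jsx" not in blocks or "scss" not in blocks:
        raise ValueError("model output missing a required code block")
    return blocks["jsx"], blocks["scss"]

def missing_assets(jsx, expected_paths):
    """Return the aligned asset paths that the JSX fails to reference."""
    return [p for p in expected_paths if p not in jsx]
```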

3. Constraint-Based Alignment and Hierarchical Semantics

PSD2Code expresses all spatial, asset, and containment constraints as explicit linear equalities and inequalities, ensuring:

  • Asset dimensions in code exactly match ground truth: $\forall e: e.\mathrm{type} = \mathrm{image},\; w_e = w_a,\; h_e = h_a$.
  • All coordinates satisfy page bounds: $0 \leq x_e \leq W - w_e$ and $0 \leq y_e \leq H - h_e$.
  • Nested (parent–child) containment is guaranteed:

$$x_p \leq x_c \;\wedge\; y_p \leq y_c \;\wedge\; x_c + w_c \leq x_p + w_p \;\wedge\; y_c + h_c \leq y_p + h_p$$

All these constraints are checked and injected prior to LLM inference, so the generation phase is strictly constrained to the parsed design.

Hierarchical grouping and semantic class assignment flow directly from PSD’s layer/grouping structure and are preserved in the code output. Containers are mapped to React components, with children formed via component composition. Asset references, text extraction, and z-index ordering all follow the PSD semantics explicitly.
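The page-bound and containment checks can be sketched as a pre-generation validation pass; the tree representation and function names here are illustrative, not the paper's code.

```python
# Sketch of the pre-generation constraint checks. Each node carries
# absolute coordinates (x, y) and size (w, h) on a W x H page; children
# nest under an optional "children" list.

def within_page(e, W, H):
    """Page bounds: 0 <= x <= W - w and 0 <= y <= H - h."""
    return 0 <= e["x"] <= W - e["w"] and 0 <= e["y"] <= H - e["h"]

def contains(parent, child):
    """Parent-child containment: the child's box lies inside the parent's."""
    return (parent["x"] <= child["x"]
            and parent["y"] <= child["y"]
            and child["x"] + child["w"] <= parent["x"] + parent["w"]
            and child["y"] + child["h"] <= parent["y"] + parent["h"])

def validate(node, W, H):
    """Collect constraint violations over the hierarchy before LLM inference."""
    errors = []
    if not within_page(node, W, H):
        errors.append((node["name"], "out of page bounds"))
    for c in node.get("children", []):
        if not contains(node, c):
            errors.append((c["name"], "escapes parent"))
        errors.extend(validate(c, W, H))
    return errors
```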

4. Structured Prompt Engineering for LLMs

PSD2Code leverages structured prompts to control LLM outputs deterministically. The standardized prompt structure consists of:

  1. System Instruction: Instructs the model to output only JSX and SCSS, using specified naming conventions and strictly following provided design.json and asset metadata.
  2. Example Block: Provides a minimal design-to-code example, bootstrapping the LLM with a canonical mapping.
  3. User Message: Includes the full design.json, explicit engineering constraints (absolute size/position, hierarchical order, asset URLs), and in some cases explicit file/folder targets (e.g., “Generate in src/components/…”).
  4. Constraint Labels: For each element, the prompt binds code to the parsed coordinates and enforces absolute/flex positioning with no free-form adaptation.

This methodology is critical; ablation studies show that omitting prompt engineering reduces SSIM by 51.8% and CodeBLEU by 9.7% (Chen et al., 6 Nov 2025).
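The four-part prompt structure can be sketched as a message-assembly helper for an OpenAI-style chat API. All wording, the placeholder one-shot example, and the constraint phrasing below are assumptions, not the paper's verbatim prompts.

```python
import json

# Hypothetical prompt-assembly sketch mirroring the four-part structure:
# system instruction, example block, user message, and constraint labels.

SYSTEM = (
    "Output exactly two code blocks, one JSX and one SCSS. "
    "Use the provided class names, exact pixel sizes, and asset paths. "
    "Do not add commentary."
)

# Minimal one-shot design-to-code mapping (placeholder bodies).
EXAMPLE = [
    {"role": "user",
     "content": '{"type": "image", "name": "logo", "x": 24, "y": 24, '
                '"w": 102, "h": 50, "asset": "assets/logo.png"}'},
    {"role": "assistant",
     "content": "<one JSX block and one SCSS block for the logo element>"},
]

def build_messages(design_json, constraints):
    """Bind the parsed structure and hard constraints into the user turn."""
    user = ("Design structure:\n" + json.dumps(design_json, indent=2)
            + "\nHard constraints:\n"
            + "\n".join("- " + c for c in constraints))
    return ([{"role": "system", "content": SYSTEM}]
            + EXAMPLE
            + [{"role": "user", "content": user}])
```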

5. Quantitative Evaluation and Model Generalization

Dataset and Baselines

  • 100 real-world PSD files (event pages, landing pages, popups), each paired with a screenshot, asset directory, ground-truth React+SCSS, and parsed design.json (70/15/15 train/val/test split).
  • Compared against:
    • CodeFun commercial Figma plugin (React+SCSS)
    • Screenshot-to-code (GPT-4V, naive prompt)
    • pix2code (CNN-LSTM) (Beltramelli, 2017)

Evaluation Metrics

| Metric | Range | Description |
|---|---|---|
| CodeBLEU | [0, 1] | Reference-aware code similarity |
| CodeBERT Cosine Sim. | [0, 1] | Embedding-based code similarity |
| SSIM | [0, 1] | Structural Similarity Index Measure |
| PSNR | dB | Peak Signal-to-Noise Ratio |
| Executability | % | Fraction producing syntactically correct, renderable code |
| Resource Integration | % | Fraction with correct asset linking |
| Layout consistency | mAP, AP_s, AP_m, AP_l | Bounding-box detection metrics |
| Statistical sig. | p values, Cohen's d | Pairwise tests, effect sizes, CIs |
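As a worked example of one of these metrics, PSNR follows directly from the mean squared error. This minimal pure-Python sketch operates on flat lists of 8-bit (0-255) pixel values rather than full images.

```python
import math

# Worked example of the PSNR metric for two grayscale images given as
# equal-length lists of 8-bit pixel values.

def mse(a, b):
    """Mean squared error between two pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10 * math.log10(max_val ** 2 / m)
```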

Results (Test Set Averages)

| Method | CodeBLEU | CodeBERT | SSIM | PSNR (dB) | Gen. Success Rate |
|---|---|---|---|---|---|
| CodeFun | 0.691 | 0.691 | 0.691 | 30.2 | 100% |
| Screenshot-to-code (GPT-4V) | 0.623 | 0.641 | 0.623 | 28.4 | 99.2% |
| pix2code | 0.587 | 0.598 | 0.587 | 26.8 | 69.3% |
| PSD2Code | 0.683 | 0.982 | 0.878 | 33.75 | 100% |
  • PSD2Code achieves +53.2% CodeBERT and +40.9% SSIM over screenshot-to-code (p < 0.01 for PSNR).
  • Model-agnosticism: Across GPT-4o, Qwen-VL-Max, DeepSeek-VL, Gemini-2.5-Pro, variation in CodeBLEU and SSIM is within 5.2% and 8.9% respectively; all models achieve 100% executability (Chen et al., 6 Nov 2025).

Ablation Study Findings:

  • Removing PSD parsing causes –9.7% CodeBLEU and –51.8% SSIM.
  • Removing multimodal fusion or prompt engineering severely degrades visual and structural alignment.

6. Comparison with Related Systems

pix2code (Beltramelli, 2017) introduced an end-to-end CNN-LSTM architecture for screenshot-to-code generation with a synthetic DSL. However, it does not address PSD semantics, asset extraction, or hierarchical structure, and does not implement hard engineering constraints. The paper notes that adapting pix2code to PSD would require substantial changes in input preprocessing, CNN capacity, DSL expressiveness, and possibly attention mechanisms.

Prototype2Code (Xiao et al., 8 May 2024) applies design linting, hierarchical structure optimization via GNNs, and responsive code via flexbox layouts, yielding higher visual similarity and maintainability scores than commercial tools or vision-only LLM prompting. PSD2Code shares key design elements, such as hierarchical parsing, deterministic layout construction, and LLM-driven style completion, but uniquely enforces hard alignment between layers and code via constraint injection.

Design2Code (Si et al., 5 Mar 2024) provides a large-scale real-world benchmarking of screenshot-to-code. Key takeaways relevant for PSD2Code include:

  • Extraction of text and bounding boxes prior to generation significantly boosts recall.
  • Iterative “self-revision” (render → critique → regenerate) further improves layout and block-level correctness.
  • Automated visual/block/text metrics, combined with human-in-the-loop review, provide a robust evaluation pipeline.

A plausible implication is that future extensions of PSD2Code may benefit from integrating iterative feedback loops, mixed synthetic and real data for fine-tuning, and richer vector layer (shape, gradient) processing for color/style fidelity.

7. Limitations and Directions for Further Research

Limitations

  • Dataset scope is restricted to 100 PSD files focused on mobile/PC domains; this limits observed generalization to tablet/desktop and highly customized designs.
  • The PSD parser may not robustly process smart objects, deeply nested masks, or exotic layer effects.
  • Generated code is static; there is no support for responsive/adaptive layouts (media queries, Flexbox, CSS Grid) or dynamic behaviors (hover, animation, event handlers).
  • Evaluation omits accessibility metrics, performance profiling, and cross-browser compatibility.

Future Directions

  • Parsing for additional design formats (Figma, Sketch, Adobe XD) and supporting responsive layout synthesis.
  • Automated generation of dynamic interaction logic and integration of accessibility checks (e.g., WCAG compliance).
  • Expanding the dataset for broader architectural diversity and conducting supervised/fine-tuned LLM training on PSD–code pairs.
  • Developing a “Validate” loop with vision-based code critics for auto-correcting misalignments via differentiable rendering.
  • Two-stage skeleton–style generation: first emitting clean layout scaffolding, then LLM-driven style augmentation, as suggested by “Design2Code” (Si et al., 5 Mar 2024).

Summary Table: Key Technical Dimensions

| Dimension | PSD2Code | pix2code | Prototype2Code |
|---|---|---|---|
| Input | PSD w/ layer tree/metadata | RGB screenshot | Figma/Sketch prototype |
| Output | React + SCSS | Custom DSL | HTML + CSS |
| Architecture | Parse–Align–Generate + LLM | CNN-LSTM | GNN + object detectors + LLM |
| Constraint type | Hard equalities/inequalities | None | Deterministic tree/Flexbox |
| Evaluation | CodeBLEU, SSIM, PSNR, etc. | Token error rate | SSIM, PSNR, user studies |

PSD2Code sets a new standard in design-to-code generation by explicitly aligning PSD structure and assets to code via formal constraints, leveraging structured LLM prompting for deterministic, high-fidelity, modular front-end output, and demonstrating model-agnostic performance over multiple state-of-the-art MLLMs (Chen et al., 6 Nov 2025).
