UI2Codeⁿ: UI Generation & Narayana Coding
- UI2Codeⁿ names two unrelated things. The first is a visual language model for interactive UI-to-code synthesis with iterative feedback and test-time scaling.
- The model employs a multimodal Transformer that fuses visual encoding and language decoding using pre-training, supervised fine-tuning, and reinforcement learning.
- Separately, UI2Codeⁿ defines a Narayana-based universal integer code that guarantees unique, prefix-free representations with logarithmic growth and proven optimality.
UI2Code is a shared name for two rigorously specified but entirely unrelated schemes: a contemporary visual LLM for interactive UI-to-code generation (Yang et al., 11 Nov 2025), and a universal binary integer code governed by the Narayana sequence (Kirthi et al., 2016). The two usages come from different fields, modern AI system design and mathematical coding theory, and each has its own formalism, objectives, and operational mechanics. Only the name links them.
1. Visual LLM for Interactive UI-to-Code Generation
The first and currently prominent usage of UI2Code designates an open-source visual LLM (VLM) architecture tailored for automatic, interactive, and test-time scalable user interface generation from rendered screenshots. UI2Code introduces foundational advances in multimodal UI coding by unifying three core capabilities: direct UI-to-code synthesis, UI editing via natural language instruction, and iterative UI code polishing. This system explicitly addresses the limitations of underdeveloped multimodal reasoning and non-interactive paradigms found in prior work.
1.1 Model Architecture
UI2Code utilizes a multimodal encoder–decoder Transformer with the following building blocks:
- Visual Encoder: A ViT-style backbone employing patch embedding, positional encoding, and self-attention, operating on UI screenshots or renders.
- Language Decoder: A causal Transformer generating structured code tokens (typically HTML, CSS, JS).
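- Cross-modal Fusion: Decoder layers incorporate cross-attention to visual encoder outputs:
$$\mathrm{CrossAttn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V$$
where $Q$ are the decoder queries, $K, V$ are keys/values computed from the visual encoder outputs, and $d_k$ is the key dimension.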
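The fusion step can be illustrated with a small, dependency-free sketch of single-head scaled dot-product cross-attention. This is the textbook formulation, not the released model code: the head count, dimensions, and the toy matrices below are assumptions chosen only to make the example runnable.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(Q, K, V):
    """Single-head scaled dot-product attention (illustrative only).

    Q: decoder queries, shape (n_q, d_k)
    K: visual-encoder keys, shape (n_kv, d_k)
    V: visual-encoder values, shape (n_kv, d_v)
    Returns (outputs, weights) as nested lists.
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    outputs, weights = [], []
    for q in Q:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value vectors, one output channel at a time.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V))
                        for c in range(len(V[0]))])
    return outputs, weights

# Toy data (assumed): 2 code-token queries attending over 3 image patches.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out, attn = cross_attention(Q, K, V)
```

Each row of `attn` is a probability distribution over image patches, so every generated code token reads a convex combination of visual features.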
Inputs and decoding prompts distinguish among three tasks:
- UI-to-code: Screenshot → code, with special …<answer>…</answer> scaffolding.
- UI polishing: Target screenshot, initial code tokens, and the rendered output → refined code.
- UI editing: Reference screenshot, prior code, and text instruction → revised code.
The same model core is reused, altering only modality subcomponents and task-specific input templates.
1.2 Training Methodology
The UI2Code training pipeline consists of three stages:
- Continual Pre-training: Approximately real webpage pairs (screenshot + HTML) and synthetic pairs are used, interleaved with general VLM objectives (captioning, VQA, OCR, video QA). Primary objectives are:
  - Masked Language Modeling (MLM) on code tokens.
  - GUI referring: predicting HTML spans from screenshots and DOM locations.
  - Image–Text Contrastive Learning.
- Supervised Fine-tuning (SFT): Uses $80,000$ curated examples with scaffolding for all three tasks; optimizes cross-entropy on canonical target outputs for each task.
- Reinforcement Learning (RL): Policy gradient (GRPO) on $12,000$ real + $30,000$ synthetic rollouts. Reward signals are derived from a VLM-based verifier (GLM-4.5V) that scores image similarity, enhanced by a human-aligned “comparator” and round-robin ranking.
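One way to picture how round-robin ranking could feed a GRPO-style update is sketched below. It is an assumption-laden illustration, not the authors' pipeline: the `prefer` function is a hypothetical stand-in for the VLM comparator, the `sim` scores are made up, and the advantage formula is the usual group-relative normalization (reward minus group mean, divided by group standard deviation).

```python
from itertools import combinations
from statistics import mean, pstdev

def round_robin_rewards(candidates, prefer):
    """Compare every pair once; reward = fraction of comparisons won."""
    n = len(candidates)
    if n < 2:
        raise ValueError("need at least two candidates to rank")
    wins = [0] * n
    for i, j in combinations(range(n), 2):
        if prefer(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    return [w / (n - 1) for w in wins]

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization of rewards within one rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rollouts for one screenshot; "sim" stands in for the
# verifier's judgment of how closely each render matches the target.
rollouts = [
    {"id": "a", "sim": 0.62},
    {"id": "b", "sim": 0.91},
    {"id": "c", "sim": 0.48},
    {"id": "d", "sim": 0.75},
]

def prefer(x, y):
    """Hypothetical comparator: True if x is judged closer to the target."""
    return x["sim"] > y["sim"]

rewards = round_robin_rewards(rollouts, prefer)
advantages = group_relative_advantages(rewards)
```

Ranking by pairwise wins only needs the comparator to be consistent within a group, which is one plausible motivation for preferring it over absolute similarity scores.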
1.3 Multi-Turn Interactive Workflow
A hallmark of UI2Code is test-time scaling, leveraging iterative visual feedback:
```
procedure INTERACTIVE_UI2CODE(I_image, N_rounds):
    C⁰ ← GenerateCode(I_image)
    R⁰ ← Render(C⁰)
    for t in 1..N_rounds−1:
        Input ← (I_image, Cᵗ⁻¹, Rᵗ⁻¹)
        Cᵗ ← PolishingModel(Input)
        Rᵗ ← Render(Cᵗ)
    return C^{N_rounds−1}, R^{N_rounds−1}
```
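The procedure above can be sketched in Python with the model and renderer passed in as callables. The function name and the three `toy_*` stubs are hypothetical placeholders invented for this example; in practice they would wrap the actual model calls and a headless browser render.

```python
def interactive_ui2code(image, n_rounds, generate_code, render, polish):
    """Draft once, then polish for n_rounds - 1 further rounds.

    generate_code(image) -> code
    render(code) -> rendered output
    polish(image, code, rendered) -> refined code
    """
    if n_rounds < 1:
        raise ValueError("n_rounds must be at least 1")
    code = generate_code(image)
    rendered = render(code)
    for _ in range(1, n_rounds):
        code = polish(image, code, rendered)
        rendered = render(code)
    return code, rendered

# Toy stubs (assumptions): the "screenshot" is a string, rendering is the
# identity, and each polish round recovers one more character of the target.
def toy_generate(image):
    return ""

def toy_render(code):
    return code

def toy_polish(image, code, rendered):
    return code + image[len(code)] if len(code) < len(image) else code

final_code, final_render = interactive_ui2code(
    "abc", 4, toy_generate, toy_render, toy_polish
)
```

With `n_rounds = 1` the loop body never runs and the initial draft is returned unchanged, matching the pseudocode's indexing.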
2. Evaluation and Empirical Performance
Evaluation on several public and proprietary UI-to-code and UI-polishing benchmarks demonstrates state-of-the-art open-source performance and competitiveness with leading closed-source models (Claude-4-Sonnet, GPT-5).
| Model | Design2 Acc | Flame Acc | Web2 Acc | UI-Polish Synth | UI-Polish Real |
|---|---|---|---|---|---|
| InternVL3-78B | 30.0 | 51.3 | 45.5 | 15% | 10% |
| Qwen2.5-VL-72B | 41.9 | 46.3 | 64.1 | 38% | 23% |
| GLM-4.1V-9B | 64.7 | 72.5 | 71.3 | 46% | 42% |
| Claude-4-Sonnet | 81.2 | 76.3 | 85.1 | 65% | 78% |
| Gemini-2.5-Pro | 89.5 | 87.5 | 90.6 | 68% | 74% |
| GPT-5 | 89.7 | 91.3 | 93.7 | 68% | 85% |
| UI2Code-9B-RL | 88.6 | 95.0 | 92.5 | 94% | 80% |
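Best overall per column: GPT-5 on Design2, Web2, and UI-Polish Real; UI2Code-9B-RL on Flame and UI-Polish Synth. Among the open-source models, UI2Code-9B-RL is best in every column.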
Test-time scaling shows further gains as polish rounds increase (real data: 66% after 1 round, 68% after 2, 70% after 3, 73% after 4, 74% after 5), and ablations show that RL reward tuning and real training data both have strong positive effects on real-world benchmarks.
3. Practical Applications and Limitations
UI2Code is immediately applicable to automated code generation from UI screens, iterative UI refinement, and natural language-driven editing of preexisting UIs. Representative qualitative behaviors include:
- Correction of style-level errors (e.g., button padding, typographical exactness after polishing).
- Restoration of complex layouts initially misrepresented in draft code.
- Language-driven edits, such as changing navigation bar color or repositioning UI elements by code injection.
Limitations include inability to handle dynamic/interactive JavaScript widgets (such as carousels and modals), code truncation on inputs exceeding 32k tokens, and occasional pixel-level inaccuracies in sub-pixel rendering scenarios.
4. Universal Coding with the Narayana Sequence
The second sense of UI2Code refers to a mathematically rigorous, prefix-free, universal integer code with codeword lengths and enumeration determined by the Narayana sequence (Kirthi et al., 2016).
4.1 Narayana Sequence and Coding Basis
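The classical Narayana sequence is recursively defined by $N_0 = N_1 = N_2 = 1$ and $N_k = N_{k-1} + N_{k-3}$ for $k \ge 3$. Shifting by two gives a basis sequence $J_i$ with $J_i = N_{i+2}$, i.e. $J_0, J_1, J_2, \dots = 1, 2, 3, 4, 6, 9, 13, \dots$, and $J_i = J_{i-1} + J_{i-3}$ for $i \ge 3$.
The dominant Narayana ratio $\psi \approx 1.4656$ satisfies $\psi^3 = \psi^2 + 1$ and controls the exponential growth rate of $J_i$.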
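A short numerical check, assuming the standard Narayana recurrence $N_k = N_{k-1} + N_{k-3}$ with three initial ones: the ratio of consecutive terms should approach the real root of $x^3 = x^2 + 1$. The function name `narayana` is chosen here for illustration.

```python
def narayana(count):
    """First `count` terms of N_0 = N_1 = N_2 = 1, N_k = N_{k-1} + N_{k-3}."""
    seq = [1, 1, 1]
    while len(seq) < count:
        seq.append(seq[-1] + seq[-3])
    return seq[:count]

N = narayana(40)
J = N[2:]                  # shift by two: J_i = N_{i+2}
ratio = J[-1] / J[-2]      # approximates the dominant root
residual = ratio**3 - ratio**2 - 1
```

The residual of the characteristic equation shrinks geometrically with the number of terms, so 40 terms are already far more than needed for double-precision agreement to several digits.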
4.2 Encoding and Decoding Procedures
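The UI2Code integer code is based on a Zeckendorf-type, nonconsecutive sum expansion in the $J$-basis.
Encoding
For integer $n \ge 1$:
- Compute the maximal $d$ with $J_d \le n$.
- Find binary $B_0, \dots, B_d$ such that $n = \sum_{i=0}^{d} B_i J_i$, with $B_i B_{i+1} = 0$ for $0 \le i < d$.
- The codeword is $c(n) = B_0 B_1 \cdots B_d\,1$.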
Decoding
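Given binary string $c$ (ending in ‘1’):
- Discard the terminating ‘1’; let the remaining length be $d + 1$.
- Set $B_i = c_i$ for $0 \le i \le d$, and compute $\sum_{i=0}^{d} B_i J_i$ over the basis terms $J_i$.
- Return $n = \sum_{i=0}^{d} B_i J_i$.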
Both representations are unique because no two consecutive ‘1’s are present, and all codewords end with the signature “…11” that cannot appear elsewhere.
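The two procedures can be sketched in Python as follows, assuming the basis $1, 2, 3, 4, 6, 9, \dots$ (each term is the previous term plus the term three back) and a greedy largest-first expansion. The function names are illustrative, and `decode_stream` is an added convenience that splits a concatenated bit string at each “11” terminator.

```python
def basis_up_to(n):
    """Basis terms J_0..J_d with J_d <= n (J = 1, 2, 3, 4, 6, 9, ...)."""
    J = [1, 2, 3]
    while J[-1] + J[-3] <= n:
        J.append(J[-1] + J[-3])
    return [j for j in J if j <= n]

def encode(n):
    """Greedy expansion, least-significant bit first, plus a final '1'."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    J = basis_up_to(n)
    bits = [0] * len(J)
    r = n
    for i in range(len(J) - 1, -1, -1):
        if J[i] <= r:
            bits[i] = 1
            r -= J[i]
    return "".join(map(str, bits)) + "1"

def decode(codeword):
    """Drop the terminating '1' and sum the selected basis terms."""
    if len(codeword) < 2 or codeword[-2:] != "11":
        raise ValueError("codeword must end in '11'")
    payload = codeword[:-1]
    J = [1, 2, 3]
    while len(J) < len(payload):
        J.append(J[-1] + J[-3])
    return sum(j for j, b in zip(J, payload) if b == "1")

def decode_stream(bits):
    """Split a concatenation of codewords at each '11' and decode them."""
    out, start = [], 0
    while start < len(bits):
        end = bits.index("11", start) + 2   # ValueError if malformed
        out.append(decode(bits[start:end]))
        start = end
    return out
```

Because the greedy step leaves a remainder smaller than $J_{d-2}$, consecutive ones in the payload are always separated by at least two zeros, which is stronger than the no-consecutive-ones condition and is what makes the first “11” an unambiguous end marker.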
4.3 Theoretical Properties
- Prefix Property: No codeword is a prefix of another due to unique “…11” ending and the no-consecutive-1s constraint.
- Universality: Every $n \ge 1$ can be represented (see Theorem 2 in (Kirthi et al., 2016)).
- Length formula: $|c(n)| = d + 2$ with $J_d \le n < J_{d+1}$.
- Asymptotics: $|c(n)| = \log_\psi n + O(1)$, so growth is logarithmic, with base $\psi \approx 1.4656$ (the Narayana ratio).
- Enumeration: Number of codewords of length $d + 2$ is $J_{d+1} - J_d$.
- Redundancy: Tends to zero; the scheme encodes every $n$ deterministically and prefix-free without prior distributional knowledge.
- Optimality: Alternative bases (e.g., shifted Narayana) either omit some $n$ or lose uniqueness.
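Several of these properties can be checked empirically for small integers. The sketch below assumes the greedy encoder over the basis $1, 2, 3, 4, 6, 9, \dots$; it tests that no codeword is a prefix of another, that an integer whose largest usable basis term is $J_d$ gets a codeword of $d + 2$ bits, and that the number of codewords of that length equals $J_{d+1} - J_d$. It is a finite check, not a proof.

```python
from collections import Counter

def basis(count):
    """First `count` basis terms: 1, 2, 3, then J_i = J_{i-1} + J_{i-3}."""
    J = [1, 2, 3]
    while len(J) < count:
        J.append(J[-1] + J[-3])
    return J[:count]

def encode(n):
    """Greedy expansion (low index first) followed by a terminating '1'."""
    J = [j for j in basis(40) if j <= n]
    bits, r = [], n
    for j in reversed(J):
        if j <= r:
            bits.append("1")
            r -= j
        else:
            bits.append("0")
    return "".join(reversed(bits)) + "1"

J = basis(40)
LIMIT = J[16] - 1                      # covers every codeword of length <= 17
codes = {n: encode(n) for n in range(1, LIMIT + 1)}

# Prefix property: in sorted order a prefix sits directly before an extension.
words = sorted(codes.values())
prefix_free = all(not b.startswith(a) for a, b in zip(words, words[1:]))

# Length formula: d is the largest index with J_d <= n.
def top_index(n):
    return max(i for i, j in enumerate(J) if j <= n)

lengths_ok = all(len(c) == top_index(n) + 2 for n, c in codes.items())

# Enumeration: codewords of length d + 2 cover exactly J_d <= n < J_{d+1}.
counts = Counter(len(c) for c in codes.values())
counts_ok = all(counts[d + 2] == J[d + 1] - J[d] for d in range(16))
```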
4.4 Algorithmic Realizations and Examples
Encoding pseudocode:
```
function ENCODE_UI2Coden(n):
    Compute J_i until J_d <= n < J_{d+1}
    r ← n
    for i = d downto 0:
        if J_i <= r:
            B_i ← 1
            r ← r - J_i
        else:
            B_i ← 0
    c ← concat(B_0 ... B_d, '1')
    return c
```
Examples:
| $n$ | $J$-expansion | Vector | Codeword |
|---|---|---|---|
| 1 | $J_0$ | (1) | 1 1 |
| 2 | $J_1$ | (0, 1) | 0 1 1 |
| 3 | $J_2$ | (0, 0, 1) | 0 0 1 1 |
| 5 | $J_0 + J_3$ | (1, 0, 0, 1) | 1 0 0 1 1 |
4.5 Comparative Remarks
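UI2Code codewords are longer than those of Fibonacci-based codes, which are in turn longer than plain binary representations. Codeword length grows like $\log_b n$, where the base $b$ is the Narayana ratio ($\approx 1.4656$), the golden ratio ($\approx 1.618$), and 2, respectively, and a smaller base means faster growth. It is a “universal” code in the Elias sense.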
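The growth rates can be compared directly by computing codeword lengths without building the codewords. The sketch below assumes both codes use the same construction (greedy expansion plus a terminating ‘1’), so the length is the index of the largest basis term not exceeding $n$, plus two. The helper `code_length` and its `step` parameter are illustrative: `step=2` gives the Fibonacci basis $1, 2, 3, 5, 8, \dots$ and `step=3` the Narayana basis $1, 2, 3, 4, 6, 9, \dots$.

```python
def code_length(n, step):
    """Codeword length for the basis b_k = b_{k-1} + b_{k-step}.

    The basis starts 1, 2, ..., step; the length is d + 2, where b_d is
    the largest basis term not exceeding n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    basis = list(range(1, step + 1))
    while basis[-1] <= n:
        basis.append(basis[-1] + basis[-step])
    d = max(i for i, b in enumerate(basis) if b <= n)
    return d + 2

# (n, binary bits, Fibonacci code length, Narayana code length)
rows = [
    (n, n.bit_length(), code_length(n, 2), code_length(n, 3))
    for n in (1, 10, 100, 1_000, 10_000, 100_000, 1_000_000)
]
for n, binary, fib, nar in rows:
    print(f"{n:>9}  binary={binary:>2}  fibonacci={fib:>2}  narayana={nar:>2}")
```

Plain binary is not prefix-free on its own, so the binary column is a lower reference point rather than a competing self-delimiting code.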
5. Significance and Context
The VLM UI2Code sets a new state-of-the-art in open-source visual UI coding, matched closely only by proprietary VLMs. The test-time scalable, multi-turn interaction paradigm allows for systematic improvements leveraging visual feedback, closing the gap between automated and human-in-the-loop design workflows. The code was released publicly, enabling further research.
The UI2Code universal integer code constitutes an independent advance in prefix coding theory, extending the family of Fibonacci- and Lucas-based codes by employing the Narayana basis. Its optimality is tied to the combinatorial properties of the sequence, and no alternative shift or variant achieves the same universality and uniqueness guarantees.
The dual use of the UI2Code name, in AI systems and in coding theory, may loosely reflect the broader interplay between semantic representation learning and rigorous symbolic codification. However, there is currently no technical connection between the two beyond the shared name.