GramTrans: Optimizing Code Representations

Updated 7 October 2025
  • GramTrans is a method that automatically restructures code grammars into the LL(1) class, simplifying parsing for neural code generation.
  • It employs a hierarchical conflict elimination algorithm that expands productions and resolves ambiguities by injecting minimal distinguishing tokens.
  • Empirical results show that LL(1) representations achieve up to an 82% pass@1 score, outperforming more complex formats in both accuracy and efficiency.

GramTrans is an approach for optimizing code representations in neural code generation and synthesis. It is motivated by the conjecture that the easier a code representation is to parse, as measured by its formal grammar class, the better code generation models perform on it. GramTrans provides an automatic method for transforming any context-free grammar into a form that resides in the LL(1) class, which greatly simplifies parsing and, by extension, improves both code generation accuracy and efficiency.

1. Formalization of Code Representation and Parsing Difficulty

Code representation selection is crucial in neural code generation, as the chosen format directly affects a model's capacity to infer syntax and structure. Existing code representations fall into four principal categories:

  • Plain Text: Source code strings with minimal explicit syntactic marking.
  • Grammar-Rule–Based Sequences: Lists of grammar rule applications derived from parsing or traversal of the abstract syntax tree (AST).
  • Syntax-Tree Traversal (SBT, etc.): Encoded traversals that mark entering and exiting non-terminals, with varying bracket schemes.
  • Special Compact DSLs (e.g., SimPy): Tokenizations or encodings that condense syntax while retaining program semantics.
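
As a schematic illustration of these families, the snippet below serializes the same one-line Python expression in three of the styles above. The token vocabularies are invented for illustration and do not reproduce the exact encodings of the cited formats (SBT, SimPy, or any specific grammar-rule scheme).

```python
snippet = "x.append(1)"  # 1. plain text: the source string itself

# 2. Grammar-rule-based sequence: production applications in derivation
#    order, interleaved with the terminals they yield (schematic only).
grammar_rule_seq = [
    "stmt -> expr", "expr -> call(attr, args)",
    "attr -> NAME '.' NAME", "x", "append",
    "args -> NUMBER", "1",
]

# 3. SBT-style traversal: brackets mark entering and leaving AST nodes
#    (schematic; the published SBT vocabulary differs in detail).
sbt_like = "( Call ( Attribute ( Name x ) append ) ( Num 1 ) )"

for name, rep in [("plain text", snippet),
                  ("grammar-rule", grammar_rule_seq),
                  ("SBT-like", sbt_like)]:
    print(f"{name}: {rep}")
```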

GramTrans analyzes these formats through the lens of formal language theory. It leverages the LL(k)/LR(k)/NCFG hierarchy, in which LL(1) grammars are the simplest: they permit single-token lookahead parsing without ambiguity or backtracking. The paper proposes that representations in the LL(1) class are intrinsically easier for neural models to decode, because each parsing decision is determined by a single lookahead token with minimal ambiguity. Empirically, performance is measured via the pass@1 metric:

$\mathrm{pass}@1 = \frac{\text{Number of correct outputs}}{\text{Total samples}}$
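
For concreteness, a minimal sketch of this metric as code; the function name and the boolean-list input are illustrative choices, not taken from the paper:

```python
def pass_at_1(per_sample_correct):
    """pass@1: fraction of benchmark samples whose single generated
    program is correct (i.e., passes all tests)."""
    if not per_sample_correct:
        return 0.0
    return sum(per_sample_correct) / len(per_sample_correct)

# Example: 4 of 5 samples solved on the first attempt -> 0.8
print(pass_at_1([True, True, False, True, True]))
```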

Controlled experiments revealed that representations in LL(1) grammar classes led to higher pass@1 scores than those in more complex classes (LL(2), LR(1), or NCFG).
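
To make the single-token-lookahead property concrete, the toy predictive parser below selects every production from one lookahead token via a parse table, with no backtracking. The grammar, the injected ⟨call⟩/⟨attr⟩ prefix tokens, and the table are illustrative inventions rather than actual GramTrans output.

```python
# Toy LL(1) grammar with injected prefix tokens <call> and <attr>, so that
# a single lookahead token selects the production unambiguously:
#   S -> <call> NAME '(' ')'   |   <attr> NAME '.' NAME
NONTERMINALS = {"S"}
PARSE_TABLE = {
    ("S", "<call>"): ["<call>", "NAME", "(", ")"],
    ("S", "<attr>"): ["<attr>", "NAME", ".", "NAME"],
}

def ll1_parse(tokens):
    """Predictive parse with one token of lookahead and no backtracking."""
    stack, pos = ["S"], 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:                 # consult the parse table
            production = PARSE_TABLE.get((top, lookahead))
            if production is None:
                raise SyntaxError(f"no production for ({top}, {lookahead})")
            stack.extend(reversed(production))  # expand right-to-left
        else:                                   # terminal: must match input
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, got {lookahead!r}")
            pos += 1
    return pos == len(tokens)

print(ll1_parse(["<call>", "NAME", "(", ")"]))     # True
print(ll1_parse(["<attr>", "NAME", ".", "NAME"]))  # True
```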

2. Hierarchical Conflict Elimination Algorithm

GramTrans introduces a hierarchical algorithm that transforms a general context-free grammar into an LL(1) grammar by systematically resolving parsing conflicts. The algorithm operates as follows:

  • Expansion: Production rules are expanded recursively (up to a set depth) to expose leading tokens and identify ambiguities.
  • Conflict Detection: Conflicts are recognized when multiple productions for the same non-terminal begin with the same terminal, or when left recursion is present.
  • Conflict Resolution: New distinguishing tokens are injected as prefixes into conflicting productions. When multiple overlapping conflicts occur, GramTrans formulates a minimum hitting set problem to add the minimal set of symbols necessary.
  • Symbol Reordering: After all conflicts are handled, the grammar is further optimized by reordering and eliminating redundant symbols, all while retaining LL(1) properties.

A representative transformation for a non-terminal $A$ with ambiguous leading symbols:

$A \to \langle\mathrm{call}\rangle\,\ldots \;\mid\; \langle\mathrm{attribute}\rangle\,\ldots$

This ensures that each lookahead token is unique per production, satisfying LL(1) constraints.
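
The sketch below captures the core detect-and-inject step on a toy grammar: alternatives of the same non-terminal whose FIRST tokens collide are prefixed with fresh distinguishing tokens. The data structures, the simplified first_token helper, and the token-naming scheme are assumptions made for illustration; the actual algorithm additionally expands productions to a bounded depth and solves a minimum hitting set problem to keep the injected tokens minimal.

```python
from collections import defaultdict

# Toy grammar: non-terminals are the dict keys; every other symbol is a
# terminal token. Each non-terminal maps to a list of alternative
# productions, each production being a list of symbols.
grammar = {
    "EXPR": [
        ["NAME", "(", "ARGS", ")"],   # call
        ["NAME", ".", "NAME"],        # attribute access
    ],
    "ARGS": [["NAME"], []],
}

def first_token(production, grammar):
    """Very simplified FIRST: the leading terminal of a production, if any."""
    if not production:
        return None  # epsilon production
    head = production[0]
    if head in grammar:  # non-terminal: look into its first alternative
        return first_token(grammar[head][0], grammar)
    return head

def inject_distinguishing_tokens(grammar):
    """Prefix conflicting alternatives with fresh tokens so that each
    alternative of a non-terminal starts with a unique terminal."""
    new_grammar = {}
    for nonterminal, alternatives in grammar.items():
        by_first = defaultdict(list)
        for alt in alternatives:
            by_first[first_token(alt, grammar)].append(alt)
        fixed = []
        for alts in by_first.values():
            if len(alts) == 1:
                fixed.extend(alts)              # no conflict: keep as-is
            else:
                for i, alt in enumerate(alts):  # conflict: add prefixes
                    fixed.append([f"<{nonterminal.lower()}_{i}>"] + alt)
        new_grammar[nonterminal] = fixed
    return new_grammar

print(inject_distinguishing_tokens(grammar)["EXPR"])
# [['<expr_0>', 'NAME', '(', 'ARGS', ')'], ['<expr_1>', 'NAME', '.', 'NAME']]
```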

3. Quantitative Evaluation and Results

GramTrans was evaluated on both synthetic and real-world code generation tasks:

  • MathQA DSL: Four code representations with increasing parsing difficulty (LL(1), LL(2), LR(1), NCFG) were benchmarked. GramTrans's LL(1) version achieved the highest pass@1 (82%), versus 80.41% for the hardest class (NCFG).
  • Python and Java Code Generation: GramTrans was applied to full-scale grammars. Python_LL(1), Python_1-layer (partial LL(1) transformation), plain text, grammar-rule–based, SimPy, and SBT representations were compared. On StarCoder 1B, Python_LL(1) achieved approximately 66.4% pass@1 versus 62.2% for plain text; the 1-layer variant delivered strong performance with minimal token bloat.
  • Generality: GramTrans's algorithm was successfully applied to Java (on HumanEval-X), yielding similar improvements. Statistical tests (Welch's t-test, Wilcoxon signed-rank) validated that LL(1) representations produced by GramTrans significantly outperform less structured representations ($p \approx 10^{-6}$).
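
For readers who want to run the same kind of significance check on their own per-configuration results, a hedged sketch using scipy.stats follows; the score arrays are placeholders, not the paper's data.

```python
import numpy as np
from scipy import stats

# Placeholder pass@1 scores across several runs/benchmarks (illustrative
# values only, not the paper's measurements).
ll1_scores = np.array([0.664, 0.671, 0.658, 0.660, 0.669, 0.655])
plain_scores = np.array([0.622, 0.630, 0.618, 0.625, 0.619, 0.628])

# Welch's t-test: unpaired comparison that does not assume equal variances.
t_stat, p_welch = stats.ttest_ind(ll1_scores, plain_scores, equal_var=False)

# Wilcoxon signed-rank test: paired, non-parametric comparison.
w_stat, p_wilcoxon = stats.wilcoxon(ll1_scores, plain_scores)

print(f"Welch p = {p_welch:.3g}, Wilcoxon p = {p_wilcoxon:.3g}")
```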

4. Trade-Offs: Syntactic Simplicity vs. Token Efficiency

Transforming a grammar to LL(1) may introduce new distinguishing tokens (prefixes), slightly increasing sequence length. GramTrans therefore exposes control over how aggressively conflicts are eliminated:

  • Full LL(1): Maximum parsing simplicity, potentially more tokens.
  • One-Layer (1-layer): Resolves conflicts only at the first expansion layer, striking a balance between ease of parsing and token economy.

Experimental data show that the 1-layer GramTrans representation usually offers the best trade-off: minimal additional token cost with a marked improvement in accuracy.
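
A hedged sketch of how this knob could look in code is given below: conflict elimination runs layer by layer, and capping max_layers at 1 yields the 1-layer variant, while leaving it uncapped approximates the full LL(1) transformation. The conflicting_nonterminals check and the driver loop are illustrative simplifications, not GramTrans's actual interface.

```python
def conflicting_nonterminals(grammar):
    """Non-terminals whose alternatives share a leading symbol: a
    simplified stand-in for a full FIRST/FOLLOW-based LL(1) conflict check."""
    conflicted = set()
    for nonterminal, alternatives in grammar.items():
        leading = [alt[0] if alt else None for alt in alternatives]
        if len(leading) != len(set(leading)):
            conflicted.add(nonterminal)
    return conflicted

def transform(grammar, eliminate_one_layer, max_layers=None):
    """Run conflict elimination layer by layer.

    max_layers=1 mimics the 1-layer variant (resolve conflicts only at the
    first expansion layer, accepting residual conflicts in exchange for
    fewer injected tokens); max_layers=None keeps eliminating until no
    conflicts remain, approximating the full LL(1) transformation.
    eliminate_one_layer is a callable such as the
    inject_distinguishing_tokens sketch in Section 2.
    """
    layer = 0
    while conflicting_nonterminals(grammar):
        if max_layers is not None and layer >= max_layers:
            break
        grammar = eliminate_one_layer(grammar)
        layer += 1
    return grammar
```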

5. Analysis of Existing Representations and Conjecture Support

The paper revisits established code representations:

  • SBT and variants: Usually reside in the LR(1) class with non-unique bracket symbols, limiting parsing simplicity.
  • Grammar-rule–based: Naturally align with the LL(1) paradigm due to explicit rule marking.
  • SimPy: Designed for compactness; its grammar falls between the LL(1) and LR(1) classes.

This alignment between parsing difficulty (as classified by grammar theory) and observed model performance corroborates the primary conjecture.

6. Practical Implications and Future Directions

GramTrans offers a universal and automatic pathway for creating syntactically simple, LL(1)-style code representations from arbitrary context-free grammars. Its efficacy spans multiple neural architectures (StarCoder 1B, DeepSeek-Coder 1.3B, Qwen2.5 1.5B), programming languages (Python, Java), and benchmark tasks (HumanEval, MBPP, EvalPlus, HumanEval-X). Lower parsing difficulty consistently yields better model performance.

Future research directions identified in the paper include:

  • Further optimizing token efficiency without sacrificing parsing simplicity.
  • Extending GramTrans to modular programming paradigms and dynamic languages.
  • Investigating iterative and multi-layer conflict elimination strategies for large-scale grammars and evolving codebases.

7. Significance in Code Generation Research

GramTrans advances the field by demonstrating, with empirical and formal rigor, that syntactic simplicity in code representation—formalized via LL(1) transformations—is a crucial determinant of neural code generation success. It delivers practical algorithms for representation conversion, provides actionable guidelines for sequence format selection, and sets the foundation for subsequent work on grammar-informed synthesis methods. The strong correlation between parsing difficulty and generation accuracy is a central finding, which plausibly extends to other program synthesis domains where structural clarity is paramount.
