Execution Guided Line-by-Line Code Generation (2506.10948v1)

Published 12 Jun 2025 in cs.LG

Abstract: We present a novel approach to neural code generation that incorporates real-time execution signals into the LLM generation process. While LLMs have demonstrated impressive code generation capabilities, they typically do not utilize execution feedback during inference, a critical signal that human programmers regularly leverage. Our method, Execution-Guided Classifier-Free Guidance (EG-CFG), dynamically incorporates execution signals as the model generates code, providing line-by-line feedback that guides the generation process toward executable solutions. EG-CFG employs a multi-stage process: first, we conduct beam search to sample candidate program completions for each line; second, we extract execution signals by executing these candidates against test cases; and finally, we incorporate these signals into the prompt during generation. By maintaining consistent signals across tokens within the same line and refreshing signals at line boundaries, our approach provides coherent guidance while preserving syntactic structure. Moreover, the method naturally supports native parallelism at the task level in which multiple agents operate in parallel, exploring diverse reasoning paths and collectively generating a broad set of candidate solutions. Our experiments across diverse coding tasks demonstrate that EG-CFG significantly improves code generation performance compared to standard approaches, achieving state-of-the-art results across various levels of complexity, from foundational problems to challenging competitive programming tasks. Our code is available at: https://github.com/boazlavon/eg_cfg


Execution Guided Line-by-Line Code Generation

The paper "Execution Guided Line-by-Line Code Generation" introduces a method for neural code generation that uses real-time execution feedback to help LLMs produce executable code. The core idea is to integrate execution signals into the LLM's inference process, mirroring what human programmers routinely do: run code as they write it and refine it based on its actual behavior.

Key Components of EG-CFG

The researchers present Execution-Guided Classifier-Free Guidance (EG-CFG), an approach that feeds execution signals into code generation as it proceeds. This departs from the conventional setup, in which LLMs do not use runtime feedback during inference and can therefore produce code that looks correct but fails on real inputs. EG-CFG is a multi-stage process consisting of:

Beam Search for Candidate Completions: First, EG-CFG applies beam search to generate a set of candidate program completions for each line of code.
Execution and Feedback Extraction: These candidates are then executed against predefined test cases, yielding execution signals.
Dynamic Signal Integration: Finally, these signals are incorporated back into the generation prompt to guide subsequent tokens. The signals are held fixed across all tokens within a line and refreshed at each line boundary, which keeps the guidance coherent while preserving syntactic structure.
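
The stages above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the helper names run_candidate and cfg_mix, the signal string format, and the test-case layout are all hypothetical. The interpolation shown is the standard classifier-free guidance form, assumed here because this summary does not state the paper's exact formula. The candidate programs are hard-coded stand-ins for what beam search would produce.

```python
import math


def run_candidate(source, func_name, tests):
    """Execute one candidate program against (args, expected) test cases.

    Returns a list of short text signals, one per test case, or a single
    signal if the candidate cannot even be defined. (Hypothetical format.)
    """
    namespace = {}
    try:
        exec(source, namespace)  # define the candidate function
    except Exception as exc:
        return [f"definition failed: {type(exc).__name__}: {exc}"]

    func = namespace.get(func_name)
    if not callable(func):
        return [f"{func_name} is not defined"]

    signals = []
    for args, expected in tests:
        try:
            result = func(*args)
            status = "pass" if result == expected else "fail"
            signals.append(f"{func_name}{args!r} -> {result!r} ({status})")
        except Exception as exc:
            signals.append(f"{func_name}{args!r} raised {type(exc).__name__}")
    return signals


def cfg_mix(logp_uncond, logp_cond, gamma):
    """Standard classifier-free guidance over next-token log-probabilities.

    logp_uncond: log-probs given the prompt without execution signals.
    logp_cond:   log-probs given the prompt with execution signals.
    gamma:       guidance strength (0 ignores the signals, 1 uses logp_cond).
    The result is renormalised so it is a valid log-distribution.
    """
    mixed = [u + gamma * (c - u) for u, c in zip(logp_uncond, logp_cond)]
    log_z = math.log(sum(math.exp(m) for m in mixed))
    return [m - log_z for m in mixed]


# Stand-ins for beam-search candidates completing the current line.
candidates = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a - b\n",
]
tests = [((1, 2), 3)]

for source in candidates:
    print(run_candidate(source, "add", tests))
```

In a real system the strings returned by run_candidate would be appended to the prompt, the model would be queried with and without them, and cfg_mix would combine the two next-token distributions. Under the description above, the signals would be recomputed only when a line is completed.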

Additionally, EG-CFG supports task-level native parallelism, allowing multiple agents to operate in parallel, exploring diverse reasoning paths and collaboratively generating a wide range of candidate solutions.
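
Task-level parallelism of this kind can be illustrated with a small sketch. It assumes each agent is an independent callable that returns one candidate program as a string. The names solve_in_parallel and passes_all and the stub agents are hypothetical, and a real agent would run a full guided generation with its own settings.

```python
from concurrent.futures import ThreadPoolExecutor


def passes_all(source, func_name, tests):
    """Return True if the candidate defines func_name and passes every test."""
    namespace = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False


def solve_in_parallel(agents, func_name, tests):
    """Run all agents concurrently and keep the candidates that pass the tests."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        candidates = list(pool.map(lambda agent: agent(), agents))
    return [c for c in candidates if passes_all(c, func_name, tests)]


# Stub agents standing in for independently configured generation runs.
agents = [
    lambda: "def square(x):\n    return x * 2\n",
    lambda: "def square(x):\n    return x * x\n",
]
solutions = solve_in_parallel(agents, "square", [((3,), 9)])
print(len(solutions))  # 1
```

Because the agents share no state, the same pattern would carry over to separate processes or machines. A thread pool is used here only to keep the example short.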

Experimental Results

EG-CFG improves performance across diverse coding tasks. In experiments on benchmarks ranging from foundational to competitive programming problems, the approach achieves state-of-the-art results. Specifically, EG-CFG is evaluated on the MBPP, MBPP-ET, HumanEval, HumanEval-ET, and CodeContests benchmarks using open-source models, and surpasses previous methods that rely on leading closed-source models.

For instance, EG-CFG achieves 96.6% accuracy on the MBPP benchmark and 87.19% on the HumanEval-ET benchmark, substantial gains over prior methods. These figures indicate that the generated code is not only accurate but also robust under more challenging testing conditions.

Implications and Future Directions

The method has both practical and theoretical implications for AI-driven code generation. Practically, integrating execution feedback during inference paves the way for more reliable AI-assisted programming tools and could change how automated code synthesis and debugging are done. Theoretically, the paper adds to the understanding of how real-time feedback can refine model outputs, pointing toward more interactive and adaptive generation systems.

Looking ahead, this approach may inspire further exploration into integrating external semantic signals into generative processes, extending beyond code generation. The research opens avenues for application in domains requiring grounding in real-world execution, such as database querying or simulation-based generation tasks.

Overall, the paper advances program synthesis with LLMs by contributing a direction in which execution signals are not merely inspected after generation but actively shape it in real time. Future studies may refine this integration, reduce its computational cost, and broaden the range of tasks to which such methods apply.


Authors (3)

Boaz Lavon (1 paper)
Shahar Katz (5 papers)
Lior Wolf (217 papers)
