VeCoGen: Automating Generation of Formally Verified C Code with Large Language Models (2411.19275v3)

Published 28 Nov 2024 in cs.SE

Abstract: LLMs have demonstrated impressive capabilities in generating code, yet they often produce programs with flaws or deviations from intended behavior, limiting their suitability for safety-critical applications. To address this limitation, this paper introduces VECOGEN, a novel tool that combines LLMs with formal verification to automate the generation of formally verified C programs. VECOGEN takes a formal specification in ANSI/ISO C Specification Language, a natural language specification, and a set of test cases to attempt to generate a verified program. This program-generation process consists of two steps. First, VECOGEN generates an initial set of candidate programs. Secondly, the tool iteratively improves on previously generated candidates. If a candidate program meets the formal specification, then we are sure the program is correct. We evaluate VECOGEN on 15 problems presented in Codeforces competitions. On these problems, VECOGEN solves 13 problems. This work shows the potential of combining LLMs with formal verification to automate program generation.

Summary

The paper introduces VeCoGen, a tool combining Large Language Models with formal verification (using Frama-C) in an iterative refinement process to automate the generation of formally verified C code.
Evaluated on a custom dataset of 15 competitive programming problems, VeCoGen generated verified solutions for 13, showing the effectiveness of its iterative feedback loop.
VeCoGen presents significant implications for automating program synthesis in safety-critical domains by providing verifiable correctness guarantees for generated code.

Automated Generation of Formally Verified C Code Using LLMs: An Analysis of VeCoGen

The paper by Merlijn Sevenhuijsen and colleagues introduces VeCoGen, a tool that combines LLMs with formal verification to automate the creation of verified C programs. The combination targets the reliability problems commonly associated with LLM-generated code, particularly in safety-critical applications. LLMs such as the GPT models readily produce syntactically valid code, but that code often lacks the semantic accuracy required in high-assurance domains. By checking generated code against a formal specification, VeCoGen aims to automate program synthesis with verifiable guarantees of correctness.

Methodology and Approach

VeCoGen implements a two-step process for generating verified C code. First, it generates a set of candidate programs from a natural language specification, a formal specification written in the ANSI/ISO C Specification Language (ACSL), and a set of test cases. Each candidate is checked against the ACSL specification using Frama-C plugins. Second, when no candidate meets the specification, VeCoGen iteratively refines previously generated candidates. Feedback from the compiler and the formal verifier guides each refinement step, and the process continues until a candidate is formally verified.
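
The two steps above can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the tool's implementation: the names generate_verified, GenerateFn, CheckFn, Candidate and CheckResult are hypothetical, and the two demo_ functions are toy stand-ins for an LLM call and for the compiler plus verifier. The sketch also refines only the most recent candidate, which may differ from how the tool selects candidates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { CODE_LEN = 2048, FEEDBACK_LEN = 512 };

/* Hypothetical container for one candidate program's source text. */
typedef struct {
    char code[CODE_LEN];
} Candidate;

/* Hypothetical outcome of compiling and verifying one candidate. */
typedef struct {
    bool verified;                /* true when every proof goal is discharged */
    char feedback[FEEDBACK_LEN];  /* compiler or verifier message for the next prompt */
} CheckResult;

/* Assumed callback shapes; prev and feedback are NULL for an initial candidate. */
typedef Candidate (*GenerateFn)(const char *spec, const Candidate *prev,
                                const char *feedback);
typedef CheckResult (*CheckFn)(const Candidate *candidate);

/* Returns true and writes *out once a candidate verifies; false if the budget runs out. */
bool generate_verified(const char *spec, GenerateFn generate, CheckFn check,
                       int initial_candidates, int max_refinements,
                       Candidate *out)
{
    Candidate current = { "" };
    CheckResult result = { false, "" };

    /* Step 1: initial candidates, generated without any feedback. */
    for (int i = 0; i < initial_candidates; i++) {
        current = generate(spec, NULL, NULL);
        result = check(&current);
        if (result.verified) {
            *out = current;
            return true;
        }
    }

    /* Step 2: refine the latest candidate using the checker's feedback. */
    for (int i = 0; i < max_refinements; i++) {
        Candidate next = generate(spec, &current, result.feedback);
        current = next;
        result = check(&current);
        if (result.verified) {
            *out = current;
            return true;
        }
    }
    return false;
}

/* Toy generator: the first attempt is wrong, any refinement is right. */
Candidate demo_generate(const char *spec, const Candidate *prev,
                        const char *feedback)
{
    (void)spec;
    (void)feedback;
    Candidate c = { "" };
    strcpy(c.code, prev == NULL
                       ? "int f(int x) { return x; }"
                       : "int f(int x) { return x < 0 ? -x : x; }");
    return c;
}

/* Toy checker: accepts only code containing a conditional expression. */
CheckResult demo_check(const Candidate *candidate)
{
    CheckResult r = { false, "" };
    if (strchr(candidate->code, '?') != NULL) {
        r.verified = true;
    } else {
        strcpy(r.feedback, "postcondition not proved");
    }
    return r;
}
```

With the toy callbacks, a call such as generate_verified("abs", demo_generate, demo_check, 1, 3, &out) fails in step 1 and succeeds on the first refinement. In a real pipeline the checker would run the compiler and the verifier and return their diagnostics as feedback.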

The approach relies on two Frama-C plugins: the weakest precondition (WP) plugin, which proves that the code satisfies its ACSL contract, and the runtime error (RTE) plugin, which adds checks for undefined behaviors such as arithmetic overflow. Together they establish program correctness with respect to the ACSL specification. In this way, VeCoGen combines the generative capabilities of LLMs with the rigor of deductive verification.
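
To make the roles of the contract and the two kinds of checks concrete, here is a small ACSL-annotated function. It is an illustrative sketch only: the function abs_int and its contract are assumptions chosen for brevity and are not taken from the paper or its dataset.

```c
#include <limits.h>

/*@
  requires x > INT_MIN;                    // rules out overflow when negating x
  assigns \nothing;                        // the function has no side effects
  ensures \result >= 0;
  ensures \result == x || \result == -x;
*/
int abs_int(int x)
{
    /* Without the precondition, -x could overflow for x == INT_MIN,
       which is the kind of defect a runtime-error check is meant to flag. */
    return x < 0 ? -x : x;
}
```

The ensures clauses are the functional properties a WP-style proof has to discharge, while the requires clause exists to make the negation safe. Assuming the code is saved as abs.c, a typical invocation would be frama-c -wp -wp-rte abs.c, which asks WP to also generate and prove the runtime-error guards.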

Evaluation and Findings

The researchers evaluated VeCoGen on a custom dataset, VECOSET, composed of 15 competitive programming problems from Codeforces. The tool solved 13 of these 15 problems. Nine were solved during the initial generation phase, and the remaining solutions emerged during the refinement iterations. The results indicate that the iterative feedback mechanism increases the number of problems for which a verified program is found.

This ability to converge on a correct solution even when initial attempts fail is critical. VeCoGen does not retrain the model; instead, it feeds the errors from earlier attempts back into later prompts, so that each new attempt is guided by what went wrong before.

Implications and Future Work

Practically, the tool has significant implications for automating program synthesis in safety-critical domains, where software defects can lead to substantial risks. VeCoGen positions itself as a viable automatic code generation tool with verifiable correctness guarantees, which makes it particularly relevant to industries such as automotive, aerospace, and healthcare, where software verification is paramount.

Theoretically, the integration of LLMs with formal methods in code generation raises open questions and opportunities for expansion. Future versions might extend VeCoGen's scope to more complex functions involving loops, or to programs that use more complex data structures. The insights into balancing natural language with formal specifications could also inform the development of more sophisticated AI-driven development tools.

Conclusion

In conclusion, VeCoGen represents a significant step in AI-assisted software engineering. By pairing the generative capabilities of LLMs with the rigorous checks provided by formal verification, it helps close the gap between automatically generated code and code that is reliable and verifiable. Its results demonstrate the feasibility of such integrated approaches and encourage further exploration of the interplay between AI and formal methods in software development. As AI spreads through engineering disciplines, tools like VeCoGen can support innovation while preserving safety and precision.
