Can LLMs Enable Verification in Mainstream Programming?
Published 18 Mar 2025 in cs.SE, cs.AI, and cs.PL | (2503.14183v1)
Abstract: Although formal methods are capable of producing reliable software, they have seen minimal adoption in everyday programming. Automatic code generation using LLMs is becoming increasingly widespread, but it rarely considers producing strong correctness guarantees. In this study, we explore the ability of LLMs to produce verified code in three verification languages (Dafny, Nagini, and Verus). To do so, we use manually curated datasets derived from the state-of-the-art Python benchmark, HumanEval. We also assess what types of information are sufficient to achieve good-quality results.