miniCodeProps: a Minimal Benchmark for Proving Code Properties (2406.11915v2)

Published 16 Jun 2024 in cs.SE, cs.AI, and cs.LG

Abstract: AI agents have shown initial promise in automating mathematical theorem proving in proof assistants such as Lean. The same proof assistants can be used to verify the correctness of code by pairing code with specifications and proofs that the specifications hold. Automating the writing of code, specifications, and proofs could lower the cost of verification, or, ambitiously, enable an AI agent to output safe, provably correct code. However, it remains unclear whether current neural theorem provers can automatically verify even relatively simple programs. We present miniCodeProps, a benchmark of 201 program specifications in the Lean proof assistant, aimed at the subproblem of automatically generating a proof for a provided program and specification. miniCodeProps contains specifications about simple, self-contained programs (e.g., lists, natural numbers, binary trees) with varied proof difficulty. Despite its simplicity, miniCodeProps is sufficient to break current LLM-based provers, with state-of-the-art methods showing promise on the easy properties in miniCodeProps, yet failing to prove nearly all of the medium and hard properties. We publicly release miniCodeProps as a benchmark for furthering automated theorem proving in the context of formally verified code.

Authors (2)

Evan Lohn (2 papers)
Sean Welleck (54 papers)

Citations (2)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Tweets

https://twitter.com/Jose_A_Alonso/status/1803332246843093026

miniCodeProps: a Minimal Benchmark for Proving Code Properties (2406.11915v2)

Summary

Related Papers

Tweets