
Models That Prove Their Own Correctness (2405.15722v3)

Published 24 May 2024 in cs.LG, cs.CC, and cs.SE

Abstract: How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured on average over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train Self-Proving models that prove the correctness of their output to a verification algorithm $V$ via an Interactive Proof. Self-Proving models satisfy that, with high probability over a random input, the model generates a correct output and successfully proves its correctness to $V$. The soundness property of $V$ guarantees that, for every input, no model can convince $V$ of the correctness of an incorrect output. Thus, a Self-Proving model proves correctness of most of its outputs, while all incorrect outputs (of any model) are detected by $V$. We devise a generic method for learning Self-Proving models, and we prove convergence bounds under certain assumptions. The theoretical framework and results are complemented by experiments on an arithmetic capability: computing the greatest common divisor (GCD) of two integers. Our learning method is used to train a Self-Proving transformer that computes the GCD and proves the correctness of its answer.
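
The GCD setting makes the verifier concrete. Below is a minimal sketch, not taken from the paper's code: the model's output is a candidate $g$ for $\gcd(a, b)$, the proof is a Bézout certificate $(u, v)$ with $ua + vb = g$, and the verifier accepts iff $g$ divides both inputs and the certificate checks out. Soundness follows because every common divisor of $a$ and $b$ divides $ua + vb$, so no prover can certify a wrong output. The helper names (verify_gcd, extended_gcd) and the use of the extended Euclidean algorithm as a stand-in for the honest prover are illustrative assumptions.

import math

def verify_gcd(a: int, b: int, g: int, u: int, v: int) -> bool:
    # Accept iff g divides both inputs and (u, v) is a Bezout certificate
    # for g, i.e. u*a + v*b == g. Any common divisor of a and b divides
    # u*a + v*b, so if both checks pass, g must equal gcd(a, b): no prover
    # can convince this verifier of an incorrect output.
    return g > 0 and a % g == 0 and b % g == 0 and u * a + v * b == g

def extended_gcd(a: int, b: int):
    # Extended Euclidean algorithm: returns (g, u, v) with u*a + v*b == g.
    # Stands in for the honest prover here; a Self-Proving transformer
    # would instead generate g and (u, v) itself.
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

a, b = 240, 46
g, u, v = extended_gcd(a, b)
assert g == math.gcd(a, b) == 2
assert verify_gcd(a, b, g, u, v)          # correct output with valid proof: accepted
assert not verify_gcd(a, b, g + 2, u, v)  # incorrect output: rejected (soundness)

In the paper's framework the trained transformer plays the prover's role while the verifier stays fixed; the check above is deterministic and single-round, whereas the general framework allows randomized, multi-round interaction.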

Authors (4)
  1. Noga Amit (2 papers)
  2. Shafi Goldwasser (21 papers)
  3. Orr Paradise (12 papers)
  4. Guy Rothblum (3 papers)
Citations (1)
