Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization (2403.18120v1)

Published 26 Mar 2024 in cs.AI, cs.CL, and cs.LG

Abstract: Large language models (LLMs), such as Google's Minerva and OpenAI's GPT families, are becoming increasingly capable of solving mathematical quantitative reasoning problems. However, they still make unjustified logical and computational errors in their reasoning steps and answers. In this paper, we leverage the fact that if the training corpus of LLMs contained sufficiently many examples of formal mathematics (e.g., in Isabelle, a formal theorem proving environment), they can be prompted to translate, i.e., autoformalize, informal mathematical statements into formal Isabelle code -- which can be verified automatically for internal consistency. This provides a mechanism to automatically reject solutions whose formalized versions are inconsistent within themselves or with the formalized problem statement. We evaluate our method on the GSM8K, MATH, and MultiArith datasets and demonstrate that our approach provides a consistently better heuristic than vanilla majority voting -- the previously best method for identifying correct answers -- improving on it by more than 12% on GSM8K. In our experiments it improves results consistently across all datasets and LLM sizes. The code can be found at https://github.com/jinpz/dtv.
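The abstract describes selecting answers by sampling several candidate solutions, autoformalizing each into Isabelle, discarding candidates whose formalization fails to verify, and majority-voting over the survivors. Below is a minimal Python sketch of that selection loop, assuming hypothetical callables for LLM sampling (`sample_informal_solutions`), answer extraction (`extract_answer`), autoformalization (`autoformalize`), and Isabelle checking (`isabelle_accepts`); these names are illustrative placeholders, not the interface of the paper's released code in the linked repository.

```python
# Sketch of verification-filtered majority voting, following the abstract.
# All callables passed in are hypothetical stand-ins for an LLM sampler,
# an LLM autoformalizer, an answer extractor, and an Isabelle checker.
from collections import Counter
from typing import Callable, List, Optional


def filtered_majority_vote(
    problem: str,
    sample_informal_solutions: Callable[[str, int], List[str]],
    extract_answer: Callable[[str], str],
    autoformalize: Callable[[str, str], str],
    isabelle_accepts: Callable[[str], bool],
    n_samples: int = 64,
) -> Optional[str]:
    """Sample candidate solutions, keep only those whose autoformalized
    Isabelle version verifies, then majority-vote over surviving answers."""
    candidates = sample_informal_solutions(problem, n_samples)

    verified_answers: List[str] = []
    for solution in candidates:
        theory = autoformalize(problem, solution)  # informal solution -> Isabelle code
        if isabelle_accepts(theory):               # reject inconsistent formalizations
            verified_answers.append(extract_answer(solution))

    # Assumed fallback: if verification rejects every sample, fall back to
    # plain majority voting over all candidates rather than abstaining.
    pool = verified_answers or [extract_answer(s) for s in candidates]
    if not pool:
        return None
    return Counter(pool).most_common(1)[0][0]
```

The fallback to unfiltered majority voting is an assumption of this sketch, chosen so the formal filter only prunes candidates and never leaves the selector without an answer.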
