Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers (2205.10893v1)
Abstract: In theorem proving, selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This is a challenge for all theorem provers, especially those based on LLMs, because of their relative inability to reason over huge volumes of premises in text form. This paper introduces Thor, a framework that integrates LLMs and automated theorem provers to overcome this difficulty. In Thor, a class of methods called hammers, which leverage the power of automated theorem provers, is used for premise selection, while all other tasks are delegated to LLMs. Thor increases an LLM's success rate on the PISA dataset from $39\%$ to $57\%$, while solving $8.2\%$ of problems that neither LLMs nor automated theorem provers are able to solve on their own. Furthermore, with a significantly smaller computational budget, Thor can achieve a success rate on the MiniF2F dataset that is on par with the best existing methods. Thor can be instantiated for the majority of popular interactive theorem provers via a straightforward protocol we provide.
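
The division of labor described in the abstract can be illustrated with a minimal sketch. The abstract does not say how control passes between the LLM and the hammer, so everything here is an assumption made for illustration: the `<hammer>` marker, the function names `thor_style_search`, `propose_step`, `run_hammer`, and `apply_step`, the greedy single-path loop, and the toy stubs that stand in for a real language model and a real prover.

```python
from typing import Callable, List, Optional

# Hypothetical marker the model emits when a step needs premise selection.
HAMMER_TOKEN = "<hammer>"


def thor_style_search(
    goal: str,
    propose_step: Callable[[str], str],
    run_hammer: Callable[[str], Optional[str]],
    apply_step: Callable[[str, str], Optional[str]],
    max_steps: int = 8,
) -> Optional[List[str]]:
    """Greedy loop: the model proposes steps, the hammer handles premise selection.

    Returns the list of applied proof steps, or None if the attempt fails.
    """
    state = goal
    proof: List[str] = []
    for _ in range(max_steps):
        step = propose_step(state)
        if step == HAMMER_TOKEN:
            # Delegate premise selection to the automated prover.
            hammer_step = run_hammer(state)
            if hammer_step is None:
                return None  # the hammer found no proof for this state
            step = hammer_step
        next_state = apply_step(state, step)
        if next_state is None:
            return None  # the prover rejected the step
        proof.append(step)
        if next_state == "done":
            return proof
        state = next_state
    return None  # step budget exhausted


# Toy stand-ins (assumptions, not real model or prover calls).
def toy_lm(state: str) -> str:
    return "induct n" if state == "goal" else HAMMER_TOKEN


def toy_hammer(state: str) -> Optional[str]:
    return "by (simp add: lemma_a)" if state == "subgoal" else None


def toy_apply(state: str, step: str) -> Optional[str]:
    transitions = {
        ("goal", "induct n"): "subgoal",
        ("subgoal", "by (simp add: lemma_a)"): "done",
    }
    return transitions.get((state, step))


proof = thor_style_search("goal", toy_lm, toy_hammer, toy_apply)
print(proof)
```

In this toy run the model handles the structural step itself and hands the premise-dependent step to the hammer, which mirrors the split the abstract describes. A real system would replace the stubs with calls to a language model and an interactive theorem prover, and would likely search over several candidate steps rather than following a single path.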