Magnushammer: A Transformer-Based Approach to Premise Selection (2303.04488v3)

Published 8 Mar 2023 in cs.LG, cs.AI, and cs.LO

Abstract: This paper presents a novel approach to premise selection, a crucial reasoning task in automated theorem proving. Traditionally, symbolic methods that rely on extensive domain knowledge and engineering effort are applied to this task. In contrast, this work demonstrates that contrastive training with the transformer architecture can achieve higher-quality retrieval of relevant premises, without the engineering overhead. Our method, Magnushammer, outperforms the most advanced and widely used automation tool in interactive theorem proving called Sledgehammer. On the PISA and miniF2F benchmarks Magnushammer achieves 59.5% (against 38.3%) and 34.0% (against 20.9%) success rates, respectively. By combining Magnushammer with a language-model-based automated theorem prover, we further improve the state-of-the-art proof success rate from 57.0% to 71.0% on the PISA benchmark using 4x fewer parameters. Moreover, we develop and open source a novel dataset for premise selection, containing textual representations of (proof state, relevant premise) pairs. To the best of our knowledge, this is the largest available premise selection dataset, and the first one for the Isabelle proof assistant.

Summary

The paper introduces a transformer-based approach that significantly improves premise selection performance in automated theorem proving.
It reports a 59.5% success rate on the PISA benchmark (against 38.3% for Sledgehammer) and, when combined with a language-model-based prover, reaches 71.0% with four times fewer parameters than the previous state of the art.
The study also releases a new large-scale dataset of (proof state, relevant premise) pairs for Isabelle, supporting future research on machine-learning-based premise selection.

Analysis of "Magnushammer: A Transformer-Based Approach to Premise Selection"

The paper "Magnushammer: A Transformer-Based Approach to Premise Selection" introduces a novel methodology for premise selection in the context of automated theorem proving (ATP). In particular, it focuses on improving the performance and efficiency of premise selection, a fundamental task traditionally handled by symbolic methods in interactive theorem proving environments. The proposed approach, Magnushammer, leverages contrastive learning within a transformer architecture, demonstrating significant improvements over existing tools such as Sledgehammer.
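The retrieval idea behind contrastive premise selection can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the embeddings are hand-written toy vectors standing in for transformer outputs, the premise names are hypothetical, and the InfoNCE-style loss is one common contrastive objective assumed here because this summary does not specify the exact loss.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_premises(state_vec, premise_vecs, k):
    """Return the names of the k premises most similar to the proof state."""
    scored = sorted(
        premise_vecs.items(),
        key=lambda item: cosine(state_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

def info_nce_loss(state_vec, positive_vec, negative_vecs, temperature=0.1):
    """InfoNCE-style loss (assumed objective): low when the relevant premise
    is closer to the proof state than the irrelevant ones."""
    logits = [cosine(state_vec, positive_vec) / temperature]
    logits += [cosine(state_vec, n) / temperature for n in negative_vecs]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]

# Toy embeddings (assumptions); a real system would obtain these from a transformer.
state = [1.0, 0.0]
premises = {
    "add_comm": [0.9, 0.1],   # hypothetical premise names
    "mul_assoc": [0.0, 1.0],
    "add_zero": [0.7, 0.7],
}

top2 = rank_premises(state, premises, k=2)
loss_good = info_nce_loss(state, premises["add_comm"], [premises["mul_assoc"]])
loss_bad = info_nce_loss(state, premises["mul_assoc"], [premises["add_comm"]])
print(top2, loss_good < loss_bad)
```

Training pushes the loss down, which pulls relevant premises toward the proof state in embedding space; at inference time only the similarity ranking is needed.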

Empirically, Magnushammer reports strong results. It achieves a proof success rate of 59.5% on the PISA benchmark, against 38.3% for Sledgehammer, and 34.0% on the miniF2F benchmark, against 20.9%. When Magnushammer is combined with a language-model-based automated theorem prover, the authors report a state-of-the-art proof success rate of 71.0% on PISA, up from the previous 57.0%. The combined system reaches this result with four times fewer parameters than the prior state of the art, which suggests that better premise retrieval can substitute for some model scale.
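To put the reported numbers side by side, the short script below computes the absolute gain (in percentage points) and the relative gain for each comparison. The success rates are taken from the text above; the labels and the helper name `gains` are illustrative only.

```python
# (baseline %, new %) pairs as reported in the paper's abstract.
results = {
    "PISA: Magnushammer vs. Sledgehammer": (38.3, 59.5),
    "miniF2F: Magnushammer vs. Sledgehammer": (20.9, 34.0),
    "PISA: combined system vs. prior state of the art": (57.0, 71.0),
}

def gains(baseline, new):
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = round(new - baseline, 1)
    relative = round(100 * (new - baseline) / baseline, 1)
    return absolute, relative

for label, (baseline, new) in results.items():
    absolute, relative = gains(baseline, new)
    print(f"{label}: +{absolute} points ({relative}% relative)")
```

The absolute gains come out to 21.2, 13.1, and 14.0 percentage points, respectively.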

The paper also contributes a new dataset for premise selection. The authors built and open-sourced a collection of textual representations of (proof state, relevant premise) pairs. It contains 4.4 million premise selections and over 433,000 unique premises; the authors state that, to the best of their knowledge, it is the largest available premise selection dataset and the first for the Isabelle proof assistant. The dataset could stimulate further research on machine-learning-based ATP systems by enabling both comparative studies and new methods.
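As a rough picture of what a dataset of (proof state, relevant premise) pairs looks like, the sketch below models each pair as a small record and computes the two statistics quoted above, the number of pairs and the number of unique premises. The record layout, the field names, and the toy Isabelle-style strings are assumptions for illustration; the released dataset's actual schema may differ.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class PremisePair:
    """One training example: a proof state and a premise that helped prove it.
    Field names are hypothetical, not the released dataset's schema."""
    proof_state: str
    premise: str

def summarize(pairs):
    """Count pairs, unique premises, and distinct proof states."""
    premises_by_state = defaultdict(set)
    for pair in pairs:
        premises_by_state[pair.proof_state].add(pair.premise)
    unique_premises = {pair.premise for pair in pairs}
    return {
        "pairs": len(pairs),
        "unique_premises": len(unique_premises),
        "proof_states": len(premises_by_state),
    }

# Toy examples written for this sketch, not drawn from the real dataset.
toy_pairs = [
    PremisePair("a + b = b + a", "add.commute"),
    PremisePair("a + b = b + a", "add.assoc"),
    PremisePair("x + (y + z) = (x + y) + z", "add.assoc"),
]

stats = summarize(toy_pairs)
print(stats)
```

Note that one proof state can map to several premises and one premise can serve many proof states, which is why the pair count (4.4 million) is much larger than the unique-premise count (over 433,000).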

The implications of Magnushammer are twofold: theoretically, it demonstrates the viability of applying transformer architectures to premise selection, opening avenues for further research into deep learning applications in formal logic and theorem proving. Practically, its significant performance gains provide an efficient, scalable option for premise selection, which can potentially reduce the entry barrier for deploying hammer tools across different proof assistants, broadening their applicability and effectiveness.

Looking forward, several directions for further research stand out. Magnushammer shows promise in Isabelle, and adapting it to proof assistants with different underlying logics (such as Coq or Lean) is a natural next step. Richer representations of proof states could further improve its performance and might extend the method beyond existing proof libraries. Finally, integrating this premise selection approach into broader systems that also model proof step retrieval and generation could yield significant advances in automated reasoning.

In conclusion, Magnushammer provides a data-driven and scalable solution to premise selection in ATP, with both research value and practical utility. Its success with a modern transformer architecture marks measurable progress in the field and offers a blueprint for future system design in automated theorem proving.
