
Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach (2505.14479v3)

Published 20 May 2025 in cs.AI and cs.CL

Abstract: LLMs struggle with formal domains that require rigorous logical deduction and symbolic reasoning, such as mathematical proof generation. We propose a neuro-symbolic approach that combines LLMs' generative strengths with structured components to overcome this challenge. As a proof-of-concept, we focus on geometry problems. Our approach is two-fold: (1) we retrieve analogous problems and use their proofs to guide the LLM, and (2) a formal verifier evaluates the generated proofs and provides feedback, helping the model fix incorrect proofs. We demonstrate that our method significantly improves proof accuracy for OpenAI's o1 model (58%-70% improvement); both analogous problems and the verifier's feedback contribute to these gains. More broadly, shifting to LLMs that generate provably correct conclusions could dramatically improve their reliability, accuracy and consistency, unlocking complex tasks and critical real-world applications that require trustworthiness.

Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach

The paper "Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach" by Sultan et al. addresses a salient challenge in artificial intelligence: applying LLMs to formal, structured tasks such as mathematical proof generation. Despite the successful deployment of LLMs in diverse applications, their limitations become evident when they are tasked with generating rigorous, logically sound proofs, given their reliance on probabilistic sequence generation learned from textual patterns. The paper proposes a novel neuro-symbolic approach that seeks to enhance the reliability of LLMs in these formal domains.

Neuro-Symbolic Integration

The authors propose a dual-component method, combining LLMs with symbolic reasoning tools. First, the approach involves analogical reasoning, wherein the model retrieves structurally analogous problems to guide its proof generation. This is inspired by human cognitive problem-solving mechanisms where analogy aids generalization. Analogous problems are determined using structural similarity measures and accompanied by their solved proofs. The second component employs a symbolic verifier to iteratively check the model's generated proofs, providing feedback where errors are identified. This iterative loop continues until a valid proof is achieved or a retry limit is reached.
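The retrieve-generate-verify loop described above can be sketched as follows. This is a minimal illustration of the control flow only; the function names and toy stand-ins are hypothetical placeholders, not the authors' actual implementation or the FormalGeo verifier API.

```python
from typing import Callable, List, Optional, Tuple

def prove_with_feedback(problem: str,
                        retrieve: Callable[[str], List[str]],
                        generate: Callable[[str, List[str], Optional[str]], str],
                        verify: Callable[[str], Tuple[bool, Optional[str]]],
                        max_retries: int = 5) -> Optional[str]:
    """Retry proof generation until the symbolic verifier accepts
    a proof or the retry budget is exhausted."""
    # (1) Retrieve structurally analogous solved problems to guide the LLM.
    analogs = retrieve(problem)
    feedback = None
    for _ in range(max_retries):
        # (2) Prompt the model with the problem, the analogous proofs,
        #     and any verifier feedback from the previous attempt.
        proof = generate(problem, analogs, feedback)
        # (3) The symbolic verifier checks the proof and, on failure,
        #     returns feedback pinpointing the error.
        ok, feedback = verify(proof)
        if ok:
            return proof
    return None  # retry limit reached without a valid proof

# Toy demonstration: a "generator" that corrects itself once it sees feedback.
attempts = []

def toy_generate(problem, analogs, feedback):
    proof = "correct proof" if feedback else "flawed proof"
    attempts.append(proof)
    return proof

def toy_verify(proof):
    if proof == "correct proof":
        return True, None
    return False, "step 2 does not follow from step 1"

result = prove_with_feedback("triangle angle sum",
                             retrieve=lambda p: ["analogous problem + proof"],
                             generate=toy_generate,
                             verify=toy_verify)
print(result)         # correct proof
print(len(attempts))  # 2 (one failed attempt, one corrected retry)
```

The key design point is that the verifier's feedback re-enters the prompt on the next attempt, so each retry is conditioned on a concrete, symbolically identified error rather than a blind resample.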

Empirical Results and Implications

Empirical results demonstrate substantial improvements in proof accuracy: a 58%-70% improvement for OpenAI's o1 model. The paper attributes these gains to both the analogous problems and the verifier's feedback. This leap in accuracy underscores the potential of hybrid neuro-symbolic systems to augment the reliability, accuracy, and consistency of LLMs across complex tasks requiring trustworthy outputs.

Contributions and Impact

The primary contributions of the paper include:

  • A neuro-symbolic framework for proof generation incorporating analogical retrieval and verifier feedback.
  • A specialized symbolic verifier for geometry proofs within the FormalGeo-7k dataset.
  • Documented improvements in proof accuracy with reduced computational costs due to focused theorem context construction.
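The "focused theorem context construction" mentioned in the last bullet can be illustrated with a small sketch: rather than placing an entire theorem library in the prompt, only theorems that share predicates with the problem are included, shrinking the context the model must process. The theorem names and predicate sets below are toy assumptions for illustration, not the FormalGeo-7k theorem set itself.

```python
# Toy theorem library mapping each theorem to the predicates it mentions.
# A real system would draw these from the formal theorem base.
THEOREMS = {
    "isosceles_base_angles":     {"IsoscelesTriangle", "Angle"},
    "pythagoras":                {"RightTriangle", "LengthOfLine"},
    "parallel_alternate_angles": {"ParallelLines", "Angle"},
}

def focused_context(problem_predicates: set) -> list:
    """Keep only theorems sharing at least one predicate with the problem."""
    return sorted(name for name, preds in THEOREMS.items()
                  if preds & problem_predicates)

# A right-triangle problem involving side lengths pulls in only one theorem.
context = focused_context({"RightTriangle", "LengthOfLine"})
print(context)  # ['pythagoras']
```

Filtering the prompt this way reduces token counts per call, which is a plausible source of the reduced computational cost the authors report.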

This methodology paves the way for future developments in neuro-symbolic systems tailored to enhance LLM capabilities, particularly in domains mandating strict logical consistency. Such advancements could unlock applications in STEM education, automated reasoning systems, and safety-critical evaluations where formal correctness is non-negotiable.

Conclusion

By synthesizing the generative power of LLMs with structured verification processes, the researchers present a promising strategy for overcoming current limitations in formal reasoning tasks. As AI continues to evolve, integrating symbolic components with deep learning systems may not only refine the precision of LLM outputs but also redefine AI's role in domains necessitating reliability and exactitude. Future research could extend this approach beyond geometry to diverse mathematical disciplines and scientific theories. The potential for neuro-symbolic systems to underpin robust AI models is a fertile ground for innovation, promising to bridge the gap between computation-driven insights and the stringent demands of formal correctness.

Authors (3)
  1. Oren Sultan (6 papers)
  2. Eitan Stern (1 paper)
  3. Dafna Shahaf (33 papers)