LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs

Published 10 Jun 2025 in cs.AI, cs.LO, and cs.HC | (2506.08321v1)

Abstract: We present LeanTutor, an LLM-based tutoring system for math proofs. LeanTutor interacts with the student in natural language, formally verifies student-written math proofs in Lean, generates correct next steps, and provides the appropriate instructional guidance. LeanTutor is composed of three modules: (i) an autoformalizer/proof-checker, (ii) a next-step generator, and (iii) a natural language feedback generator. The first module faithfully autoformalizes student proofs into Lean and verifies proof accuracy via successful code compilation. If the proof has an error, the incorrect step is identified. The next-step generator module outputs a valid next Lean tactic for incorrect proofs via LLM-based candidate generation and proof search. The feedback generator module leverages Lean data to produce a pedagogically-motivated natural language hint for the student user. To evaluate our system, we introduce PeanoBench, a human-written dataset derived from the Natural Numbers Game, consisting of 371 Peano Arithmetic proofs, where each natural language proof step is paired with the corresponding logically equivalent tactic in Lean. The Autoformalizer correctly formalizes 57% of tactics in correct proofs and accurately identifies the incorrect step in 30% of incorrect proofs. In generating natural language hints for erroneous proofs, LeanTutor outperforms a simple baseline on accuracy and relevance metrics.


Authors (7)

Summary

The paper presents LeanTutor, a formally verified AI tutor that autoformalizes and verifies student-written mathematical proofs.
It employs a three-module architecture integrating autoformalization, next-step generation via proof search, and natural language feedback.
Experiments on the PeanoBench dataset report moderate autoformalization accuracy and natural language hints that outperform a simple baseline on accuracy and relevance.

Essay on "LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs"

The paper "LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs", authored by Manooshree Patel et al., sits at the intersection of AI-enhanced education and formal theorem proving. Its central contribution is LeanTutor, an AI tutoring system designed to help undergraduate students learn mathematical proofs through natural language interaction. The system combines autoformalization, formal verification, structured next-step guidance, and natural language feedback, all built on the Lean theorem prover.

Overview

LeanTutor comprises three integrated modules: an autoformalizer and proof checker, a next-step generator, and a natural language feedback generator. The autoformalizer translates student-written natural language proof steps into Lean, so that each step can be verified by Lean's compiler. It is designed to formalize correct steps faithfully and to flag an incorrect step when its formalization fails to compile. Because checking happens step by step, the system can point to where a proof goes wrong rather than only judging the complete proof, which is an advantage over approaches that check whole proofs.
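The step-by-step checking described above can be sketched as a loop that formalizes one step at a time and stops at the first step Lean rejects. This is a minimal sketch under stated assumptions. `check_student_proof`, `CheckResult`, `formalize`, and `compiles` are hypothetical names, not LeanTutor's interfaces. `formalize` stands in for an LLM call, and `compiles` stands in for running Lean on the theorem statement plus a tactic prefix.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class CheckResult:
    tactics: list[str]          # Lean tactics accepted so far
    error_step: Optional[int]   # index of the first rejected NL step, or None
    error_message: str = ""     # compiler message for that step, if any


def check_student_proof(
    steps: Sequence[str],
    formalize: Callable[[str, list[str]], str],
    compiles: Callable[[list[str]], tuple[bool, str]],
) -> CheckResult:
    """Formalize natural-language steps one by one; stop at the first failure.

    Hypothetical interfaces (assumptions, not the paper's code):
      formalize(step, prior_tactics) -> one Lean tactic for this step
      compiles(tactics) -> (ok, message); a prefix that merely leaves goals
        open should count as ok, since later steps may close them.
    """
    tactics: list[str] = []
    for i, step in enumerate(steps):
        tactic = formalize(step, tactics)
        ok, message = compiles(tactics + [tactic])
        if not ok:
            # This is the step reported to the student as incorrect.
            return CheckResult(tactics, i, message)
        tactics.append(tactic)
    return CheckResult(tactics, None)
```

Stopping at the first rejected step matches the goal of pinpointing a single incorrect step. A real checker would also confirm that the full tactic list leaves no unsolved goals.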

The next-step generator uses LLM-guided proof search. For an incorrect proof, an LLM proposes candidate Lean tactics, and the generator builds a search tree whose nodes are validated by compiling them in Lean. It then returns a valid next tactic. The approach resembles the depth-first search used by the COPRA agent: only tactics that Lean accepts are kept, and the search avoids cycles that return to an earlier proof state.
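A generic depth-first search in the spirit of this description might look like the sketch below. It is not COPRA's or LeanTutor's exact algorithm. The names `dfs_proof_search`, `propose`, `apply_tactic`, and `is_solved` are hypothetical. `propose` stands in for sampling tactics from an LLM, and `apply_tactic` stands in for compiling one tactic in Lean, returning `None` when Lean rejects it. Proof states are assumed to be hashable.

```python
from typing import Callable, Hashable, Iterable, Optional


def dfs_proof_search(
    start: Hashable,
    propose: Callable[[Hashable], Iterable[str]],
    apply_tactic: Callable[[Hashable, str], Optional[Hashable]],
    is_solved: Callable[[Hashable], bool],
    max_depth: int = 5,
) -> Optional[list[str]]:
    """Return a list of tactics that closes the goal from `start`, or None.

    States on the current search path are tracked, so a tactic that leads
    back to an earlier state (a cycle) is skipped. The depth bound keeps the
    search finite even if `propose` keeps suggesting new states.
    """

    def search(state, path_states, depth):
        if is_solved(state):
            return []
        if depth == max_depth:
            return None
        for tactic in propose(state):
            nxt = apply_tactic(state, tactic)
            if nxt is None or nxt in path_states:
                continue  # tactic rejected, or it would revisit a path state
            rest = search(nxt, path_states | {nxt}, depth + 1)
            if rest is not None:
                return [tactic] + rest
        return None  # every candidate failed; backtrack

    return search(start, frozenset({start}), 0)
```

Only the first tactic of the returned list would be shown to the student as the next step. The rest of the list confirms that this tactic can still lead to a complete proof.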

The feedback generation module combines the outputs of the autoformalizer (including Lean's error messages) and the next-step generator to produce targeted hints and, when appropriate, explicit next steps. It draws on pedagogical techniques that guide the student without revealing the complete solution prematurely.
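One way to picture how this module consumes the other modules' outputs is as prompt assembly. The sketch below is hypothetical throughout: the `FeedbackInputs` fields, the `build_hint_prompt` name, the prompt wording, and the `reveal_next_step` switch between a hint and an explicit next step are all assumptions, not the paper's template.

```python
from dataclasses import dataclass


@dataclass
class FeedbackInputs:
    student_step: str   # natural-language step that failed to verify
    failed_tactic: str  # its Lean formalization
    lean_error: str     # Lean's compiler message for that tactic
    next_tactic: str    # valid tactic from the next-step generator


def build_hint_prompt(x: FeedbackInputs, reveal_next_step: bool = False) -> str:
    """Assemble an LLM prompt for student feedback (hypothetical template)."""
    lines = [
        "You are a math tutor. A student's proof step was rejected by Lean.",
        f"Student step: {x.student_step}",
        f"Lean tactic: {x.failed_tactic}",
        f"Lean error: {x.lean_error}",
        f"A valid next tactic: {x.next_tactic}",
        "Explain the mistake in plain language, without Lean syntax.",
    ]
    if reveal_next_step:
        # Escalation: spell out the correct next step in natural language.
        lines.append("Then state the correct next step in natural language.")
    else:
        # Default: nudge the student without giving away the solution.
        lines.append("Give a hint toward the next step; do not reveal it or the rest of the proof.")
    return "\n".join(lines)
```

In this design, the Lean error and the verified next tactic ground the LLM's hint in checked facts. The hint-versus-reveal flag mirrors the distinction between hints and explicit next steps.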

Dataset and Evaluation

The authors introduce PeanoBench, a human-written dataset derived from the Natural Numbers Game. It comprises 371 Peano Arithmetic proofs in which each natural language proof step is paired with a logically equivalent Lean tactic. The proofs serve two roles: as student input that LeanTutor evaluates, and as a benchmark for measuring autoformalization and feedback generation. The dataset is organized into the game's "worlds", each covering a different group of mathematical concepts, which gives a structured basis for evaluation.
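To make the step-level pairing concrete, here is a hypothetical PeanoBench-style example. It is not taken from the dataset. It uses Lean 4's core `Nat` lemmas `Nat.add_zero` (n + 0 = n) and `Nat.zero_add` (0 + n = n) instead of the Natural Numbers Game's own definitions, an assumption made so that the snippet stands alone. Each comment gives a natural language step, and the line below it gives the paired tactic. The second example is intentionally wrong: Lean rejects its first tactic, which is how an incorrect step gets flagged.

```lean
-- Correct proof: every natural language step maps to one accepted tactic.
example (a b : Nat) (h : a = b) : a + 0 = b := by
  -- "Adding zero leaves a number unchanged, so a + 0 becomes a."
  rw [Nat.add_zero]
  -- "The goal is now a = b, which is exactly the hypothesis h."
  exact h

-- Incorrect student variant (fails to compile on purpose): the step appeals
-- to 0 + n = n, but the goal contains a + 0, so the rewrite finds no match.
example (a b : Nat) (h : a = b) : a + 0 = b := by
  -- "Since 0 + a = a, the goal becomes a = b."
  rw [Nat.zero_add]
  exact h
```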

LeanTutor is assessed quantitatively by measuring the autoformalizer's accuracy on both correct and incorrect proofs, and the feedback module is evaluated on the accuracy and relevance of its hints. The autoformalizer correctly formalizes 57% of tactics in correct proofs and identifies the incorrect step in 30% of incorrect proofs. For erroneous proofs, LeanTutor's hints outperform a simple baseline on accuracy and relevance. These results suggest the approach is workable, but the autoformalization figures leave substantial room for improvement before classroom deployment.
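The two autoformalizer measurements can be computed as simple ratios. This sketch assumes a matching criterion for tactics: exact string match is the default stand-in, and the paper's actual notion of a correctly formalized tactic may differ (for example, checking logical equivalence in Lean). The function names are hypothetical, and the test data is toy data that does not reproduce any reported result.

```python
from typing import Callable, Optional, Sequence


def tactic_accuracy(
    predicted: Sequence[Sequence[str]],
    gold: Sequence[Sequence[str]],
    equivalent: Callable[[str, str], bool] = lambda p, g: p == g,
) -> float:
    """Fraction of gold tactics (across correct proofs) matched by predictions.

    The denominator counts every gold tactic, so a prediction that stops
    early is penalized for the tactics it never produced.
    """
    total = sum(len(g) for g in gold)
    correct = sum(
        1
        for pred_proof, gold_proof in zip(predicted, gold)
        for p, g in zip(pred_proof, gold_proof)
        if equivalent(p, g)
    )
    return correct / total if total else 0.0


def error_step_accuracy(
    predicted: Sequence[Optional[int]], gold: Sequence[int]
) -> float:
    """Fraction of incorrect proofs whose wrong step is located exactly.

    A prediction of None (no error found) always counts as a miss.
    """
    if not gold:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)
```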

Implications and Future Directions

LeanTutor has implications for both educational technology and the practical application of automated theorem proving. By offering real-time, formally verified feedback and guidance, it targets core difficulties students face when learning mathematical proofs, such as locating errors and deciding how to proceed. The system could support student engagement and self-directed learning in mathematics education.

From a broader perspective, LeanTutor illustrates the convergence of natural language processing and formal methods, suggesting further applications in automated reasoning and interactive theorem proving. It lays groundwork for AI systems that not only prove theorems but also support learning, helping users understand proofs through natural language interaction.

While the current iteration of LeanTutor shows promise, future work could extend the dataset beyond Peano Arithmetic to more complex proofs and other mathematical domains, increasing the system's robustness and applicability. Further research might optimize interaction models for lower-power devices or add feedback loops in which student interactions refine the system's models.

In conclusion, LeanTutor is a notable step for educational AI systems, combining rigorous formal foundations with practical applicability in undergraduate education. Systems of this kind could help bridge the gap between advanced computational methods and effective educational tools.
