Generalization of specialized Lean provers beyond mathematics
Determine whether specialized large language model (LLM) theorem provers for Lean, which are trained and evaluated primarily on mathematical datasets, generalize effectively to scientific domains beyond mathematics, and quantify the extent of that generalization relative to their performance within mathematics.
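One possible way to make the "relative to their performance within mathematics" comparison concrete is a ratio of pass rates: the fraction of theorems proved in a non-mathematical domain divided by the fraction proved in mathematics. The sketch below is only an illustration under that assumption. The source does not prescribe this metric, the function names `pass_rate` and `generalization_ratio` are hypothetical, and the outcome lists are made-up placeholders rather than measured results.

```python
def pass_rate(results):
    """Fraction of theorems proved; `results` is a list of booleans."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(results) / len(results)


def generalization_ratio(math_results, other_results):
    """Out-of-domain pass rate divided by the in-domain (mathematics) pass rate."""
    math_rate = pass_rate(math_results)
    if math_rate == 0:
        raise ValueError("in-domain pass rate is zero; the ratio is undefined")
    return pass_rate(other_results) / math_rate


# Hypothetical per-theorem outcomes, for illustration only (not measured data).
math_results = [True, True, True, False]        # 3 of 4 proved
physics_results = [True, False, False, False]   # 1 of 4 proved

ratio = generalization_ratio(math_results, physics_results)
```

Under this assumed metric, a ratio near 1 would suggest the prover transfers about as well as it performs in mathematics, while a ratio well below 1 would indicate a generalization gap. A fair comparison would also need benchmarks of comparable difficulty in both domains, which this sketch does not address.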
References
First, since they were mainly trained and tested in the domain of mathematics, their ability to generalize beyond this domain remains unclear.
— Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
(2510.12787 - Tredici et al., 14 Oct 2025) in Section 1 (Introduction)