Lean4PHYS: Formalizing Physics in Lean 4

Updated 27 April 2026
  • Lean4PHYS is a unified framework for formalizing and verifying college-level physics using Lean 4, featuring a community-driven library and a curated benchmark suite.
  • It treats units and dimensions rigorously and is designed to work with Lean's algebraic tactics, yielding machine-checkable proofs of both fundamental and advanced physics theorems.
  • The framework supports diverse applications from undergraduate coursework to competition-level problems, enabling systematic evaluation of automated proof models.

Lean4PHYS is a unified framework for formalizing and verifying college-level physics in the Lean 4 proof assistant. Its goal is machine-checkable, fully verified reasoning across introductory and competition-level physics. The framework pairs a community-driven library of formal physics theorems, PhysLib, with a challenging, expert-curated benchmark, LeanPhysBench. Together they provide infrastructure for evaluating both domain-specific and general-purpose theorem-proving models in mathematical physics (Li et al., 30 Oct 2025).

1. Architecture and Foundational Principles

Lean4PHYS follows a bottom-up design that prioritizes a rigorous treatment of units and dimensions and formal algebraic manipulation. The framework has two principal components:

  • PhysLib: A Lean 4 library covering both fundamental and advanced physics concepts, built systematically on the seven SI base dimensions (time, length, mass, electric current, thermodynamic temperature, amount of substance, luminous intensity) with an extensible unit theory. PhysLib integrates a dimension-aware NormedSpace, algebra over units, calculus, and key physical constants. Theorems are formalized for compatibility with Lean 4's algebraic tactics, such as norm_num and ring, and dimension-preserving casts connect physics-specific quantities with plain mathematical entities. Example formalizations include Newton’s second law:
    def F_of (m : Mass) (a : Acceleration) : Force := m * a
    theorem eq_newton_second_law (m : Mass) (a : Acceleration) :
      (F_of m a).val = (m.val * a.val) := by simp [F_of]
    and kinetic energy:
    -- kinetic energy KE = (1/2)·m·v², with the one-half factor written (1/2)
    def KE (m : Mass) (v : Speed) : Energy := (1/2) * m * v^2
    theorem KE_def (m : Mass) (v : Speed) :
      (KE m v).val = (1/2) * m.val * v.val^2 := by simp [KE]
  • LeanPhysBench: A suite of 200 Lean 4-formalized physics problems, equally split between university-level ("UGPhysics") and competition/Olympiad problems. The competition subset is further divided into 62 formulaic "easy" problems and 34 "hard" multi-step derivations, explicitly targeting functional and quantifier reasoning capabilities. Each problem includes precise hypotheses (e.g., physical laws, initial conditions), a single formal goal (number, symbolic formula, or logical property), and is independently authored and formally peer-reviewed by domain experts.

This dual-component structure is designed to provide both a foundation (library) for reasoning and a rigorous benchmark for empirical evaluation of formal proof agents (Li et al., 30 Oct 2025).
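
To make the role of dimensions concrete, here is a minimal Python analogue of the idea. It is not PhysLib's API: each quantity carries a real value (`val`, echoing the `.val` projections above) plus a vector of integer exponents over the seven SI base dimensions. Multiplication adds exponents and addition demands identical exponents. The names `Quantity`, `dims`, `F_of`, and `KE` are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Exponent order: time, length, mass, current, temperature, amount, luminous intensity
Dim = tuple[int, int, int, int, int, int, int]


def dims(T: int = 0, L: int = 0, M: int = 0, I: int = 0,
         Th: int = 0, N: int = 0, J: int = 0) -> Dim:
    """Build an exponent vector over the seven SI base dimensions."""
    return (T, L, M, I, Th, N, J)


MASS = dims(M=1)
SPEED = dims(L=1, T=-1)
ACCELERATION = dims(L=1, T=-2)
FORCE = dims(M=1, L=1, T=-2)
ENERGY = dims(M=1, L=2, T=-2)


@dataclass(frozen=True)
class Quantity:
    """A real value tagged with its physical dimension (hypothetical analogue)."""
    val: float
    dim: Dim

    def __mul__(self, other: Quantity) -> Quantity:
        # Product of quantities: values multiply, dimension exponents add.
        return Quantity(self.val * other.val,
                        tuple(x + y for x, y in zip(self.dim, other.dim)))

    def __rmul__(self, k: float) -> Quantity:
        # Dimensionless scalar on the left (e.g. 0.5 * m): dimension unchanged.
        return Quantity(k * self.val, self.dim)

    def __pow__(self, n: int) -> Quantity:
        # Integer power: value raised to n, exponents scaled by n.
        return Quantity(self.val ** n, tuple(x * n for x in self.dim))

    def __add__(self, other: Quantity) -> Quantity:
        # Only quantities of identical dimension may be added.
        if self.dim != other.dim:
            raise TypeError(f"dimension mismatch: {self.dim} vs {other.dim}")
        return Quantity(self.val + other.val, self.dim)


def F_of(m: Quantity, a: Quantity) -> Quantity:
    """Newton's second law F = m * a, with a runtime dimension check."""
    assert m.dim == MASS and a.dim == ACCELERATION
    return m * a


def KE(m: Quantity, v: Quantity) -> Quantity:
    """Kinetic energy (1/2) * m * v^2."""
    assert m.dim == MASS and v.dim == SPEED
    return 0.5 * m * v ** 2


m, a, v = Quantity(2.0, MASS), Quantity(3.0, ACCELERATION), Quantity(4.0, SPEED)
F, E = F_of(m, a), KE(m, v)
print(F.val, F.dim == FORCE)   # 6.0 True
print(E.val, E.dim == ENERGY)  # 16.0 True
```

Evaluating `F + E` raises `TypeError`. A runtime check like this only fires when the code runs. In Lean, the dimension presumably lives in the type (`Mass`, `Force`, `Energy`), so an ill-dimensioned statement is rejected at type-checking time instead.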

2. Formalization Pipeline and Validation Process

The LeanPhysBench formalization pipeline is designed to ensure an exact, reproducible alignment between each natural-language physics problem and its formal theorem counterpart. It proceeds in four steps:

  1. QA-to-Proof Alignment: Each original physics question is rephrased from a request for an answer (e.g., "find X") into a proof goal ("prove that X = ..."), with LLMs used to produce human-readable solutions as intermediate scaffolding.
  2. Extraction of Laws and Initial Conditions: All domain laws and givens are promoted to explicit Lean hypotheses.
  3. Unique Goal Declaration: Each problem receives a single, well-defined, checkable Lean theorem statement that encapsulates the proof target.
  4. Code Authoring and Review: Every theorem is written by an expert in both Lean and physics and peer-reviewed by at least two additional examiners. Manual and automated compile checks verify that the hypotheses and goals faithfully encode the intended physics.

Validation is intended to ensure that the encoded theorems capture both the physical intent and the logical content of each problem, with an explicit separation between hypotheses, domain-specific laws, and proof goals (Li et al., 30 Oct 2025).
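
Steps 2 and 3 amount to a simple data shape: a list of explicit hypotheses and exactly one goal. The Python sketch below uses a hypothetical `FormalProblem` record and a toy kinetic-energy question stated over plain reals (not PhysLib types). It renders the record as a Lean 4 theorem skeleton whose proof is left as `sorry`.

```python
from dataclasses import dataclass, field


@dataclass
class FormalProblem:
    """Hypothetical record for one aligned benchmark item."""
    name: str
    binders: list[str]                                        # typed variables
    hypotheses: dict[str, str] = field(default_factory=dict)  # label -> law or given
    goal: str = ""                                            # the single proof target

    def to_lean(self) -> str:
        """Render a Lean 4 theorem skeleton with the proof left as `sorry`."""
        if not self.goal:
            raise ValueError("each problem must declare a goal")
        lines = [f"theorem {self.name}"]
        lines += [f"  {b}" for b in self.binders]
        lines += [f"  ({label} : {prop})" for label, prop in self.hypotheses.items()]
        return "\n".join(lines) + f" :\n  {self.goal} := by\n  sorry"


# "A 2 kg body moves at 3 m/s; find its kinetic energy" becomes "prove that E = 9".
problem = FormalProblem(
    name="KE_example",
    binders=["(m v E : ℝ)"],
    hypotheses={
        "h_m": "m = 2",               # given
        "h_v": "v = 3",               # given
        "h_KE": "E = 1/2 * m * v^2",  # physical law promoted to a hypothesis
    },
    goal="E = 9",
)
print(problem.to_lean())
```

The printed statement has the binder line, one line per hypothesis, and the goal `E = 9` followed by `:= by sorry`. Keeping the law (`h_KE`) as a hypothesis rather than a library lemma mirrors step 2, where laws and givens are made explicit.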

3. Benchmark Content and Representative Examples

LeanPhysBench is curated to span mechanics, electrodynamics, thermodynamics, acoustics, optics, and modern physics. Its problems are representative both of canonical undergraduate curricula (Young & Freedman’s University Physics) and of high-level physics competitions.

A representative competition-level problem, based on the capstan equation (rope tension around a fixed cylinder grows exponentially with the wrap angle), illustrates the fidelity and dimensional rigor of the formalization:

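open Real  -- brings π (Real.pi) and log (Real.log) into scope

theorem Ch2_Q1 (M m : Mass) (μ n θ_total : ℝ)
  (T : ℝ → Force)                      -- rope tension as a function of wrap angle
  (h_pos : 0 < M.val ∧ 0 < m.val ∧ 0 < μ)
  (hM_gt_m : M.val > m.val)
  (T_light_def : T 0 = m * g)          -- g: gravitational acceleration, assumed in scope from PhysLib
  (T_heavy_def : T θ_total = M * g)
  (capstan_differential : ∀ θ, deriv (fun θ' => (T θ').val) θ = μ * (T θ).val)
  (capstan_integral : log ((T θ_total).val / (T 0).val) = μ * θ_total)
  (theta_def : θ_total = 2 * π * n)    -- n full turns of rope around the capstan
: n = (1 / (2 * π * μ)) * log (M.val / m.val) := by
  -- full proof omitted; it proceeds with rcases, field_simp, ring_nf, linarith, etc.
  sorry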
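
For reference, here is the short pen-and-paper derivation that a proof of Ch2_Q1 has to reproduce. Integrating the capstan relation gives the logarithmic form already supplied as `capstan_integral`. Substituting the two tensions (g cancels, and m > 0) and θ_total = 2πn, then dividing by 2πμ (μ > 0 from `h_pos`), yields the goal. This sketches the mathematics only, not the authors' proof script.

```latex
\begin{aligned}
\frac{dT}{d\theta} = \mu T
  \;&\Longrightarrow\; \ln\frac{T(\theta_{\text{total}})}{T(0)} = \mu\,\theta_{\text{total}} \\
T(0) = mg,\quad T(\theta_{\text{total}}) = Mg,\quad \theta_{\text{total}} = 2\pi n
  \;&\Longrightarrow\; \ln\frac{Mg}{mg} = \ln\frac{M}{m} = 2\pi\mu\, n \\
\mu > 0
  \;&\Longrightarrow\; n = \frac{1}{2\pi\mu}\,\ln\frac{M}{m}
\end{aligned}
```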

Another example, from the college-level subset, computes the net Coulomb force on a point charge due to two others, with all unit conversions made explicit through the formal library:

theorem Electromag_Force_University
  (q1 q2 q3 : Charge) (x1 x2 x3 : Length) (F : Force)
  (hq1 : q1 = SI.nano (1 • coulomb))
  (hq2 : q2 = SI.nano (-3 • coulomb))
  (hq3 : q3 = SI.nano (5 • coulomb))
  (hx1 : x1 = 0.02 • meter)
  (hx2 : x2 = 0.04 • meter)
  (hx3 : x3 = 0)
  -- net force on q3 (at the origin) from q1 and q2; K is the Coulomb constant
  (hF : F = K * q3 * (q1/(x1-x3)^2 + q2/(x2-x3)^2)) :
  -- with K = 9 × 10^9 N·m²/C² this evaluates to 2.8125 × 10^-5 N
  F = (9 * 10^-4 / 32) • newton := by
  simp [hq1, hq2, hq3, hx1, hx2, hx3, hF]; norm_num
These examples illustrate Lean4PHYS’s emphasis on dimension-aware, syntactically precise formalization (Li et al., 30 Oct 2025).
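
The simp/norm_num step amounts to exact rational arithmetic on the right-hand side of hF. A Python check using the standard `fractions` module reproduces that computation. It assumes K = 9 × 10⁹ N·m²/C² (the rounded Coulomb constant), which the excerpt does not state.

```python
from fractions import Fraction

# Assumption: K is the rounded Coulomb constant 9e9 N·m²/C²; the library's definition may differ.
K = Fraction(9 * 10**9)
NANO = Fraction(1, 10**9)

q1, q2, q3 = 1 * NANO, -3 * NANO, 5 * NANO                     # charges in C
x1, x2, x3 = Fraction(2, 100), Fraction(4, 100), Fraction(0)   # positions in m

# Same expression as hypothesis hF, evaluated exactly (as norm_num would).
F = K * q3 * (q1 / (x1 - x3) ** 2 + q2 / (x2 - x3) ** 2)

print(F)         # 9/320000
print(float(F))  # 2.8125e-05
```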

4. Empirical Evaluation and Baseline Model Results

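Models were evaluated with the pass@16 metric: the fraction of benchmark problems for which at least one of 16 generated proof attempts is accepted by Lean. The evaluated models include open-source Lean 4 provers specialized for mathematics (DeepSeek-Prover-V2, Goedel-Prover-V2, Kimina-Prover), open-weight general-purpose models (DeepSeek-R1-8B, Qwen3-8B), and closed-source generalist LLMs (GPT-4o, Claude-Sonnet-4, Gemini-2.5-pro), each run with and without PhysLib imported as context.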
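
With exactly 16 attempts per problem, pass@16 reduces to the fraction of problems with at least one Lean-verified attempt. The sketch below uses the general unbiased pass@k estimator (popularized by the Codex/HumanEval evaluation), so the n = k case and the case of more samples than k share one function. The source does not say which estimator the authors implemented, and the attempt data is made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem with n attempts, c of which Lean accepted.

    Equals the probability that a random size-k subset of the n attempts
    contains at least one verified proof. When n == k this is 1.0 if c > 0
    and 0.0 otherwise, i.e. "solved by at least one attempt".
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[list[bool]], k: int) -> float:
    """Average pass@k over problems; results[i][j] is True if attempt j on problem i verified."""
    return sum(pass_at_k(len(r), sum(r), k) for r in results) / len(results)


# Four toy problems, 16 attempts each: solved once, never, always, never.
once = [False] * 16
once[6] = True
results = [once, [False] * 16, [True] * 16, [False] * 16]
print(benchmark_pass_at_k(results, k=16))  # 0.5
```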

Model                   pass@16 without PhysLib   pass@16 with PhysLib
DeepSeek-Prover-V2-7B   11.5%                     14.5%
Goedel-Prover-V2-8B     10.0%                     13.0%
Kimina-Prover-8B        9.0%                      12.5%
DeepSeek-R1-8B          2.0%                      6.5%
Qwen3-8B                6.5%                      2.0% (anomaly)
GPT-4o                  2.0%                      13.0%
Claude-Sonnet-4         2.0%                      34.5%
Gemini-2.5-pro          7.5%                      39.5%

The authors report an average uplift from PhysLib of 11.75 percentage points, and Gemini-2.5-pro achieves the highest pass@16 on the benchmark (39.5%). These results show that LeanPhysBench poses a non-trivial challenge and that a rich, reusable physics theorem library measurably helps provers succeed (Li et al., 30 Oct 2025).
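
The per-model uplift is the difference between the two pass@16 columns. The sketch below recomputes it from the table. The unweighted mean over these eight rows comes to 10.625 points (about 12.8 if the Qwen3-8B anomaly is excluded). The source does not state how its 11.75-point average was aggregated, so the two figures need not coincide.

```python
# pass@16 in percent from the table: model -> (without PhysLib, with PhysLib)
PASS16 = {
    "DeepSeek-Prover-V2-7B": (11.5, 14.5),
    "Goedel-Prover-V2-8B": (10.0, 13.0),
    "Kimina-Prover-8B": (9.0, 12.5),
    "DeepSeek-R1-8B": (2.0, 6.5),
    "Qwen3-8B": (6.5, 2.0),
    "GPT-4o": (2.0, 13.0),
    "Claude-Sonnet-4": (2.0, 34.5),
    "Gemini-2.5-pro": (7.5, 39.5),
}


def uplift(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Per-model change in percentage points from importing PhysLib."""
    return {model: with_lib - without for model, (without, with_lib) in table.items()}


def mean_uplift(table: dict[str, tuple[float, float]],
                exclude: frozenset[str] = frozenset()) -> float:
    """Unweighted mean uplift over models not listed in `exclude`."""
    deltas = [d for model, d in uplift(table).items() if model not in exclude]
    return sum(deltas) / len(deltas)


print(uplift(PASS16)["Gemini-2.5-pro"])                        # 32.0
print(mean_uplift(PASS16))                                     # 10.625
print(round(mean_uplift(PASS16, frozenset({"Qwen3-8B"})), 2))  # 12.79
```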

5. PhysProver and Physics-Specific Training

The PhysProver system offers further insight into physics-specific automated theorem proving. It shows that domain-specific reinforcement learning with verifiable rewards (RLVR) yields measurable improvements in both in-domain (physics) and general mathematics theorem proving (Zhang et al., 22 Jan 2026). Trained on a formal dataset (PhysLeanData) constructed through seed sampling and conjecture generation, PhysProver achieved an overall gain of +2.4% in pass@16, with larger improvements in the classical physics and particle/string theory subdomains.
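
The seed-sampling and conjecture-generation step can be pictured as a generate-and-verify loop. In the minimal Python sketch below, `conjecture` and `verify` are hypothetical callables standing in for a model that proposes new Lean statements and for the Lean checker. The de-duplication by whitespace-normalized text is also an illustrative assumption, not a documented detail of PhysLeanData.

```python
import random
from typing import Callable, Iterable


def build_dataset(
    seeds: list[str],
    conjecture: Callable[[str], Iterable[str]],  # hypothetical: proposes statements from a seed
    verify: Callable[[str], bool],               # hypothetical: True iff Lean accepts the statement
    n_samples: int,
    rng: random.Random,
) -> list[str]:
    """Sample seeds, expand each into conjectures, keep verified, de-duplicated statements."""
    kept: list[str] = []
    seen: set[str] = set()
    for seed in rng.sample(seeds, k=min(n_samples, len(seeds))):
        for stmt in conjecture(seed):
            key = " ".join(stmt.split())  # whitespace-normalized duplicate check
            if key in seen:
                continue
            seen.add(key)
            if verify(stmt):
                kept.append(stmt)
    return kept


toy = build_dataset(
    seeds=["a", "b"],
    conjecture=lambda s: [s + "1", s + "2", s + "1"],  # proposes a duplicate on purpose
    verify=lambda stmt: stmt.endswith("1"),            # pretend only "...1" statements check
    n_samples=2,
    rng=random.Random(0),
)
print(sorted(toy))  # ['a1', 'b1']
```

Checking duplicates before calling `verify` avoids spending verifier time on repeated proposals, which matters when each check is a full Lean compilation.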

Notably, RLVR training on formalized physics proof data improves not only in-domain performance but also results on out-of-domain benchmarks (e.g., MiniF2F). This suggests that a physics-centric curriculum can also benefit formal mathematics provers (Zhang et al., 22 Jan 2026). Directions this work suggests for Lean4PHYS development include:

  • Systematic extraction and curation of formalized Lean 4 physics data
  • Controlled conjecture synthesis and auto-verification pipelines
  • Adoption of RL with binary or partial-credit reward structures tailored to the physics domain
  • Metric tracking for both in-domain (physics) and standard mathematics benchmarks
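
The reward options listed above can be made concrete. The minimal Python sketch below shows three pieces: a binary reward from the Lean verifier outcome, a hypothetical partial-credit variant based on the fraction of subgoals closed, and group-normalized advantages in the style of GRPO. Neither the partial-credit formula nor the choice of GRPO comes from the source. Both are assumptions for illustration.

```python
from statistics import mean, pstdev


def binary_reward(verified: bool) -> float:
    """1.0 if the Lean checker accepts the complete proof, else 0.0."""
    return 1.0 if verified else 0.0


def partial_credit_reward(closed: int, total: int, verified: bool, cap: float = 0.5) -> float:
    """Hypothetical partial credit: a verified proof earns 1.0; otherwise the
    fraction of subgoals closed, scaled by `cap` so a partial proof never
    matches a complete one."""
    if verified:
        return 1.0
    return cap * closed / total if total > 0 else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize rewards within one theorem's sample group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled proofs of one theorem: two verify, two fail.
rewards = [binary_reward(v) for v in (True, False, False, True)]
print([round(x, 3) for x in group_advantages(rewards)])          # [1.0, -1.0, -1.0, 1.0]
print(partial_credit_reward(closed=3, total=4, verified=False))  # 0.375
```

When every sample in a group gets the same reward, all advantages are zero and the group contributes no learning signal. This is one practical reason partial credit can help on hard physics problems where binary rewards are almost always 0.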

A plausible implication is that extending Lean4PHYS along these lines can facilitate both improved physics automation and the broader integration of scientific domains into the formal verification ecosystem (Zhang et al., 22 Jan 2026).

6. Current Challenges and Prospects for Expansion

Lean4PHYS faces several technical and community-scale challenges:

  • Library Bootstrapping: Building a robust, extensible unit system with broad coverage (mechanics, optics, thermodynamics, etc.) requires sustained community participation.
  • Transfer Gap: Existing math-specialized provers struggle with physics-specific constructs (e.g., quantity.val projections and unit casting). Closing this gap may require targeted fine-tuning or new training curricula.
  • Complex Reasoning: Multi-step competition problems involving quantifiers and calculus remain essentially unsolved (pass@16 ≈ 0%), pointing to the need for stronger reasoning methods such as reflection, self-play, or hybrid natural-language/formal proof approaches.
  • Scalability and Extensibility: Lean4PHYS provides a blueprint for formalizing additional scientific domains (e.g., chemistry, or high-energy physics via tensor libraries), supporting a gradual extension of machine-checked verification to the natural sciences.

Ongoing and future directions include dataset expansion, generalization of the methodology, and multi-agent proof paradigms that can handle extensive library (API) usage and the deep hierarchical reasoning intrinsic to advanced physics (Li et al., 30 Oct 2025, Zhang et al., 22 Jan 2026).

7. Impact and Significance

Lean4PHYS is presented as the first machine-checked physics benchmark and library in Lean 4, and as a template for formalizing scientific disciplines beyond mathematics. By combining rigorous unit theory, peer-validated benchmarking, and systematic assessment of automated provers, it advances research in formal reasoning and exposes the gaps, and opportunities, in current LLM-based and symbolic proof systems when they are applied to scientific reasoning. The resulting ecosystem is positioned to support the verification, reproducibility, and eventual automation of complex physics reasoning workflows in academia and beyond (Li et al., 30 Oct 2025, Zhang et al., 22 Jan 2026).

References (2)
