Efficient PROMISE Approach
- The proof-automation PROMISE method (Ahn et al., 7 Apr 2026) combines a stateful beam search with structural-embedding retrieval and lemma-name retrieval to improve automated proof construction.
- It uses a structured, LLM-augmented workflow that balances tactic diversity against subgoal reduction. This raises proof success rates while keeping the number of LLM calls per lemma bounded.
- The PROMISE name also refers to a separate line of work on scalable stochastic optimization. In that work, preconditioners built from low-rank curvature estimates are used inside variance-reduced gradient methods (Frangella et al., 2023).
The Efficient PROMISE Approach refers to algorithmic schemes from two distinct lines of research that share the PROMISE name: automated proof search and scalable stochastic optimization. In both, the name denotes methods aimed at computational efficiency. In automated proof construction for formal verification, this efficiency comes from structured proof-state representations and embedding-based retrieval. In preconditioned stochastic optimization for machine learning, it comes from scalable curvature estimation. This article covers the technical underpinnings, algorithmic workflow, complexity considerations, and empirical performance of the proof-automation method, and briefly describes the optimization method at the end.
1. Algorithmic Foundation: Structure-Aware Proof Search
PROMISE, as introduced for proof automation, reframes proof synthesis as a stateful beam search over proof-state transitions within the Isabelle/HOL theorem prover. Each search node is a tuple of five components: the proof prefix, the internal prover state, the active assumptions, the current goal, and the number of open subgoals. The workflow initializes this state by probing the target theorem. It then iteratively expands a beam of candidate proof states in a depth-limited search guided by a command generator.
At each depth, the command generator retrieves structurally relevant proof templates and contextually relevant lemma names, and builds prompts for an LLM backend. It normalizes the outputs and statically filters them for syntactic validity and referential grounding (referenced lemmas and definitions must exist). The surviving tactic candidates are then dispatched to Isabelle for machine-checked execution. Progressing states, i.e. those that reduce the subgoal count, are scored and enqueued for further expansion, and a diversification term in the beam-scoring objective favors structurally diverse candidates. A completed proof is certified by building the whole theory (Ahn et al., 7 Apr 2026).
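The search loop described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The `Node` fields mirror the five components of a search node. The functions `generate`, `run_tactic`, and `score` are hypothetical callbacks standing in for the LLM command generator, execution in Isabelle, and the beam-scoring objective. The toy tactics at the end are invented solely to exercise the loop.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """One search state; the fields mirror the five components of a search node."""
    prefix: Tuple[str, ...]       # tactics applied so far (the proof prefix)
    state: object                 # opaque prover-state handle (hypothetical)
    assumptions: Tuple[str, ...]  # active assumptions
    goal: str                     # current goal text
    subgoals: int                 # number of open subgoals


def beam_search(
    root: Node,
    generate: Callable[[Node], List[str]],              # hypothetical LLM command generator
    run_tactic: Callable[[Node, str], Optional[Node]],  # hypothetical prover call; None on failure
    score: Callable[[Node], float],                     # hypothetical beam-scoring objective
    width: int = 4,
    depth: int = 8,
) -> Optional[Node]:
    """Depth-limited beam search that only keeps subgoal-reducing transitions."""
    beam = [root]
    for _ in range(depth):
        successors = []
        for node in beam:
            for tactic in generate(node):
                child = run_tactic(node, tactic)
                if child is None:
                    continue  # tactic rejected or failed in the prover
                if child.subgoals == 0:
                    return child  # candidate proof; the source certifies it with a whole-theory build
                if child.subgoals < node.subgoals:
                    successors.append(child)  # progressing state
        if not successors:
            return None
        successors.sort(key=score, reverse=True)
        beam = successors[:width]
    return None


# Toy usage: "simp" closes one subgoal; "auto" closes two but only when at most two remain.
def toy_generate(node: Node) -> List[str]:
    return ["auto", "simp", "bad"]


def toy_run(node: Node, tactic: str) -> Optional[Node]:
    drop = {"auto": 2, "simp": 1}.get(tactic)
    if drop is None or (tactic == "auto" and node.subgoals > 2):
        return None
    return Node(node.prefix + (tactic,), None, node.assumptions, node.goal,
                max(node.subgoals - drop, 0))


root = Node((), None, (), "toy goal", 3)
result = beam_search(root, toy_generate, toy_run, score=lambda n: -len(n.prefix))
print(result.prefix)  # ('simp', 'auto')
```

Only subgoal-reducing children survive into the next beam. This keeps the search from spending budget on tactics that make no progress.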
2. Structural Embedding and Contextual Retrieval
PROMISE's efficiency is anchored in a two-phase retrieval mechanism:
- Structural Retrieval: Replaying every proof in the corpus yields a database of about 72k states. Each state is a canonicalized intermediate goal paired with the tactic suffix that followed it, and each is mapped to an embedding vector. At inference time, the current goal's embedding queries this database for its nearest neighbors using approximate nearest neighbor (ANN) search under Euclidean distance. Candidates are ranked by a composite score that combines embedding distance, shared constants, and textual overlap, then reranked using semantic-alignment signals.
- Name Retrieval: Simultaneously, a dynamic vocabulary of context-relevant lemma/definition names is synthesized by combining implicit naming heuristics, repository-wide n-gram BM25/TF-IDF search, and live queries to Isabelle's internal state. Names are role-bucketed to support grounded, contextually valid tactic instantiations.
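The two-stage ranking in structural retrieval can be sketched as below. The sketch assumes toy 3-dimensional embeddings and hypothetical weights `alpha`, `beta`, and `gamma`. The exact composite formula is not given here, and the semantic-alignment rerank is omitted. A brute-force distance scan stands in for the ANN index the method actually uses.

```python
import math
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Entry:
    embedding: Tuple[float, ...]  # embedding of a canonicalized goal (toy values)
    constants: Set[str]           # constants occurring in the goal
    text: str                     # canonicalized goal text
    suffix: str                   # tactic suffix that followed this goal in the corpus


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve(query: Entry, db: List[Entry], k_nn: int = 3, k: int = 2,
             alpha: float = 1.0, beta: float = 0.5, gamma: float = 0.5) -> List[str]:
    # Stage 1: nearest neighbours by Euclidean distance (a linear scan stands in for ANN).
    neighbours = sorted(db, key=lambda e: math.dist(query.embedding, e.embedding))[:k_nn]
    q_tokens = set(query.text.split())

    # Stage 2: composite rerank; alpha, beta, gamma are hypothetical weights.
    def composite(e: Entry) -> float:
        return (-alpha * math.dist(query.embedding, e.embedding)
                + beta * jaccard(query.constants, e.constants)
                + gamma * jaccard(q_tokens, set(e.text.split())))

    return [e.suffix for e in sorted(neighbours, key=composite, reverse=True)[:k]]


db = [
    Entry((0.0, 0.0, 1.0), {"append", "length"},
          "length (xs @ ys) = length xs + length ys", "by (induct xs) simp_all"),
    Entry((1.0, 0.0, 0.0), {"rev"}, "rev (rev xs) = xs", "by (induct xs) auto"),
    Entry((0.1, 0.0, 0.9), {"append", "length"},
          "length (xs @ [x]) = Suc (length xs)", "by simp"),
]
query = Entry((0.0, 0.1, 1.0), {"append", "length"},
              "length (ys @ xs) = length ys + length xs", "")
print(retrieve(query, db))  # ['by (induct xs) simp_all', 'by simp']
```

In a real index, the linear scan in stage 1 is what an ANN structure replaces to keep query cost sublinear in the database size.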
Prompts built from the retrieved structural templates and grounded names give the LLM precise, valid templates to adapt. This ties tactic generation directly to the patterns of proof-state evolution observed in the corpus (Ahn et al., 7 Apr 2026).
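The repository-wide lexical part of name retrieval can be approximated with textbook Okapi BM25 over word tokens. This sketch assumes each lemma is indexed by its name (split on underscores) plus its statement. It uses the common defaults `k1=1.2` and `b=0.75` rather than values from the source. It also omits the implicit naming heuristics, live Isabelle queries, and role bucketing.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and split on non-alphanumerics, so "length_append" -> ["length", "append"].
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def bm25_rank(query: str, lemmas: Dict[str, str],
              k1: float = 1.2, b: float = 0.75) -> List[str]:
    """Rank lemma names by the BM25 score of their name + statement against the query."""
    docs = {name: tokenize(name + " " + stmt) for name, stmt in lemmas.items()}
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs.values()) / n_docs
    df = Counter(t for d in docs.values() for t in set(d))  # document frequencies
    scores = {}
    for name, d in docs.items():
        tf = Counter(d)
        s = 0.0
        for t in tokenize(query):
            if tf[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores[name] = s
    return sorted(scores, key=scores.get, reverse=True)


lemmas = {
    "length_append": "length (xs @ ys) = length xs + length ys",
    "rev_rev_ident": "rev (rev xs) = xs",
    "map_append": "map f (xs @ ys) = map f xs @ map f ys",
}
print(bm25_rank("length (rev xs) = length xs", lemmas))
# ['length_append', 'rev_rev_ident', 'map_append']
```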
3. Beam Search and Scoring for Efficient Exploration
The stateful beam search operates with a fixed beam width, a depth bound, a regeneration limit, and a cap on the number of candidates. At each expansion:
- Candidate commands are generated by the LLM and statically filtered. Generic fallback tactics are added when the filtered candidates do not provide enough coverage.
- After execution in Isabelle, only candidates that reduce the number of subgoals advance.
- Scoring combines the subgoal reduction, a proof-length penalty (to encourage succinct proofs), and a method-diversity bonus that discourages overuse of the same tactic type.
- Diversity is enforced via a capped, decreasing bonus inversely related to cumulative usage of each method, promoting exploration over exploitation.
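The source names the ingredients of the beam score, but this section does not state the formula. The sketch below therefore assumes a simple linear combination. The weights `w_reduce` and `w_len`, the cap, and the `c / (1 + uses)` decay are hypothetical choices. They are chosen only to satisfy the stated properties: the bonus is capped and decreases with cumulative use of a method.

```python
from collections import Counter


def diversity_bonus(method: str, usage: Counter, cap: float = 1.0, c: float = 2.0) -> float:
    """Capped bonus that decreases as a proof method is used more (hypothetical form)."""
    return min(cap, c / (1 + usage[method]))


def beam_score(parent_subgoals: int, child_subgoals: int, proof_length: int,
               method: str, usage: Counter,
               w_reduce: float = 1.0, w_len: float = 0.1) -> float:
    """Hypothetical linear combination of the three ingredients named in the text."""
    reduction = parent_subgoals - child_subgoals  # progress toward closing the goal
    length_penalty = w_len * proof_length         # favours shorter proofs
    return w_reduce * reduction - length_penalty + diversity_bonus(method, usage)


usage = Counter({"simp": 3})  # "simp" already used three times in this search
print(beam_score(3, 2, 2, "simp", usage))  # ~1.3: bonus shrunk to 2 / (1 + 3) = 0.5
print(beam_score(3, 2, 2, "auto", usage))  # ~1.8: unused method gets the capped bonus of 1.0
```

With equal progress and length, the less-used method ranks higher. This is the exploration pressure the diversity term is meant to provide.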
Adaptive resource limits and pruning keep costs low: on average, about 30 LLM calls suffice per lemma, compared with a worst-case budget of 720 calls per search (Ahn et al., 7 Apr 2026).
4. Empirical Efficiency and Proof Automation Performance
On the seL4 verification suite (223 lemma proofs), PROMISE substantially outperforms prior LLM-driven systems such as Selene and Rango. Entries are success rates in percent; n/a marks settings that were not reported.
| Method | P1 (Easy, %) | P2 (Mid, %) | P3 (Hard, %) | Notable Observation |
|---|---|---|---|---|
| Selene ACC1 (GPT-3.5) | 34 | 9 | n/a | Single-shot fails at scale |
| Selene ACC5 (GPT-3.5) | 42 | 14 | n/a | Multiple attempts help only modestly |
| Rango (Qwen2.5) | 57 | 21 | 3 | Keyword retrieval, flat tactic rollout |
| PROMISE (Qwen2.5) | 77 | 36 | 7 | +20pp on easy, +15pp on mid vs. Rango |
| PROMISE (GPT-3.5) | 85 | 40 | n/a | +26pp on mid (+186% relative) vs. Selene ACC5 |
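The percentage-point and relative gains in the table's last column follow directly from the reported success rates. The short check below recomputes them. Note that the GPT-3.5 figures (+26pp, +186%) refer to the mid (P2) tier.

```python
rates = {
    "Selene ACC5 (GPT-3.5)": {"P1": 42, "P2": 14},
    "Rango (Qwen2.5)": {"P1": 57, "P2": 21, "P3": 3},
    "PROMISE (Qwen2.5)": {"P1": 77, "P2": 36, "P3": 7},
    "PROMISE (GPT-3.5)": {"P1": 85, "P2": 40},
}


def gain(new: str, old: str, tier: str):
    """Return (percentage-point gain, relative gain in % rounded to an integer)."""
    pp = rates[new][tier] - rates[old][tier]
    return pp, round(100 * pp / rates[old][tier])


print(gain("PROMISE (Qwen2.5)", "Rango (Qwen2.5)", "P1"))        # (20, 35)
print(gain("PROMISE (Qwen2.5)", "Rango (Qwen2.5)", "P2"))        # (15, 71)
print(gain("PROMISE (GPT-3.5)", "Selene ACC5 (GPT-3.5)", "P2"))  # (26, 186)
```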
PROMISE outperforms the baselines with both LLM backends tested (Qwen2.5 and GPT-3.5) and achieves higher proof coverage at every reported difficulty level. It also transfers well across backends, with markedly lower performance variance than the baselines (Ahn et al., 7 Apr 2026).
5. Architectural Efficiency: Complexity and Scaling
The method's efficiency derives from several factors:
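- Structural retrieval uses ANN search. For common index structures, its query cost grows roughly as O(log N) in the number N of stored proof states, rather than linearly as with exhaustive search.
- Name retrieval is a hybrid of lightweight textual search and live prover queries.
- Proof verification dominates runtime. Its results are memoized on unique task signatures so that identical checks are not repeated.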
- Resource allocation (beam width, depth, regeneration, candidate size) is tightly controlled to limit unnecessary LLM queries.
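Memoizing verification results can be sketched with a dictionary keyed by a task signature. Here the signature is assumed to be a SHA-256 hash of the theory context and proof text. `check_with_prover` is a hypothetical stand-in for an Isabelle build. The source does not specify how signatures are formed.

```python
import hashlib
from typing import Callable, Dict


class VerificationCache:
    """Caches prover verdicts keyed by a (hypothetical) task signature."""

    def __init__(self, check_with_prover: Callable[[str, str], bool]):
        self._check = check_with_prover  # expensive call, e.g. an Isabelle build
        self._verdicts: Dict[str, bool] = {}
        self.prover_calls = 0

    @staticmethod
    def signature(context: str, proof: str) -> str:
        # Identical (context, proof) pairs map to the same key.
        return hashlib.sha256(f"{context}\x00{proof}".encode()).hexdigest()

    def verify(self, context: str, proof: str) -> bool:
        key = self.signature(context, proof)
        if key not in self._verdicts:
            self.prover_calls += 1
            self._verdicts[key] = self._check(context, proof)
        return self._verdicts[key]


cache = VerificationCache(lambda ctx, prf: "sorry" not in prf)  # toy checker
cache.verify("Main", "by simp")
cache.verify("Main", "by simp")  # served from the cache, no prover call
cache.verify("Main", "sorry")
print(cache.prover_calls)  # 2
```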
Relative to single-shot or keyword-only methods, PROMISE's stateful approach incurs a modestly higher per-lemma compute cost. In return, it yields substantially higher proof completion on large, interdependent formal systems. For example, with the same Qwen2.5 backend it raises the mid-difficulty success rate from 21% (Rango) to 36% (Ahn et al., 7 Apr 2026).
6. Significance and Methodological Impact
PROMISE advances proof automation by shifting the search paradigm from surface-level text generation and static retrieval to dynamic, structure-aware, process-level exploration. Structural retrieval uncovers and reuses patterns of proof-state evolution, such as rewrite chains or sequential method applications, mirroring how expert users reason step by step. Name retrieval keeps tactics semantically valid and grounded in the current context. The Markovian, beam-driven search replaces random sampling and flat rollouts, yielding scalable, robust automation for long, compositional proof chains such as those in real-world verification efforts like seL4 (Ahn et al., 7 Apr 2026).
A plausible implication is that such stateful, structure-guided approaches will become standard for leveraging LLMs in program verification and other domains with deep logical and structural dependencies.
For stochastic optimization, PROMISE also refers to a method family based on preconditioned stochastic gradient algorithms employing scalable, sketching-based curvature estimates. These include preconditioned variants of SVRG, SAGA, and Katyusha, leveraging randomized low-rank or sketched Hessian approximations for efficient curvature incorporation (Frangella et al., 2023). However, in the context of proof automation and structural reasoning, the Efficient PROMISE Approach denotes the structure-driven, LLM-augmented beam search and retrieval methodology as detailed above.
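For a concrete feel of the optimization side, the sketch below runs a preconditioned SVRG loop on a small least-squares problem. It is a deliberately simplified stand-in rather than a PROMISE algorithm. An exact Hessian-diagonal preconditioner replaces the sketched low-rank Hessian approximation. The step size, epoch counts, and data are arbitrary illustrative choices.

```python
import random
from typing import List, Optional, Tuple

Vec = List[float]


def grad_i(w: Vec, x: Vec, y: float) -> Vec:
    """Gradient of the single-sample loss 0.5 * (x.w - y)^2."""
    r = sum(wj * xj for wj, xj in zip(w, x)) - y
    return [r * xj for xj in x]


def full_grad(w: Vec, data: List[Tuple[Vec, float]]) -> Vec:
    """Gradient of the average loss over all samples."""
    g = [0.0] * len(w)
    for x, y in data:
        for j, gj in enumerate(grad_i(w, x, y)):
            g[j] += gj / len(data)
    return g


def preconditioned_svrg(data: List[Tuple[Vec, float]], dim: int, step: float = 0.05,
                        epochs: int = 30, inner: Optional[int] = None,
                        rho: float = 1e-8, seed: int = 0) -> Vec:
    rng = random.Random(seed)
    inner = inner or 2 * len(data)
    # Curvature estimate: diagonal of the Hessian mean(x x^T) plus a small regularizer rho.
    # This is a stand-in for the sketched low-rank preconditioner.
    diag = [sum(x[j] ** 2 for x, _ in data) / len(data) + rho for j in range(dim)]
    w = [0.0] * dim
    for _ in range(epochs):
        snapshot = list(w)
        mu = full_grad(snapshot, data)  # full gradient at the snapshot
        for _ in range(inner):
            x, y = data[rng.randrange(len(data))]
            # SVRG variance-reduced gradient estimate.
            v = [a - b + m for a, b, m in zip(grad_i(w, x, y), grad_i(snapshot, x, y), mu)]
            # Preconditioned step: w <- w - step * P^{-1} v with diagonal P.
            w = [wj - step * vj / dj for wj, vj, dj in zip(w, v, diag)]
    return w


# Toy least-squares problem whose second feature is scaled 10x larger.
rng = random.Random(1)
w_true = [2.0, -3.0]
data = []
for _ in range(50):
    x = [rng.uniform(-1, 1), 10 * rng.uniform(-1, 1)]
    data.append((x, sum(a * b for a, b in zip(x, w_true))))

w = preconditioned_svrg(data, dim=2)
print([round(v, 3) for v in w])  # should be close to w_true
```

Preconditioning rescales the badly scaled second coordinate so that a single step size suits both coordinates. The PROMISE methods aim to obtain a comparable preconditioner cheaply through sketching when the Hessian is too large to form explicitly.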