ImandraX: Industrial Reasoning Engine
- ImandraX is an industrial automated reasoning engine that uses higher-order logic and exhaustive input decomposition for formal software verification.
- It integrates bounded model checking, complete decision procedures, and Boyer-Moore induction to generate counterexamples and guarantee mathematical rigor.
- Deployed in financial and safety-critical domains, ImandraX underpins neuro-symbolic pipelines like CodeLogician to deliver actionable verification insights.
ImandraX is an industrial automated reasoning engine designed for the precise analysis of software logic, serving as the rigorous formal backend for neuro-symbolic program understanding pipelines such as CodeLogician. Deployed in financial markets and safety-critical systems, ImandraX integrates higher-order logical inference, exhaustive input decomposition, formal verification, and automated test synthesis to deliver mathematical guarantees for complex software models (Lin et al., 17 Jan 2026).
1. Logical Foundations and Architecture
ImandraX operates on models expressed in the Imandra Modeling Language (IML), a pure-functional, statically-typed subset of OCaml. IML is equipped with built-in constructs for verification (verify, lemma) and region decomposition. The underlying formal semantics of IML are mechanized in higher-order logic (HOL).
Inference in ImandraX leverages several complementary formal techniques:
- Bounded Model Checking: Recursively unrolls definitions for concrete counterexample generation in bounded domains.
- Complete Decision Procedures: Targets linear arithmetic, array and map theory, and selected data types for efficient, complete logical reasoning.
- Typed Boyer-Moore Induction Engine: Supports unbounded proofs via a formal induction “waterfall,” enabling verification over recursive and infinite state spaces.
A Hindley–Milner–style static type system governs typing and evaluation, illustrated by the rules for typed abstraction and application: $\dfrac{\Gamma,\; x:\tau_1 \;\vdash\; e:\tau_2}{\Gamma \;\vdash\; (\texttt{fun}\;x:\tau_1\mapsto e):\tau_1\to\tau_2}\;(\mathtt{T\text{-}Fun}) \qquad \dfrac{\Gamma\vdash f:\tau_1\to\tau_2 \quad \Gamma\vdash e:\tau_1}{\Gamma \vdash f\,e:\tau_2}\;(\mathtt{T\text{-}App})$
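The two rules can be exercised directly in a minimal checker. The sketch below is hypothetical (it is not ImandraX's implementation, and the constructor names `Fun`, `App`, `Arrow` are illustrative); it shows T-Fun extending the environment with the annotated parameter type and T-App requiring the argument type to match the arrow's domain:

```python
# Minimal sketch (hypothetical, not ImandraX's implementation) of the
# T-Fun and T-App typing rules for a lambda-calculus fragment.
from dataclasses import dataclass

@dataclass(frozen=True)
class Arrow:          # the function type tau1 -> tau2
    arg: object
    res: object

@dataclass(frozen=True)
class Var:            # variable reference
    name: str

@dataclass(frozen=True)
class Fun:            # fun x : tau -> e   (annotated lambda)
    param: str
    ptype: object
    body: object

@dataclass(frozen=True)
class App:            # f e
    fn: object
    arg: object

def typecheck(gamma: dict, e):
    if isinstance(e, Var):
        return gamma[e.name]
    if isinstance(e, Fun):                     # rule T-Fun
        body_t = typecheck({**gamma, e.param: e.ptype}, e.body)
        return Arrow(e.ptype, body_t)
    if isinstance(e, App):                     # rule T-App
        fn_t = typecheck(gamma, e.fn)
        arg_t = typecheck(gamma, e.arg)
        assert isinstance(fn_t, Arrow) and fn_t.arg == arg_t, "type error"
        return fn_t.res
    raise TypeError(e)

# (fun x : int -> x) applied to an int-typed variable y
prog = App(Fun("x", "int", Var("x")), Var("y"))
print(typecheck({"y": "int"}, prog))          # -> int
```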
State machines are treated as first-class entities. Imperative loops over state are modeled by recursive functions.
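The loop-to-recursion pattern can be illustrated outside IML as well. A minimal Python sketch (the names `State`, `step`, and `run_recursive` are illustrative, not from the paper) shows an imperative loop over a state record rewritten as a pure recursive function, the form an IML-style model would use:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:          # an explicit, immutable machine-state record
    tick: int
    total: int

def step(s: State) -> State:
    # one pure state transition: no mutation, a fresh record is returned
    return replace(s, tick=s.tick + 1, total=s.total + s.tick)

def run_imperative(s: State, n: int) -> State:
    for _ in range(n):          # the imperative original
        s = step(s)
    return s

def run_recursive(s: State, n: int) -> State:
    # the equivalent pure-functional recursion over the state record
    return s if n == 0 else run_recursive(step(s), n - 1)

s0 = State(tick=0, total=0)
assert run_imperative(s0, 5) == run_recursive(s0, 5)
```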
Region decomposition partitions a function's input space into finitely many symbolic regions, each defined by a triple $(C_i, I_i, w_i)$, where $C_i$ specifies the region via linear constraints, $I_i$ is an output invariant holding throughout the region, and $w_i$ is a concrete witness point.
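A hand-written sketch conveys the shape of a decomposition result. Here the regions are enumerated manually for a small branching function (a real engine derives them symbolically; the function `fee` and the triples are illustrative):

```python
# Hypothetical sketch of region decomposition for a small branching function.
# Regions are written by hand here to show the (constraints, invariant,
# witness) triples described above; ImandraX derives them symbolically.

def fee(qty: int) -> int:
    if qty <= 0:
        return 0
    if qty < 100:
        return 5
    return 5 + qty // 100

regions = [
    # (constraint description,  output invariant,                 witness)
    ("qty <= 0",                lambda q: fee(q) == 0,                -1),
    ("0 < qty < 100",           lambda q: fee(q) == 5,                50),
    ("qty >= 100",              lambda q: fee(q) == 5 + q // 100,    250),
]

for constraint, invariant, witness in regions:
    # each witness lies in its region and satisfies the region's invariant
    assert invariant(witness), constraint
```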
2. Neuro-Symbolic Reasoning Pipeline via CodeLogician
CodeLogician uses LLMs as "autoformalizers" to translate informal source code (e.g., Python with object orientation, loops, and external calls) into pure-functional IML models suitable for precise automated analysis. The workflow involves:
- Synthesizing an IML model with a state transition function $\delta : \Sigma \to \Sigma$, where $\Sigma$ is the record type representing machine state.
- Producing explicit formal assumptions for any external calls or opaque computations.
- Encoding imperative loops as recursive definitions, and managing side effects with explicit state records.
Formally, given source code $P$, the translation is a function $\mathcal{T}$ producing an IML model $M = \mathcal{T}(P)$.
Verification goals are synthesized as Boolean predicates over the model state, $g : \Sigma \to \mathbb{B}$, which are submitted to ImandraX for proof or counterexample finding.
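In the bounded case, counterexample finding amounts to searching for a concrete input that falsifies the goal predicate. The toy sketch below (the predicate and bound are illustrative, not from the paper) shows the style of result such a search returns:

```python
from itertools import product

# Toy goal predicate: intended to hold on all inputs, but it fails
# whenever y < 0, so a bounded search finds a concrete counterexample.
def goal(x: int, y: int) -> bool:
    return x + y >= x

def bounded_counterexample(bound: int):
    for x, y in product(range(-bound, bound + 1), repeat=2):
        if not goal(x, y):
            return (x, y)      # concrete falsifying input, as a BMC-style engine reports
    return None                # no counterexample within the bound

print(bounded_counterexample(3))
```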
For advanced semantic queries—including enumeration of distinct behaviors, identification of decision boundaries, and detection of edge cases—CodeLogician invokes region decomposition over the IML model, optionally under user-supplied side conditions or basis functions. Each resulting region yields a concrete witness input, supporting automated test-case generation in the source code's native language.
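The final step—turning region witnesses back into native-language tests—can be sketched as string generation over witness/expectation pairs. Everything here (`classify`, the witness list, `emit_tests`) is a hypothetical stand-in for the pipeline's actual output:

```python
# Sketch: turning region witnesses into test cases in the source language.
# `witnesses` stands in for the output of region decomposition.

def classify(balance: int) -> str:      # toy source-language function
    if balance < 0:
        return "overdrawn"
    if balance == 0:
        return "empty"
    return "funded"

witnesses = [(-10, "overdrawn"), (0, "empty"), (500, "funded")]

def emit_tests(witnesses) -> str:
    lines = [f"assert classify({w!r}) == {expected!r}"
             for w, expected in witnesses]
    return "\n".join(lines)

suite = emit_tests(witnesses)
exec(suite)                 # the generated assertions all pass
print(suite.splitlines()[0])
```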
3. The code-logic-bench Benchmark
The code-logic-bench is introduced to systematically evaluate mathematical reasoning about software logic, bridging the gap between specialized theorem proving and practical software engineering benchmarks.
- Dataset: Consists of 50 mid-level models representing state machines with multi-entity interactions, temporal evolution, and decision logic. Each is paired with three core questions targeting:
- State space enumeration
- Conditional analysis
- Property verification
- Ground Truth: Defined using ImandraX's exhaustive region counts and formally proved invariants or counterexamples.
- Metrics: Seven complementary metrics distinguish LLM-only reasoning from CodeLogician-enhanced neuro-symbolic analysis:
  - State Space Estimation Accuracy (SSEA)
  - Coverage Completeness
  - Outcome Precision
  - Direction Accuracy
  - Control Flow Understanding
  - Edge Case Detection
  - Decision Boundary Clarity
- Evaluation Protocol: Five state-of-the-art LLMs answer each question with and without CodeLogician augmentation, and four independent LLMs act as rubric-based judges, with metric scores averaged.
4. Quantitative Results and Analysis
LLM-only approaches yield aggregate scores between 0.53 and 0.60 (mean across all metrics and models), while formal augmentation via CodeLogician and ImandraX achieves perfect scores (1.0 by definition), thereby closing a substantial 41–47 percentage-point gap in reasoning accuracy. Metric-specific LLM-only means include:
| Metric | LLM-Only Mean |
|---|---|
| Control Flow Understanding | 0.746 |
| Decision Boundary Clarity | 0.695 |
| Direction Accuracy | 0.635 |
| Outcome Precision | 0.613 |
| Edge Case Detection | 0.597 |
| Coverage Completeness | 0.490 |
| State Space Estimation | 0.186 |
State space estimation is the most pronounced failure mode, with LLMs often omitting numeric thresholds and basis abstractions, leading to order-of-magnitude errors. Even leading models such as Claude Opus 4.5 attain a mean of 0.601, which remains far below the exhaustive rigor guaranteed by ImandraX (Lin et al., 17 Jan 2026).
5. Case Studies in Financial and Safety-Critical Domains
ImandraX has been applied to nontrivial benchmarks in finance and safety-critical systems, where semantic rigor is paramount.
- London Stock Exchange (LSE) GTT Order Expiry:
The property "GTT order expiry and auction uncross must never coincide" was falsified by ImandraX, which produced a counterexample in two simulation ticks at time points 2700/2701, showing simultaneous expiry and uncross. The witness state was directly executable in IML for operational triage.
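A heavily simplified two-tick reconstruction conveys the shape of this counterexample. The state fields and constants below are illustrative only (the real LSE model is far richer); the sketch just shows expiry and uncross firing in the same tick:

```python
# Toy two-tick reconstruction (constants illustrative, not the LSE model)
# of the falsified property "expiry and uncross never coincide".

def tick(state: dict) -> dict:
    t = state["time"] + 1
    return {**state, "time": t,
            "expired":   t >= state["gtt_expiry"],
            "uncrossed": t >= state["uncross_at"]}

s = {"time": 2699, "gtt_expiry": 2701, "uncross_at": 2701,
     "expired": False, "uncrossed": False}

for _ in range(2):                      # two simulation ticks: 2700, 2701
    s = tick(s)

# the property is falsified: both events hold in the same witness state
assert s["expired"] and s["uncrossed"] and s["time"] == 2701
```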
- LSE Fee Schedule Verification:
Two invariants were verified:
1. Immediate-or-Cancel ("IOC") orders incur exactly £0.01 more than Day orders: $\mathrm{fee}_{\mathrm{IOC}} = \mathrm{fee}_{\mathrm{Day}} + 0.01$.
2. A zero rate and a zero minimum imply a zero fee: $(\mathrm{rate} = 0 \wedge \mathrm{min} = 0) \Rightarrow \mathrm{fee} = 0$.
Region decomposition uncovered six distinct regimes, including hidden premiums, floors, rebates, and negative-rate exceptions.
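Both invariants can be checked exhaustively on a bounded domain, in the spirit of bounded checking (ImandraX proves them unboundedly). The fee schedule below is a toy stand-in, not the real LSE schedule:

```python
from itertools import product

# Toy fee schedule (illustrative, not the real LSE schedule): a rate in
# basis points, a minimum fee, a notional, and an IOC flag.
def fee(rate_bp: int, minimum: int, notional: int, ioc: bool) -> int:
    base = max(minimum, rate_bp * notional // 10_000)
    return base + 1 if ioc else base          # IOC surcharge of one penny

# Invariant 1: IOC costs exactly one penny more than the equivalent Day order.
# Invariant 2: zero rate and zero minimum imply a zero fee for Day orders.
for rate, minimum, notional in product(range(5), range(5), range(50)):
    assert fee(rate, minimum, notional, True) == fee(rate, minimum, notional, False) + 1
    assert fee(0, 0, notional, False) == 0
```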
- Multilateral Netting Engine (Central Counterparty):
Legislated invariants included zero-sum conservation ($\sum_i \mathrm{net}_i = 0$) and netting efficiency (total netted exposure never exceeds total gross exposure). ImandraX detected critical bugs caused by negative-amount inputs and floating-point drift; switching to arbitrary-precision types then discharged all verification goals.
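The floating-point failure mode is easy to reproduce. The sketch below (obligation values are illustrative) shows a zero-sum cycle of net amounts that drifts under `float` arithmetic but satisfies the conservation invariant exactly under arbitrary-precision rationals:

```python
from fractions import Fraction

# Zero-sum conservation check: with floats the invariant drifts,
# with exact arithmetic it holds precisely.
obligations = ["0.1", "0.2", "-0.3"]        # a zero-sum cycle of net amounts

float_net = sum(float(x) for x in obligations)
exact_net = sum(Fraction(x) for x in obligations)

print(float_net == 0)        # False: floating-point drift breaks the invariant
print(exact_net == 0)        # True: arbitrary precision discharges the goal
```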
These cases exemplify ImandraX's capability for automated triage, invariant verification, and exhaustive decomposition, supplying counterexamples and witnesses with direct operational relevance (Lin et al., 17 Jan 2026).
6. Significance and Implications
ImandraX provides the robust HOL reasoning engine underpinning neuro-symbolic program analysis pipelines, notably in domains demanding mathematical rigor. The demonstrated 41–47 percentage-point increase in reasoning accuracy over state-of-the-art LLMs alone underscores the indispensability of formal augmentation for tasks involving state space enumeration, decision boundary identification, and precise edge case analysis.
A plausible implication is that production-grade reasoning engines such as ImandraX will remain essential to scaling autonomous software analysis in mission-critical areas, particularly as LLMs show persistent failure modes in exhaustive semantic coverage and boundary clarity. The code-logic-bench benchmark quantitatively demonstrates that neuro-symbolic integration substantially outperforms purely neural approaches for actionable software logic understanding (Lin et al., 17 Jan 2026).