
ImandraX: Industrial Reasoning Engine

Updated 22 January 2026
  • ImandraX is an industrial automated reasoning engine that uses higher-order logic and exhaustive input decomposition for formal software verification.
  • It integrates bounded model checking, complete decision procedures, and Boyer-Moore induction to generate counterexamples and guarantee mathematical rigor.
  • Deployed in financial and safety-critical domains, ImandraX underpins neuro-symbolic pipelines like CodeLogician to deliver actionable verification insights.

ImandraX is an industrial automated reasoning engine designed for the precise analysis of software logic, serving as the rigorous formal backend for neuro-symbolic program understanding pipelines such as CodeLogician. Deployed in financial markets and safety-critical systems, ImandraX integrates higher-order logical inference, exhaustive input decomposition, formal verification, and automated test synthesis to deliver mathematical guarantees for complex software models (Lin et al., 17 Jan 2026).

1. Logical Foundations and Architecture

ImandraX operates on models expressed in the Imandra Modeling Language (IML), a pure-functional, statically-typed subset of OCaml. IML is equipped with built-in constructs for verification (verify, lemma) and region decomposition. The underlying formal semantics of IML are mechanized in higher-order logic (HOL).

Inference in ImandraX leverages several complementary formal techniques:

  • Bounded Model Checking: Recursively unrolls definitions for concrete counterexample generation in bounded domains.
  • Complete Decision Procedures: Targets linear arithmetic, array and map theory, and selected data types for efficient, complete logical reasoning.
  • Typed Boyer-Moore Induction Engine: Supports unbounded proofs via a formal induction “waterfall,” enabling verification over recursive and infinite state spaces.
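The bounded model checking step listed above can be sketched as a brute-force unrolling: enumerate all input traces up to a fixed depth and search for a concrete counterexample to a safety property. The toy counter system and its property below are illustrative assumptions, not ImandraX internals.

```python
from itertools import product

def bmc(step, s0, alphabet, prop, bound):
    """Unroll `step` over every input trace of length <= bound; return the
    first trace whose final state violates `prop`, or None if none exists."""
    for k in range(bound + 1):
        for trace in product(alphabet, repeat=k):
            s = s0
            for i in trace:
                s = step(s, i)
            if not prop(s):
                return list(trace)  # concrete counterexample
    return None

# Toy system: a counter incremented by inputs 0/1 that must stay below 3.
cex = bmc(step=lambda s, i: s + i, s0=0,
          alphabet=[0, 1], prop=lambda s: s < 3, bound=4)
```

Here `bmc` finds the shortest violating trace `[1, 1, 1]`, mirroring how bounded unrolling yields concrete counterexamples in bounded domains.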

A Hindley–Milner–style static type system governs typing and evaluation, illustrated by: $\infer[\mathtt{T\text{-}Fun}]{\Gamma \vdash (\texttt{fun}\;x:\tau_1\mapsto e):\tau_1\to\tau_2} {\Gamma,\;x:\tau_1\;\vdash\;e:\tau_2} \quad \infer[\mathtt{T\text{-}App}]{\Gamma \vdash f\,e:\tau_2} {\Gamma\vdash f:\tau_1\to\tau_2 &\;\Gamma\vdash e:\tau_1}$

State machines are treated as first-class entities. Imperative loops over state $S$ are modeled by recursive functions: $\texttt{step} : S \times I \to S$, with $\texttt{run}(s_0, [i_1; \dots; i_n]) = \texttt{step}(\dots\texttt{step}(s_0, i_1), \dots, i_n)$
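The run-as-fold equation corresponds, in a Python sketch, to reducing the step function over an input trace; the saturating counter used as the step function is an assumed toy example.

```python
from functools import reduce

def step(state: int, inp: int) -> int:
    """One transition: add the input, saturating at 100."""
    return min(state + inp, 100)

def run(s0: int, inputs: list) -> int:
    """run(s0, [i1; ...; in]) = step(... step(s0, i1) ..., in), as a fold."""
    return reduce(step, inputs, s0)
```

For example, `run(0, [10, 20, 200])` evaluates `step(step(step(0, 10), 20), 200)` and saturates at 100.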

Region decomposition partitions a function's input space into finitely many symbolic regions, each defined by: $\{(C_i, R_i, W_i)\}_{i=1}^{N}$ such that $\bigcup_i \{x \mid C_i(x)\} = \mathrm{dom}(f)$ and $\forall x.\; C_i(x) \implies f(x) = R_i$, where $C_i$ specifies a region via linear constraints, $R_i$ is an output invariant, and $W_i$ is a witness point.
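The triple structure $(C_i, R_i, W_i)$ can be illustrated with a toy piecewise function; the regions below are hand-written to mirror the definition (constraint, output invariant, witness), not output produced by ImandraX.

```python
def f(x: int) -> str:
    """A simple piecewise function over the integers."""
    return "neg" if x < 0 else ("zero" if x == 0 else "pos")

# Each tuple is (C_i, R_i, W_i): a membership constraint, the output
# invariant holding on that region, and a concrete witness point.
regions = [
    (lambda x: x < 0,  "neg",  -1),
    (lambda x: x == 0, "zero",  0),
    (lambda x: x > 0,  "pos",   1),
]

# Witness check: each W_i lies in its region and satisfies its invariant.
for C, R, W in regions:
    assert C(W) and f(W) == R

# Coverage and soundness over a sample of the domain: every point falls
# in some region, and f agrees with the invariant wherever C_i holds.
for x in range(-5, 6):
    assert any(C(x) for C, _, _ in regions)
    for C, R, _ in regions:
        if C(x):
            assert f(x) == R
```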

2. Neuro-Symbolic Reasoning Pipeline via CodeLogician

CodeLogician uses LLMs as "autoformalizers" to translate informal source code (e.g., Python with object orientation, loops, and external calls) into pure-functional IML models suitable for precise automated analysis. The workflow involves:

  • Synthesizing an IML model $M = (S, I, O, \delta)$, where $\delta : S \times I \to S$ is the state transition function and $\tau$ is the record type representing the machine state.
  • Producing explicit formal assumptions for any external calls or opaque computations.
  • Encoding imperative loops as recursive definitions, and managing side effects with explicit state records.
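The loop-encoding step above can be sketched in Python: an imperative loop with mutation is re-expressed as a pure recursive definition over an explicit, immutable state record. The names and the accumulator example are illustrative assumptions, not CodeLogician output.

```python
from dataclasses import dataclass, replace

def total_imperative(xs):
    """Imperative original: mutates an accumulator in a loop."""
    acc = 0
    for x in xs:
        if x > 0:
            acc += x
    return acc

@dataclass(frozen=True)
class State:
    """Explicit state record standing in for the mutated locals."""
    acc: int

def total_pure(xs, s: State = State(acc=0)) -> int:
    """Pure recursive encoding: each step threads a new State value."""
    if not xs:
        return s.acc
    head, tail = xs[0], xs[1:]
    s2 = replace(s, acc=s.acc + head) if head > 0 else s
    return total_pure(tail, s2)
```

Both definitions agree on all inputs, which is the semantic-preservation obligation the translation incurs.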

Formally, given source code PP, the translation function is

$\mathcal{F} : P \longmapsto (\tau, \delta)$

Verification goals are synthesized as Boolean predicates: $\texttt{VG}(s) : \texttt{bool} \equiv \forall x.\; \phi(x) \implies \psi(x)$, which are submitted to ImandraX for proof or counterexample finding.
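The shape of such a goal can be sketched by checking $\forall x.\; \phi(x) \implies \psi(x)$ exhaustively over a bounded domain, returning a counterexample if one exists. ImandraX discharges these goals symbolically; this finite check is only an illustration, and the example predicates are assumptions.

```python
def check_goal(phi, psi, domain):
    """Check forall x in domain: phi(x) => psi(x); report a counterexample."""
    for x in domain:
        if phi(x) and not psi(x):
            return ("counterexample", x)
    return ("proved_on_domain", None)

# Goal that holds: every even x has x*x divisible by 4.
ok = check_goal(lambda x: x % 2 == 0, lambda x: x * x % 4 == 0, range(100))
# Goal that fails: "every x > 10 satisfies x < 50" is refuted at x = 50.
bad = check_goal(lambda x: x > 10, lambda x: x < 50, range(100))
```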

For advanced semantic queries (enumeration of distinct behaviors, identification of decision boundaries, and detection of edge cases), CodeLogician invokes region decomposition over the IML model under side conditions $\mathit{SC}(s)$ or basis functions $B$. Each resulting region yields a witness $W_i$, supporting automated test-case generation in the source code's native language.

3. The code-logic-bench Benchmark

The code-logic-bench is introduced to systematically evaluate mathematical reasoning about software logic, bridging the gap between specialized theorem proving and practical software engineering benchmarks.

  • Dataset: Consists of 50 mid-level models representing state machines with multi-entity interactions, temporal evolution, and decision logic. Each is paired with three core questions targeting:

    1. State space enumeration
    2. Conditional analysis
    3. Property verification
  • Ground Truth: Defined using ImandraX’s exhaustive region counts and formally-proved invariants or counterexamples.

  • Metrics: Seven complementary metrics distinguish LLM-only reasoning from CodeLogician-enhanced neuro-symbolic analysis:

    • State Space Estimation Accuracy (SSEA): $\mathrm{SSEA} = \dfrac{1}{1 + \log_2(|n_{\mathrm{LLM}} - n_{\mathrm{decomp}}| + 1)}$
    • Coverage Completeness: $\mathrm{Coverage} = \dfrac{\#\text{ regions identified by LLM}}{\#\text{ actual regions}}$
    • Outcome Precision
    • Direction Accuracy
    • Control Flow Understanding
    • Edge Case Detection: $\mathrm{EdgeCaseScore} = \dfrac{|\{\text{LLM-found edge cases}\}|}{|\{\text{decomp-found edge cases}\}|}$
    • Decision Boundary Clarity
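The two formula-based metrics can be computed directly; the sketch below assumes the reading of SSEA in which the `+1` sits inside the logarithm (so exact agreement scores 1.0), and the sample counts are made up for illustration.

```python
from math import log2

def ssea(n_llm: int, n_decomp: int) -> float:
    """State Space Estimation Accuracy: penalizes count error logarithmically."""
    return 1.0 / (1.0 + log2(abs(n_llm - n_decomp) + 1))

def coverage(regions_llm: int, regions_actual: int) -> float:
    """Coverage Completeness: fraction of actual regions the LLM identified."""
    return regions_llm / regions_actual

# Exact state-space count gives a perfect SSEA score.
exact = ssea(12, 12)
# An error of 7 regions gives 1 / (1 + log2(8)) = 0.25.
off_by_seven = ssea(5, 12)
```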

  • Evaluation Protocol: Five state-of-the-art LLMs answer each question with and without CodeLogician augmentation, and four independent LLMs act as rubric-based judges, with metric scores averaged.

4. Quantitative Results and Analysis

LLM-only approaches yield aggregate scores between 0.53 and 0.60 (mean across all metrics and models), while formal augmentation via CodeLogician and ImandraX achieves perfect scores (1.0 by definition), thereby closing a substantial 41–47 percentage-point gap in reasoning accuracy. Metric-specific LLM-only means include:

Metric                        LLM-Only Mean
Control Flow Understanding    0.746
Decision Boundary Clarity     0.695
Direction Accuracy            0.635
Outcome Precision             0.613
Edge Case Detection           0.597
Coverage Completeness         0.490
State Space Estimation        0.186

State space estimation is the most pronounced failure mode, with LLMs often omitting numeric thresholds and basis abstractions, leading to order-of-magnitude errors. Even leading models such as Claude Opus 4.5 attain a mean of 0.601, which remains far below the exhaustive rigor guaranteed by ImandraX (Lin et al., 17 Jan 2026).

5. Case Studies in Financial and Safety-Critical Domains

ImandraX has been applied to nontrivial benchmarks in finance and safety-critical systems, where semantic rigor is paramount.

  • London Stock Exchange (LSE) GTT Order Expiry:

The property "GTT order expiry and auction uncross must never coincide" ($\forall\,\mathit{msgs},\,s.\;\lnot\,\texttt{conflict\_reachable}(\mathit{msgs},s)$) was falsified by ImandraX, which produced a counterexample in two simulation ticks at time points 2700/2701, showing simultaneous expiry and uncross. The witness state ($\mathit{uncross\_at}=2700$, $\mathit{expires\_at}=2700$, $\mathit{extension}=1$) was directly executable in IML for operational triage.
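The shape of the falsified property can be sketched as follows; the field names mirror the reported witness state, but the tick-level conflict condition here is a guessed reconstruction for illustration only, not the actual LSE model.

```python
def conflict(uncross_at: int, expires_at: int) -> bool:
    """Hypothetical conflict condition: GTT expiry and auction uncross
    are scheduled on the same simulation tick."""
    return uncross_at == expires_at

# The reported witness: both events scheduled at tick 2700.
witness = {"uncross_at": 2700, "expires_at": 2700, "extension": 1}
found = conflict(witness["uncross_at"], witness["expires_at"])
```

An executable witness like this is what makes the counterexample directly usable for operational triage.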

  • LSE Fee Schedule Verification:

Two invariants were verified:

  1. Immediate or Cancel ("IOC") orders incur exactly £0.01 more than Day orders:

     $\forall p,q,\dots.\;\mathit{calc\_cost}(p,q,\texttt{IOC},\dots) = \mathit{calc\_cost}(p,q,\texttt{DAY},\dots) + 0.01$

  2. Zero-rate, zero-minimum implies zero fee:

     $\forall v.\;\mathit{exec\_fee}(v,0.0,0.0) = 0.0$

Region decomposition uncovered six distinct regimes, including hidden premiums, floors, rebates, and negative-rate exceptions.
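A hypothetical fee function illustrates the two verified invariants; the regime structure (a fee floor plus a flat IOC premium) is invented for this sketch and is not the actual LSE schedule.

```python
def exec_fee(volume: float, rate: float, minimum: float) -> float:
    """Execution fee: proportional to volume, subject to a floor."""
    return max(volume * rate, minimum)

def calc_cost(price: float, qty: float, order_type: str,
              rate: float, minimum: float) -> float:
    """Total cost: execution fee plus a flat premium for IOC orders."""
    fee = exec_fee(price * qty, rate, minimum)
    premium = 0.01 if order_type == "IOC" else 0.0
    return fee + premium

# Invariant 1: IOC costs exactly 0.01 more than DAY on the same inputs.
ioc = calc_cost(10.0, 5.0, "IOC", 0.002, 0.5)
day = calc_cost(10.0, 5.0, "DAY", 0.002, 0.5)
# Invariant 2: zero rate and zero minimum imply a zero fee.
zero_fee = exec_fee(123.45, 0.0, 0.0)
```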

  • Multilateral Netting Engine (Central Counterparty):

Legislated invariants included zero-sum conservation ($\sum_i \mathit{net}_i = 0$) and netting efficiency ($\sum_i |\mathit{net}_i| \leq \sum_j |\mathit{trade}_j|$). ImandraX detected critical bugs arising from negative-amount inputs and floating-point drift; switching to arbitrary-precision types then discharged all verification goals.
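The two netting invariants can be sketched with arbitrary-precision rationals, mirroring the fix described above (exact arithmetic instead of floats); the trade data and the netting function are illustrative assumptions.

```python
from fractions import Fraction

def net_positions(trades):
    """Compute each party's net position from (payer, payee, amount) trades,
    rejecting negative amounts (the bug class described above)."""
    net = {}
    for payer, payee, amount in trades:
        assert amount >= 0, "negative-amount inputs are rejected"
        net[payer] = net.get(payer, Fraction(0)) - amount
        net[payee] = net.get(payee, Fraction(0)) + amount
    return net

trades = [("A", "B", Fraction(1, 3)),
          ("B", "C", Fraction(1, 3)),
          ("C", "A", Fraction(1, 6))]
net = net_positions(trades)

# Zero-sum conservation: payments balance exactly with rationals
# (a check that floating-point drift can spuriously fail).
conserved = sum(net.values()) == 0
# Netting efficiency: gross netted exposure never exceeds gross traded.
efficient = sum(abs(v) for v in net.values()) <= sum(t[2] for t in trades)
```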

These cases exemplify ImandraX's capability for automated triage, invariant verification, and exhaustive decomposition, supplying counterexamples and witnesses with direct operational relevance (Lin et al., 17 Jan 2026).

6. Significance and Implications

ImandraX provides the robust HOL reasoning engine underpinning neuro-symbolic program analysis pipelines, notably in domains demanding mathematical rigor. The demonstrated 41–47 percentage-point increase in reasoning accuracy over state-of-the-art LLMs alone underscores the indispensability of formal augmentation for tasks involving state space enumeration, decision boundary identification, and precise edge case analysis.

A plausible implication is that production-grade reasoning engines such as ImandraX will remain essential to scaling autonomous software analysis in mission-critical areas, particularly as LLMs show persistent failure modes in exhaustive semantic coverage and boundary clarity. The code-logic-bench benchmark quantitatively demonstrates that neuro-symbolic integration substantially outperforms purely neural approaches for actionable software logic understanding (Lin et al., 17 Jan 2026).
