Deployable Planners: B2T, RoH, and ISR

Updated 16 September 2025
  • Deployable planners are defined as operational decision metrics (B2T, RoH, and ISR) that assess information sufficiency to manage LLM outputs effectively.
  • They utilize rigorous theoretical insights, including the Expectation-level Decompression Law and Quantified Martingale Violation, to predict and control hallucination risks.
  • Empirical findings show that permutation mixtures and calibrated information dosing lead to reduced hallucination rates and enhanced real-world reliability.

Deployable planners in the B2T (Bits-to-Trust), RoH (Risk-of-Hallucination), and ISR (Information Sufficiency Ratio) framework refer to operational decision metrics and real-time control rules for automatic answer/abstain adjudication in deployed LLMs. These planners are grounded in rigorous information-theoretic results relating model compression, reliability, and decision risk. In particular, the EDFL (Expectation-level Decompression Law) and the derived Quantified Martingale Violation (QMV) bound provide both the theoretical justification and empirical procedures for deploying these planners to mitigate hallucination. Hallucinations are reinterpreted as predictable failures of information compression, and the planners enable calibrated abstention protocols for assurance in real-world settings (Chlon et al., 14 Sep 2025).

1. Theoretical Foundations: Information Budgeting and Compression Failure

The key theoretical basis is the observation that LLMs perform near-Bayesian inference only in expectation over input order, not for individual realizations. Formally, transformers minimize the expected conditional description length over permutations, $\mathbb{E}_\pi[\ell(Y \mid \Gamma_\pi(X))]$, implying a Kolmogorov-complexity view over orderings rather than a permutation-invariant conditional description length. This "Bayesian in expectation" property introduces predictable deviations (permutation dispersion) in output likelihoods.

The Expectation-level Decompression Law (EDFL) provides an operational bound: $\bar{\Delta} \ge KL(Ber(p) \| Ber(\bar{q}))$, where $\bar{\Delta}$ is the observed information budget, $KL$ is the Kullback–Leibler divergence between Bernoulli distributions capturing the target reliability $p$ and the average prior $\bar{q}$, and $Ber(\cdot)$ denotes a Bernoulli distribution.

When $\bar{\Delta}$ falls short of the KL divergence required to reach the target posterior $p$, hallucinations manifest as predictable compression failures. Deployable planners operationalize this insight by dynamically monitoring signal-to-reliability via information budgeting.
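As a minimal sketch of this check (the numeric values below are illustrative, not taken from the paper), the EDFL bound reduces to comparing the observed budget against a KL divergence between two Bernoulli distributions:

```python
import math

def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Ber(p) || Ber(q)) in nats, clamping away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Illustrative values (not from the paper): average prior q_bar, target reliability p,
# and an observed information budget delta_bar in nats.
q_bar, p, delta_bar = 0.30, 0.92, 1.0
required = bernoulli_kl(p, q_bar)   # nats needed to move Ber(q_bar) up to Ber(p)
print(f"required {required:.3f} nats, observed {delta_bar:.3f} nats")
print("EDFL budget sufficient" if delta_bar >= required else "budget short: compression-failure risk")
```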

2. Planner Definitions: B2T, RoH, and ISR

Three core planners are defined, each with explicit mathematical formulations:

| Planner | Definition | Operational Use |
|---|---|---|
| B2T (Bits-to-Trust) | $B2T(x; h^*) = KL(Ber(1-h^*) \| Ber(q_{lo}(x)))$ | Information (nats) needed to upgrade the least favorable prior to target reliability $h^*$ |
| RoH (Risk-of-Hallucination) | $RoH(x) = 1 - p_{\max}(\bar{\Delta}(x), \bar{q}(x))$ | Predicts current error/hallucination risk given the information budget |
| ISR (Information Sufficiency Ratio) | $ISR(x) = \frac{\bar{\Delta}(x)}{B2T(x; h^*)}$ | Ratio of observed to required information; main decision gate |

  • $q_{lo}(x)$ is the least favorable prior for the binary predicate (e.g., answer/abstain), evaluated over a permutation ensemble or mixture.
  • $p_{\max}(\bar{\Delta}, \bar{q})$ is the maximum achievable posterior trust given the average available information and the prior.

A hard threshold $ISR = 1.0$ implements a deployable refusal policy: if $ISR < 1$, the planner abstains; if $ISR \ge 1$, it answers.
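A minimal sketch of these three quantities follows, taking the table's formulas at face value and reading $p_{\max}$ as the largest posterior reliability whose Bernoulli KL from the mixture prior fits within the observed budget (an interpretation, not necessarily the paper's exact estimator); all numeric inputs are hypothetical:

```python
import math

def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Ber(p) || Ber(q)) in nats."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def b2t(q_lo: float, h_star: float) -> float:
    """Bits-to-Trust: KL(Ber(1 - h*) || Ber(q_lo)), nats needed to reach reliability h*."""
    return bernoulli_kl(1.0 - h_star, q_lo)

def p_max(delta_bar: float, q_bar: float) -> float:
    """Assumed reading of p_max: largest p with KL(Ber(p) || Ber(q_bar)) <= delta_bar,
    found by bisection on p in [q_bar, 1)."""
    lo, hi = q_bar, 1.0 - 1e-9
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if bernoulli_kl(mid, q_bar) <= delta_bar else (lo, mid)
    return lo

def roh(delta_bar: float, q_bar: float) -> float:
    """Risk-of-Hallucination: 1 - p_max(delta_bar, q_bar)."""
    return 1.0 - p_max(delta_bar, q_bar)

def isr(delta_bar: float, q_lo: float, h_star: float) -> float:
    """Information Sufficiency Ratio: observed budget over required budget."""
    return delta_bar / max(b2t(q_lo, h_star), 1e-12)

# Hypothetical inputs; in practice delta_bar, q_lo, q_bar come from permutation-averaged scores.
delta_bar, q_lo, q_bar, h_star = 0.20, 0.25, 0.30, 0.92
print(f"B2T = {b2t(q_lo, h_star):.3f} nats, RoH = {roh(delta_bar, q_bar):.3f}, "
      f"ISR = {isr(delta_bar, q_lo, h_star):.2f}")
print("ANSWER" if isr(delta_bar, q_lo, h_star) >= 1.0 else "ABSTAIN")
```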

3. Quantified Martingale Violation and Permutation Dispersion

The framework identifies permutation sensitivity as a main source of unreliability, owing to transformers' lack of strict permutation invariance. The Quantified Martingale Violation (QMV) theorem bounds the dispersion: $\mathbb{E}_\pi |R_\pi(x)| \le \frac{C}{4}\left(\log n - \frac{3}{2} + o(1)\right)$, where $R_\pi(x) = q_\pi(x) - \bar{q}(x)$ is the deviation of the predicted probability under permutation $\pi$ from the uniform mixture, and $n$ is the number of input evidence pieces.

Empirically, permutation dispersion follows an $a + b \ln n$ scaling ($b \approx 0.377$ for Qwen2-7B, $b \approx 0.147$ for Llama-3.1-8B). This motivates the use of permutation mixtures, i.e., averaging predictions over random input orderings, to stabilize prior estimates and improve the reliability of the ISR calculation.
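The mixture itself is straightforward to sketch. The snippet below assumes a hypothetical `score_fn` hook that returns the model's probability for a candidate answer under one ordering of the evidence, and reports both the mixture prior $\bar{q}$ and the empirical dispersion $\mathbb{E}_\pi|R_\pi|$:

```python
import random
import statistics

def permutation_mixture(score_fn, question, evidence, k=16, seed=0):
    """Average a model's answer probability over k random orderings of the evidence.

    `score_fn(question, ordered_evidence) -> float` is a hypothetical hook returning the
    model's probability for the candidate answer under that ordering. Returns the mixture
    prior q_bar and the empirical dispersion E_pi |q_pi - q_bar|.
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(k):
        order = list(evidence)
        rng.shuffle(order)
        probs.append(score_fn(question, order))
    q_bar = statistics.fmean(probs)
    dispersion = statistics.fmean(abs(q - q_bar) for q in probs)
    return q_bar, dispersion

# Toy, order-sensitive stand-in for the model scorer (for demonstration only):
toy_score = lambda q, ev: 0.50 + 0.02 * ev.index("fact-3")
print(permutation_mixture(toy_score, "Q?", ["fact-1", "fact-2", "fact-3", "fact-4"]))
```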

4. Empirical Results: Mixtures, Dose-Response, and Calibrated Refusal

Two principal empirical findings validate these planners:

  1. Permutation Mixtures Improve Accuracy: Ensembling over input orderings yields higher ground-truth likelihoods and accuracy, confirmed by positive Jensen gaps between mixture predictions and mean single-permutation probabilities (see the sketch after this list). The improvements (e.g., a lift of roughly 6% across datasets with Qwen2-7B) track with reduced permutation bias.
  2. Dose-Response Reduces Hallucinations: Varying the information "dose" (the amount of evidence provided) while holding prompt length fixed produces a linear increase in the information budget $\bar{\Delta}$ and a corresponding linear decrease in hallucination rate (roughly 0.13 per additional nat), confirming the EDFL's quantitative predictions.
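One way to check the Jensen-gap claim in item 1, under the reading that the mixture's log-likelihood is compared with the average per-permutation log-probability (so nonnegativity follows from Jensen's inequality), is sketched below; the listed probabilities are illustrative, not paper data:

```python
import math
import statistics

def jensen_gap(per_permutation_probs):
    """Log of the mixture probability minus the mean per-permutation log-probability.
    Nonnegative by Jensen's inequality; a larger gap means the permutation mixture
    assigns the ground truth a higher likelihood than a typical single ordering."""
    mixture = statistics.fmean(per_permutation_probs)                        # q_bar
    mean_logprob = statistics.fmean(math.log(p) for p in per_permutation_probs)
    return math.log(mixture) - mean_logprob

# Illustrative ground-truth probabilities under four different evidence orderings:
print(f"Jensen gap = {jensen_gap([0.55, 0.70, 0.42, 0.63]):.4f} nats")
```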

A real audit using a fixed ISR threshold ($ISR = 1.0$, with target $h^* = 0.92$) led to near-zero hallucinations at an abstention rate of 24%, demonstrating practical efficacy.

5. Operational Decision Protocols and Deployment

Deployable planners enable LLMs and associated NLP systems to admit or refuse an answer based on objective, theoretically-justified information constraints:

  • Compute $\bar{\Delta}$ for a candidate prompt via cross-entropy and permutation averaging.
  • Calculate $B2T$ for the target reliability $h^*$ and the estimated $q_{lo}(x)$.
  • Derive $ISR$. If $ISR < 1$, withhold the response or seek further evidence; if $ISR \ge 1$, permit the answer.

These rules are implementable in real time and agnostic to the specific input domain, provided permutation ensembles can be generated. Practitioners may set $h^*$ according to application-specific risk tolerances.
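A minimal end-to-end sketch of this loop is given below. It assumes a hypothetical `nll_fn` hook that returns the model's cross-entropy (in nats) for a candidate answer given a prompt assembled from parts, and estimates $\bar{\Delta}$ as the permutation-averaged drop in that cross-entropy when evidence is added; the paper's exact estimator may differ.

```python
import math
import random
import statistics

def bernoulli_kl(p, q, eps=1e-12):
    """KL(Ber(p) || Ber(q)) in nats."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def information_budget(nll_fn, question, evidence, answer, k=8, seed=0):
    """Estimate delta_bar (nats) as the permutation-averaged reduction in the answer's
    negative log-likelihood when the evidence is supplied versus withheld.

    `nll_fn(prompt_parts, answer) -> float` is a hypothetical hook returning the model's
    cross-entropy (nats) for `answer` given a prompt built from `prompt_parts`."""
    rng = random.Random(seed)
    nll_prior = nll_fn([question], answer)        # no evidence: prior description length
    drops = []
    for _ in range(k):
        order = list(evidence)
        rng.shuffle(order)
        drops.append(nll_prior - nll_fn([question] + order, answer))
    return statistics.fmean(drops)

def answer_or_abstain(delta_bar, q_lo, h_star=0.92):
    """Hard ISR = 1.0 gate: answer only if the observed budget covers B2T(x; h*)."""
    required = bernoulli_kl(1.0 - h_star, q_lo)   # B2T
    return "ANSWER" if delta_bar >= required else "ABSTAIN"

# Hypothetical wiring; `my_nll`, `question`, `evidence`, `answer`, and q_lo are placeholders:
# delta_bar = information_budget(my_nll, question, evidence, answer)
# decision = answer_or_abstain(delta_bar, q_lo=0.2)
```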

6. Applicability and Generalization

This framework is directly pertinent to any setting where reliable automatic answer/abstain adjudication is required, especially in high-stakes domains (e.g., scientific QA systems, medical LLMs, or automated compliance audits). The information budgeting logic is model-agnostic and can be paired with any subcomponent that produces well-calibrated likelihood outputs.

A plausible implication is that as model scale increases, the monotonic scaling of permutation dispersion and the robust performance of the planners (as per the QMV bound and empirical fit) will hold, although model-specific calibration is required for new architectures.

7. Significance and Limitations

The main significance is the transformation of hallucination from an opaque, “emergent” model pathology into a tractable, monitorable consequence of insufficient information budgeting. Planners operationalize this diagnosis into deployable, real-time abstention policies, offering principled reliability control. Notably, the approach makes no unfounded assumptions about human trust or error cost and achieves near-optimal risk reduction for a specified information budget.

A limitation is that calculating permutation mixtures is computationally intensive for large input sets; however, empirical results show that a relatively small number of permutations suffices for stability given the logarithmic scaling of dispersion. Also, the framework presumes reliable entropic estimation from the model, raising questions for models susceptible to sampling artifacts in very low-probability regimes.

Summary Table: Deployable Planners (B2T/RoH/ISR) in LLM Reliability

| Planner | Definition | Primary Role in Deployment |
|---|---|---|
| B2T | $B2T(x; h^*) = KL(Ber(1-h^*) \| Ber(q_{lo}(x)))$ | Computes information required for target reliability |
| RoH | $RoH(x) = 1 - p_{\max}(\bar{\Delta}(x), \bar{q}(x))$ | Estimates residual hallucination risk from available information |
| ISR | $ISR(x) = \bar{\Delta}(x) / B2T(x; h^*)$ | Governs answer/abstain decision; $ISR \ge 1$ authorizes answering |

This suite of deployable planners provides an actionable, mathematically principled scaffold for managing LLM outputs in risk-sensitive real-world deployments, turning hallucinations into measurable, controllable compression failures and enabling dynamic, data-driven reliability policies (Chlon et al., 14 Sep 2025).

References

  • Chlon et al., 14 Sep 2025.
