Deployable Planners: B2T, RoH, and ISR

Updated 16 September 2025
  • Deployable planners are defined as operational decision metrics (B2T, RoH, and ISR) that assess information sufficiency to manage LLM outputs effectively.
  • They utilize rigorous theoretical insights, including the Expectation-level Decompression Law and Quantified Martingale Violation, to predict and control hallucination risks.
  • Empirical findings show that permutation mixtures and calibrated information dosing lead to reduced hallucination rates and enhanced real-world reliability.

Deployable planners in the B2T (Bits-to-Trust), RoH (Risk-of-Hallucination), and ISR (Information Sufficiency Ratio) framework refer to operational decision metrics and real-time control rules for automatic answer/abstain adjudication in deployed LLMs. These planners are grounded in rigorous information-theoretic results relating model compression, reliability, and decision risk. In particular, the EDFL (Expectation-level Decompression Law) and the derived Quantified Martingale Violation (QMV) bound provide both the theoretical justification and empirical procedures for deploying these planners to mitigate hallucination. Hallucinations are reinterpreted as predictable failures of information compression, and the planners enable calibrated abstention protocols for assurance in real-world settings (Chlon et al., 14 Sep 2025).

1. Theoretical Foundations: Information Budgeting and Compression Failure

The key theoretical basis is the observation that LLMs perform near-Bayesian inference only in expectation over input order, not for individual realizations. Formally, transformers minimize the expected conditional description length over permutations, $\mathbb{E}_\pi[\ell(Y \mid \Gamma_\pi(X))]$, implying a Kolmogorov-complexity view over orderings rather than a permutation-invariant conditional description length. This "Bayesian in expectation" property introduces predictable deviations (permutation dispersion) in output likelihoods.

The Expectation-level Decompression Law (EDFL) provides an operational bound: $\bar{\Delta} \ge KL(Ber(p) \| Ber(\bar{q}))$, where $\bar{\Delta}$ is the observed information budget, $KL$ is the Kullback–Leibler divergence between Bernoulli distributions capturing the target reliability $p$ and the average prior $\bar{q}$, and $Ber(\cdot)$ denotes a Bernoulli distribution.

When $\bar{\Delta}$ falls short of the KL divergence required to reach the target posterior $p$, hallucinations manifest as predictable compression failures. Deployable planners operationalize this insight by dynamically monitoring signal-to-reliability via information budgeting.
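As a minimal sketch of this check (the numeric values below are illustrative, not taken from the paper), the EDFL bound reduces to comparing the observed budget against a KL divergence between two Bernoulli distributions:

```python
import math

def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Ber(p) || Ber(q)) in nats, clamping away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Illustrative values (not from the paper): average prior q_bar, target reliability p,
# and an observed information budget delta_bar in nats.
q_bar, p, delta_bar = 0.30, 0.92, 1.0
required = bernoulli_kl(p, q_bar)   # nats needed to move Ber(q_bar) up to Ber(p)
print(f"required {required:.3f} nats, observed {delta_bar:.3f} nats")
print("EDFL budget sufficient" if delta_bar >= required else "budget short: compression-failure risk")
```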

2. Planner Definitions: B2T, RoH, and ISR

Three core planners are defined, each with explicit mathematical formulations:

| Planner | Definition | Operational Use |
|---|---|---|
| B2T (Bits-to-Trust) | $B2T(x; h^*) = KL(Ber(1-h^*) \| Ber(q_{lo}(x)))$ | Information (nats) needed to upgrade the least favorable prior to target reliability $h^*$ |
| RoH (Risk-of-Hallucination) | $RoH(x) = 1 - p_{\max}(\bar{\Delta}(x), \bar{q}(x))$ | Predicts current error/hallucination risk given the information budget |
| ISR (Information Sufficiency Ratio) | $ISR(x) = \frac{\bar{\Delta}(x)}{B2T(x; h^*)}$ | Ratio of observed to required information; main decision gate |

  • $q_{lo}(x)$ is the least favorable prior for the binary predicate (e.g., answer/abstain), evaluated over a permutation ensemble or mixture.
  • $p_{\max}(\bar{\Delta}, \bar{q})$ is the maximum achievable posterior trust given the average available information and the prior.

A hard threshold $ISR = 1.0$ implements a deployable refusal policy: if $ISR < 1$, the planner abstains; if $ISR \ge 1$, it answers.
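A minimal sketch of these three quantities follows, taking the table's formulas at face value and reading $p_{\max}$ as the largest posterior reliability whose Bernoulli KL from the mixture prior fits within the observed budget (an interpretation, not necessarily the paper's exact estimator); all numeric inputs are hypothetical:

```python
import math

def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Ber(p) || Ber(q)) in nats."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def b2t(q_lo: float, h_star: float) -> float:
    """Bits-to-Trust: KL(Ber(1 - h*) || Ber(q_lo)), nats needed to reach reliability h*."""
    return bernoulli_kl(1.0 - h_star, q_lo)

def p_max(delta_bar: float, q_bar: float) -> float:
    """Assumed reading of p_max: largest p with KL(Ber(p) || Ber(q_bar)) <= delta_bar,
    found by bisection on p in [q_bar, 1)."""
    lo, hi = q_bar, 1.0 - 1e-9
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if bernoulli_kl(mid, q_bar) <= delta_bar else (lo, mid)
    return lo

def roh(delta_bar: float, q_bar: float) -> float:
    """Risk-of-Hallucination: 1 - p_max(delta_bar, q_bar)."""
    return 1.0 - p_max(delta_bar, q_bar)

def isr(delta_bar: float, q_lo: float, h_star: float) -> float:
    """Information Sufficiency Ratio: observed budget over required budget."""
    return delta_bar / max(b2t(q_lo, h_star), 1e-12)

# Hypothetical inputs; in practice delta_bar, q_lo, q_bar come from permutation-averaged scores.
delta_bar, q_lo, q_bar, h_star = 0.20, 0.25, 0.30, 0.92
print(f"B2T = {b2t(q_lo, h_star):.3f} nats, RoH = {roh(delta_bar, q_bar):.3f}, "
      f"ISR = {isr(delta_bar, q_lo, h_star):.2f}")
print("ANSWER" if isr(delta_bar, q_lo, h_star) >= 1.0 else "ABSTAIN")
```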

3. Quantified Martingale Violation and Permutation Dispersion

The framework identifies permutation sensitivity as a main source of unreliability, owing to transformers' lack of strict permutation invariance. The Quantified Martingale Violation (QMV) theorem bounds the dispersion: $\mathbb{E}_\pi |R_\pi(x)| \le \frac{C}{4}\left(\log n - \frac{3}{2} + o(1)\right)$, where $R_\pi(x) = q_\pi(x) - \bar{q}(x)$ is the deviation of the predicted probability under permutation $\pi$ from the uniform mixture, and $n$ is the number of input evidence pieces.

Empirically, permutation dispersion follows an $a + b \ln n$ scaling ($b \approx 0.377$ for Qwen2-7B, $b \approx 0.147$ for Llama-3.1-8B). This motivates the use of permutation mixtures, i.e., averaging predictions over random input orderings, to stabilize prior estimates and improve the reliability of the ISR calculation.
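The mixture itself is straightforward to sketch. The snippet below assumes a hypothetical `score_fn` hook that returns the model's probability for a candidate answer under one ordering of the evidence, and reports both the mixture prior $\bar{q}$ and the empirical dispersion $\mathbb{E}_\pi|R_\pi|$:

```python
import random
import statistics

def permutation_mixture(score_fn, question, evidence, k=16, seed=0):
    """Average a model's answer probability over k random orderings of the evidence.

    `score_fn(question, ordered_evidence) -> float` is a hypothetical hook returning the
    model's probability for the candidate answer under that ordering. Returns the mixture
    prior q_bar and the empirical dispersion E_pi |q_pi - q_bar|.
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(k):
        order = list(evidence)
        rng.shuffle(order)
        probs.append(score_fn(question, order))
    q_bar = statistics.fmean(probs)
    dispersion = statistics.fmean(abs(q - q_bar) for q in probs)
    return q_bar, dispersion

# Toy, order-sensitive stand-in for the model scorer (for demonstration only):
toy_score = lambda q, ev: 0.50 + 0.02 * ev.index("fact-3")
print(permutation_mixture(toy_score, "Q?", ["fact-1", "fact-2", "fact-3", "fact-4"]))
```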

4. Empirical Results: Mixtures, Dose-Response, and Calibrated Refusal

Two principal empirical findings validate these planners:

  1. Permutation Mixtures Improve Accuracy: Ensembling over input orderings yields higher ground-truth likelihoods and accuracy, confirmed by positive Jensen gaps between mixture predictions and mean single-permutation probabilities (see the sketch after this list). The improvements (e.g., a lift of roughly 6% across datasets with Qwen2-7B) track with reduced permutation bias.
  2. Dose-Response Reduces Hallucinations: Varying the information "dose" (the amount of evidence provided) while holding prompt length fixed produces a linear increase in the information budget $\bar{\Delta}$ and a corresponding linear decrease in hallucination rate (roughly 0.13 per additional nat), confirming the EDFL's quantitative predictions.
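One way to check the Jensen-gap claim in item 1, under the reading that the mixture's log-likelihood is compared with the average per-permutation log-probability (so nonnegativity follows from Jensen's inequality), is sketched below; the listed probabilities are illustrative, not paper data:

```python
import math
import statistics

def jensen_gap(per_permutation_probs):
    """Log of the mixture probability minus the mean per-permutation log-probability.
    Nonnegative by Jensen's inequality; a larger gap means the permutation mixture
    assigns the ground truth a higher likelihood than a typical single ordering."""
    mixture = statistics.fmean(per_permutation_probs)                        # q_bar
    mean_logprob = statistics.fmean(math.log(p) for p in per_permutation_probs)
    return math.log(mixture) - mean_logprob

# Illustrative ground-truth probabilities under four different evidence orderings:
print(f"Jensen gap = {jensen_gap([0.55, 0.70, 0.42, 0.63]):.4f} nats")
```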

A real audit using a fixed ISR threshold ($ISR = 1.0$, with target $h^* = 0.92$) led to near-zero hallucinations at an abstention rate of 24%, demonstrating practical efficacy.

5. Operational Decision Protocols and Deployment

Deployable planners enable LLMs and associated NLP systems to admit or refuse an answer based on objective, theoretically-justified information constraints:

  • Compute $\bar{\Delta}$ for a candidate prompt via cross-entropy and permutation averaging.
  • Calculate $B2T$ for the target reliability $h^*$ and the estimated $q_{lo}(x)$.
  • Derive $ISR$. If $ISR < 1$, withhold the response or seek further evidence; if $ISR \ge 1$, permit the answer.

These rules are implementable in real time and agnostic to the specific input domain, provided permutation ensembles can be generated. Practitioners may set $h^*$ according to application-specific risk tolerances.
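A minimal end-to-end sketch of this loop is given below. It assumes a hypothetical `nll_fn` hook that returns the model's cross-entropy (in nats) for a candidate answer given a prompt assembled from parts, and estimates $\bar{\Delta}$ as the permutation-averaged drop in that cross-entropy when evidence is added; the paper's exact estimator may differ.

```python
import math
import random
import statistics

def bernoulli_kl(p, q, eps=1e-12):
    """KL(Ber(p) || Ber(q)) in nats."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def information_budget(nll_fn, question, evidence, answer, k=8, seed=0):
    """Estimate delta_bar (nats) as the permutation-averaged reduction in the answer's
    negative log-likelihood when the evidence is supplied versus withheld.

    `nll_fn(prompt_parts, answer) -> float` is a hypothetical hook returning the model's
    cross-entropy (nats) for `answer` given a prompt built from `prompt_parts`."""
    rng = random.Random(seed)
    nll_prior = nll_fn([question], answer)        # no evidence: prior description length
    drops = []
    for _ in range(k):
        order = list(evidence)
        rng.shuffle(order)
        drops.append(nll_prior - nll_fn([question] + order, answer))
    return statistics.fmean(drops)

def answer_or_abstain(delta_bar, q_lo, h_star=0.92):
    """Hard ISR = 1.0 gate: answer only if the observed budget covers B2T(x; h*)."""
    required = bernoulli_kl(1.0 - h_star, q_lo)   # B2T
    return "ANSWER" if delta_bar >= required else "ABSTAIN"

# Hypothetical wiring; `my_nll`, `question`, `evidence`, `answer`, and q_lo are placeholders:
# delta_bar = information_budget(my_nll, question, evidence, answer)
# decision = answer_or_abstain(delta_bar, q_lo=0.2)
```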

6. Applicability and Generalization

This framework is directly pertinent to any setting where reliable automatic answer/abstain adjudication is required, especially in high-stakes domains (e.g., scientific QA systems, medical LLMs, or automated compliance audits). The information budgeting logic is model-agnostic and can be paired with any subcomponent that produces well-calibrated likelihood outputs.

A plausible implication is that as model scale increases, the monotonic scaling of permutation dispersion and the robust performance of the planners (as per the QMV bound and empirical fit) will hold, although model-specific calibration is required for new architectures.

7. Significance and Limitations

The main significance is the transformation of hallucination from an opaque, “emergent” model pathology into a tractable, monitorable consequence of insufficient information budgeting. Planners operationalize this diagnosis into deployable, real-time abstention policies, offering principled reliability control. Notably, the approach makes no unfounded assumptions about human trust or error cost and achieves near-optimal risk reduction for a specified information budget.

A limitation is that calculating permutation mixtures is computationally intensive for large input sets; however, empirical results show that a relatively small number of permutations suffices for stability given the logarithmic scaling of dispersion. Also, the framework presumes reliable entropic estimation from the model, raising questions for models susceptible to sampling artifacts in very low-probability regimes.

Summary Table: Deployable Planners (B2T/RoH/ISR) in LLM Reliability

| Planner | Definition | Primary Role in Deployment |
|---|---|---|
| B2T | $B2T(x; h^*) = KL(Ber(1-h^*) \| Ber(q_{lo}(x)))$ | Computes information required for target reliability |
| RoH | $RoH(x) = 1 - p_{\max}(\bar{\Delta}(x), \bar{q}(x))$ | Estimates residual hallucination risk from available information |
| ISR | $ISR(x) = \bar{\Delta}(x) / B2T(x; h^*)$ | Governs answer/abstain decision; $ISR \ge 1$ authorizes answering |

This suite of deployable planners provides an actionable, mathematically principled scaffold for managing LLM outputs in risk-sensitive real-world deployments, turning hallucinations into measurable, controllable compression failures and enabling dynamic, data-driven reliability policies (Chlon et al., 14 Sep 2025).

References

  • Chlon et al., 14 Sep 2025.
