Zero-Error Horizon (ZEH)
- Zero-Error Horizon (ZEH) is a metric that defines the maximum scale for guaranteed error-free performance in systems like LLMs, numerical algorithms, and communication channels.
- It establishes an explicit boundary, witnessed by a concrete failing instance called the ZEH limiter, beyond which error-free operation is no longer guaranteed, offering a clear audit trail for reliability and safety assessments.
- Empirical analyses reveal a strong correlation between ZEH and accuracy, with optimized computational techniques enabling efficient evaluation across diverse domains.
The Zero-Error Horizon (ZEH) is a metric that delineates the maximal regime in which a model, communication channel, or algorithm performs without a single error under prescribed conditions. It captures the input size, problem scale, or coding block length up to which total correctness holds, providing an explicit and auditable guarantee of reliability. ZEH appears in several domains, including trustworthy LLM evaluation, robust numerical algorithms, and zero-error communication theory, where it reveals intrinsic capability boundaries and informs practical deployment strategies (Sato, 22 Jan 2026, Battaglia et al., 1 May 2025, 0911.5300).
1. Formal Definitions and Instantiations
ZEH is task- and system-dependent, but its archetypal definition is as follows:
- Let $M$ be a fixed system (e.g., an LLM with fixed prompt and decoding).
- Let $X_n$ denote all problem instances of size $n$ (as appropriate for the domain).
- $X_{\le n} = \bigcup_{k \le n} X_k$ is the collection of all instances up to size $n$.
- $C_M$ is the set of instances answered (or solved, or transmitted) correctly by $M$.
The Zero-Error Horizon is
$$\mathrm{ZEH}(M) = \max\{\, n : X_{\le n} \subseteq C_M \,\}.$$
If $\mathrm{ZEH}(M) = n$, all instances up to size $n$ are guaranteed error-free, and there exists some $x \in X_{n+1}$ with a failure. The first such $x$ is referred to as a ZEH limiter (Sato, 22 Jan 2026).
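In communication theory, the zero-error horizon $n_0(\mathcal{N})$ of a channel $\mathcal{N}$ is the minimal block length $n$ such that at least two messages can be transmitted with zero error in $n$ uses:
$$n_0(\mathcal{N}) = \min\{\, n \ge 1 : \alpha(\mathcal{N}^{\otimes n}) \ge 2 \,\},$$
where $\alpha(\cdot)$ denotes the classical one-shot zero-error capacity of a given channel, counted as the number of messages that can be sent without error in a single use (0911.5300).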
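The block-length definition above can be illustrated with a brute-force search. The following is a minimal sketch, assuming an unassisted classical channel described only by the set of outputs each input letter can produce; the function names, the `max_n` cap, and the two example channels are hypothetical and not taken from the cited work.

```python
from itertools import combinations, product

def letters_confusable(x, y, outputs):
    """Two input letters are confusable if they can produce a common output."""
    return bool(outputs[x] & outputs[y])

def words_confusable(u, v, outputs):
    """Two codewords are confusable iff every coordinate pair is confusable."""
    return all(letters_confusable(a, b, outputs) for a, b in zip(u, v))

def zero_error_horizon(outputs, max_n=3):
    """Smallest block length n <= max_n at which two codewords are never confused.

    Returns None if no such pair exists up to max_n (assumed search cap).
    """
    letters = sorted(outputs)
    for n in range(1, max_n + 1):
        words = list(product(letters, repeat=n))
        for u, v in combinations(words, 2):
            if not words_confusable(u, v, outputs):
                return n  # at least two messages fit with zero error
    return None

# Pentagon-style channel: input i can be received as i or i + 1 (mod 5).
pentagon = {i: {i, (i + 1) % 5} for i in range(5)}
# Fully confusable channel: every pair of inputs shares an output.
triangle = {0: {0, 1}, 1: {1, 2}, 2: {0, 2}}

print(zero_error_horizon(pentagon))  # 1
print(zero_error_horizon(triangle))  # None
```

In this unassisted classical model a fully confusable channel stays fully confusable at every block length, which is why the second search returns `None`; this is the setting in which shared resources can make a difference.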
In robust numerical methods, the error horizon $\rho$ is the radius of the smallest ball around the true solution to which an iterative algorithm can converge in the presence of perturbations, and a zero-error horizon (i.e., $\rho = 0$) indicates exact recovery under certain corruption models (Battaglia et al., 1 May 2025).
2. ZEH in Trustworthy LLM Evaluation
ZEH for LLMs is operationalized as the largest input size for which a model answers every instance of a canonical task correctly under a fixed prompt and greedy decoding.
Key empirical ZEHs for GPT-5.2 (Sato, 22 Jan 2026):
- Multiplication: ZEH = 126 (limiter: a multiplication instance of size 127, answered incorrectly).
- Parity of Binary Strings: ZEH = 4 (limiter: "11000" misclassified).
- Balanced Parentheses: ZEH = 10 (limiter: "((((( ))))))" with 11 parens).
- Graph Chromatic Number: ZEH = 4 (5-vertex graph miscolored).
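For Qwen2.5-Instruct, ZEH grows broadly with model size; in the values tabulated here the only exception is the 3B model (15), which falls below the 1.5B model (20):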
| Model Size | ZEH (Multiplication) | Accuracy |
|---|---|---|
| 0.5B | 0 | 55.0% |
| 1.5B | 20 | 75.9% |
| 3B | 15 | 79.3% |
| 7B | 22 | 93.2% |
| 14B | 26 | 97.1% |
| 32B | 33 | 98.6% |
| 72B | 42 | 98.6% |
Prompt-variation experiments show only limited variation in ZEH, indicating that the measurement is stable across prompts.
ZEH is tightly correlated with accuracy but reveals "holes" (i.e., error outliers) that are invisible to mean performance metrics, providing concrete counterexamples for auditability and baseline safety. Emergent algorithmic behaviors are mirrored in ZEH growth: small models exhibit unpredictable, memory-based failures, whereas large models show structured errors (e.g., multiplication carry mistakes), and logistic regression indicates that carry robustness improves with model size. Spearman correlations quantify the decoupling between rote corpus memorization and algorithmic generalization, with ZEH strongly tracking the latter (Sato, 22 Jan 2026).
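The correlation claim can be checked against the Qwen2.5-Instruct table above. The following is a minimal sketch that computes a Spearman rank correlation (with average ranks for ties) between the tabulated ZEH and accuracy values; the helper names are hypothetical, and the numbers are copied from the table as printed rather than from the original study.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Values as tabulated above (0.5B ... 72B).
zeh = [0, 20, 15, 22, 26, 33, 42]
accuracy = [55.0, 75.9, 79.3, 93.2, 97.1, 98.6, 98.6]

rho = spearman(zeh, accuracy)
print(round(rho, 3))  # about 0.95
```

The two largest models share the same accuracy (98.6%) but differ in ZEH (33 versus 42), a small instance of the horizon separating systems that a mean metric cannot distinguish.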
3. ZEH in Robust Numerical Linear Algebra
In iterative algorithms for linear equations (e.g., randomized Kaczmarz, RK), the classical error horizon $\rho_{\mathrm{RK}}$ is the radius of the residual error ball around the true solution under corruptions. It is governed by the matrix condition number $\kappa$ and the corruption vector $\epsilon$, and it vanishes only when the corruption does (Battaglia et al., 1 May 2025).
Quantile-based variants (qRK, dqRK) provide strict error-horizon reductions. Their error horizon is expressed through constants that encapsulate spectral and quantile-related terms, together with a dense “small” noise component. The zero-error horizon condition holds when that dense noise component vanishes and the fraction of sparse corruptions is within quantile exclusion. No analogous strict ZEH exists for classical RK unless there is zero corruption (Battaglia et al., 1 May 2025).
This yields robust convergence against arbitrarily large sparse corruption, with empirical results demonstrating that the error horizon remains small and stable for quantile-based methods but explodes for classical RK as corruption increases.
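The contrast between classical and quantile-based Kaczmarz can be reproduced on a toy system. The following is a minimal pure-Python sketch, assuming unit-norm rows, three large sparse corruptions, no dense noise, and a simplified quantile rule (skip the update when the sampled row's residual exceeds the q-quantile of all residuals). It is not the dqRK method of the cited work, and every name and parameter value is hypothetical.

```python
import math
import random

def make_system(m=60, n_corrupt=3, magnitude=50.0, seed=0):
    """Consistent system A x = b with unit-norm rows, then corrupt a few entries of b."""
    rng = random.Random(seed)
    x_true = [1.0, -2.0, 0.5]
    rows = []
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in x_true]
        norm = math.sqrt(sum(v * v for v in a))
        rows.append([v / norm for v in a])
    b = [sum(ak * xk for ak, xk in zip(a, x_true)) for a in rows]
    for i in range(n_corrupt):
        b[i] += magnitude  # sparse, arbitrarily large corruption
    return rows, b, x_true

def kaczmarz(rows, b, x_true, iters=2000, q=None, seed=1):
    """Randomized Kaczmarz; if q is given, apply the quantile acceptance rule.

    Returns the error ||x_k - x_true|| after every iteration.
    """
    rng = random.Random(seed)
    m = len(rows)
    x = [0.0] * len(x_true)
    errors = []
    for _ in range(iters):
        i = rng.randrange(m)
        residuals = [sum(ak * xk for ak, xk in zip(a, x)) - bj
                     for a, bj in zip(rows, b)]
        accept = True
        if q is not None:
            threshold = sorted(abs(r) for r in residuals)[min(m - 1, int(q * m))]
            accept = abs(residuals[i]) <= threshold
        if accept:
            # Project onto the hyperplane of row i (rows have unit norm).
            x = [xk - residuals[i] * ak for xk, ak in zip(x, rows[i])]
        errors.append(math.dist(x, x_true))
    return errors

rows, b, x_true = make_system()
rk_errors = kaczmarz(rows, b, x_true)          # classical RK
qrk_errors = kaczmarz(rows, b, x_true, q=0.7)  # quantile variant
print(max(rk_errors[-500:]), qrk_errors[-1])
```

In this setup the corrupted rows always carry the largest residuals, so the quantile rule never projects onto them and the iterate approaches the true solution, while classical RK is thrown far from it every time a corrupted row is sampled.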
4. ZEH in Zero-Error Information Theory
In classical and quantum communication, the ZEH (often labeled the zero-error horizon $n_0$) is the minimum block length needed for nonzero error-free capacity. The underlying machinery is the channel’s confusability graph $G(\mathcal{N})$, whose independence number gives the one-shot zero-error capacity $\alpha(\mathcal{N})$.
Formally,
$$n_0(\mathcal{N}) = \min\{\, n \ge 1 : \alpha(\mathcal{N}^{\otimes n}) \ge 2 \,\}.$$
Entanglement- and more generally, non-signalling-assisted schemes can strictly reduce the zero-error horizon:
- For certain constructions (e.g., Bell–Kochen–Specker channels), sharing entanglement yields a strictly smaller zero-error horizon than purely classical codes achieve.
- Non-signalling resources can enable zero-error transmission at block lengths where unassisted codes fail, as determined by a fractional packing condition on the channel's hypergraph (0911.5300).
This demonstrates that ZEH is not intrinsic to the bare channel but contingent on available shared resources (shared randomness, entanglement, NS correlations), providing a unifying framework for resource-oriented separations in zero-error tasks.
5. Algorithmic and Computational Aspects
ZEH measurement commonly involves exhaustive enumeration:
- Fix system (e.g., LLM+prompt+decoding), task, and input-size definition.
- For each size $n = 1, 2, \dots$:
  - Enumerate all instances $x$ of size $n$.
  - Apply the system to each $x$ and validate the outcome.
  - If an error occurs, $n - 1$ is the ZEH, and the first failing $x$ is the ZEH limiter (Sato, 22 Jan 2026).
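The enumeration procedure above can be sketched in a few lines. This is a minimal illustration, assuming binary-string parity as the task and string length as the size; `toy_system` is a hypothetical, deliberately flawed stand-in for a real model, and `max_size` is an assumed search cap.

```python
from itertools import product

def parity_instances(n):
    """All binary strings of length n, in lexicographic order."""
    for bits in product("01", repeat=n):
        yield "".join(bits)

def reference_parity(s):
    """Ground-truth oracle: parity of the number of ones."""
    return s.count("1") % 2

def toy_system(s):
    """Hypothetical flawed solver: it only reads the first four characters."""
    return s[:4].count("1") % 2

def zero_error_horizon(system, instances, oracle, max_size):
    """Return (ZEH, limiter) by exhaustive enumeration in increasing size.

    If no failure is found up to max_size, the result (max_size, None)
    is only a lower bound on the true horizon.
    """
    for n in range(1, max_size + 1):
        for x in instances(n):
            if system(x) != oracle(x):
                return n - 1, x  # first failing instance is the limiter
    return max_size, None

print(zero_error_horizon(toy_system, parity_instances, reference_parity, 8))
# (4, '00001')
```

The early return is what keeps the search affordable in practice: the cost is dominated by the sizes that are fully correct, and the limiter is produced as a by-product for auditing.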
For large-scale problems (e.g., LLMs), direct enumeration is computationally prohibitive. Several optimizations are available:
- Teacher Forcing: Short-circuits token-by-token decoding.
- Lookahead Batching: Batches instances by size for GPU utilization and early exit on failures.
- Prompt KV-Cache Prefilling: Re-uses context attention cache across many instances sharing a prompt.
- Tree-Structured Decoding (FlashTree): Collapses computation along shared-autoregressive answer suffixes.
- Empirical metrics: FlashTree yields a sizable speedup over naive methods for LLM ZEH computation.
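The saving from reusing shared computation can be estimated by counting token positions. The following is a rough sketch, assuming that work is shared along common prefixes of the token sequences (as with a cached prompt) and that each distinct prefix costs one forward step; it is a counting model only, not the FlashTree implementation, and the tokenization by whitespace is a simplifying assumption.

```python
def naive_token_steps(sequences):
    """Every sequence is processed from scratch."""
    return sum(len(seq) for seq in sequences)

def shared_prefix_token_steps(sequences):
    """Each distinct prefix is processed once (the nodes of a prefix tree)."""
    prefixes = set()
    for seq in sequences:
        for end in range(1, len(seq) + 1):
            prefixes.add(tuple(seq[:end]))
    return len(prefixes)

# Hypothetical instances sharing a prompt and partial operands.
sequences = [
    "Compute 12 * 3".split(),
    "Compute 12 * 4".split(),
    "Compute 13 * 3".split(),
]

print(naive_token_steps(sequences), shared_prefix_token_steps(sequences))
# 12 8
```

The gap widens as more instances share a long prompt, which is the regime exhaustive ZEH evaluation operates in.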
6. Implications, Limitations, and Future Directions
ZEH delivers a guaranteed boundary of all-correct performance: within the ZEH, no failures occur under the fixed settings; just beyond it, at least one failing instance (the limiter) is known to exist, so correctness is no longer guaranteed. This allows concrete auditing via ZEH limiters and supports warning signals for out-of-horizon inputs (e.g., safety-critical pipelines that fall back to reliable systems or human intervention).
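An out-of-horizon guard of the kind described here can be sketched as a thin routing function. This is a minimal illustration, assuming the ZEH has already been measured for the exact prompt and decoding settings in use; `guarded_answer`, `toy_model`, and `exact_parity` are hypothetical names, and the toy model stands in for a real system.

```python
def guarded_answer(instance, size_of, zeh, model, fallback):
    """Use `model` only inside the measured horizon; otherwise defer to `fallback`.

    Returns the answer together with the route that produced it.
    """
    if size_of(instance) <= zeh:
        return model(instance), "model"
    return fallback(instance), "fallback"

def toy_model(s):
    """Hypothetical solver that is only reliable up to length 4."""
    return s[:4].count("1") % 2

def exact_parity(s):
    """Reliable fallback (could equally be a human review step)."""
    return s.count("1") % 2

print(guarded_answer("1011", len, 4, toy_model, exact_parity))   # (1, 'model')
print(guarded_answer("11000", len, 4, toy_model, exact_parity))  # (0, 'fallback')
```

The guard is only as valid as the measurement behind it: changing the prompt or the decoding settings requires re-measuring the horizon before the threshold can be trusted.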
Limitations include:
- High sensitivity to prompt and context (valid only under fixed settings).
- Deterministic decoding requirements—stochastic decoding may yield different horizons.
- Combinatorial explosion for all but toy tasks, necessitating sampling or formal verification for realistic use cases.
- ZEH fragility (“collapse”): a single bug or a randomness-induced failure is enough to lower the horizon.
Research directions target efficient approximate ZEH estimation (statistical/adversarial sampling), formal methods for symbolic guarantees, extension to stateful or interactive systems, and context-dependent ZEH metrics suitable for complex building blocks (multi-step reasoning, programmatic outputs) (Sato, 22 Jan 2026, Battaglia et al., 1 May 2025).
7. Comparative Perspectives and Conceptual Unification
ZEH bridges disparate fields through a common lens of guaranteed error-free range:
- In LLM evaluation, it complements accuracy by exposing singular critical failures.
- In robust linear solvers, it informs the maximal tolerable adversarial corruption before breakdown.
- In channel coding, it quantifies capacity onset under various side-resources.
The unifying principle is the focus on guaranteed total correctness, not expected or averaged performance, yielding critical insights into capability, audit, and reliability boundaries (Sato, 22 Jan 2026, Battaglia et al., 1 May 2025, 0911.5300).
References
- "Even GPT-5.2 Can't Count to Five: The Case for Zero-Error Horizons in Trustworthy LLMs" (Sato, 22 Jan 2026)
- "Quantile-RK and Double Quantile-RK Error Horizon Analysis" (Battaglia et al., 1 May 2025)
- "Improving zero-error classical communication with entanglement" (0911.5300)