Epiplexity: Computational Information Theory
- Epiplexity is a formal framework that quantifies data’s structural content using time-bounded computation, distinguishing learnable patterns from random noise.
- It integrates explicit computation limits into information theory to resolve paradoxes related to deterministic transformations and data order.
- The framework provides actionable insights for data selection and curriculum design, optimizing training processes under fixed compute budgets.
Epiplexity is a formalization of informational content that quantifies what structural knowledge can be extracted from data by computationally bounded learners, separating such structure from unstructured unpredictability — a distinction conventional information theory cannot draw. Unlike Shannon entropy or Kolmogorov complexity, which measure information under the assumption of unbounded computation, epiplexity introduces explicit time constraints, aligning information content with the actual capabilities of learning systems. This framework resolves longstanding paradoxes in information theory, guides principled data selection, and provides a rigorous basis for analyzing the relationship between data structure, computational constraints, and learnability (Finzi et al., 6 Jan 2026).
1. Motivation and Conceptual Foundations
Traditional information-theoretic approaches, including Shannon entropy and Kolmogorov complexity, assume an observer with unbounded computational capacity. This assumption gives rise to three paradoxes in modern learning applications:
- Paradox 1: Information cannot be increased by deterministic transformations: The data-processing inequality ($I(X; g(Y)) \le I(X; Y)$) and its algorithmic analogue ($K(f(x)) \le K(x) + O(1)$ for computable $f$) suggest deterministic pipelines cannot introduce new structure, yet practical learning often extracts useful patterns via synthetic procedures, pseudorandom generation, and emergent phenomena in deterministic dynamical systems.
- Paradox 2: Information is independent of data order and factorization: Both entropy and Kolmogorov complexity are symmetric with respect to order, while neural models, cryptographic constructions, and sequential data exhibit direction-sensitive learnability.
- Paradox 3: Likelihood modeling is merely distribution matching: The minimizer of the expected log-loss $\mathbb{E}_{x \sim p}[-\log q(x)]$ is $q = p$, which treats model learning as trivial matching, inconsistent with the empirical emergence of powerful inductive shortcuts and representations.
All three paradoxes stem from neglecting computational bounds — treating all decodable structure as equally trivial, even when recovering it would require infeasible computation.
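The premise of Paradox 3 — that expected log-loss is minimized by the data distribution itself — can be checked numerically for a Bernoulli source. This is a minimal sketch, not taken from the paper:

```python
import math

def cross_entropy(p: float, q: float) -> float:
    """Expected log-loss E_{x~Bernoulli(p)}[-log q(x)], in nats."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

p = 0.3
# Sweep candidate model parameters q and locate the minimizer of the expected loss.
grid = [i / 1000 for i in range(1, 1000)]
best_q = min(grid, key=lambda q: cross_entropy(p, q))
print(f"argmin_q E[-log q] = {best_q:.3f}, true p = {p}")
```

The minimizer lands exactly on the source parameter, confirming that with unbounded optimization, likelihood modeling reduces to distribution matching — the behavior epiplexity argues changes under compute constraints.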
2. Formal Definition of Epiplexity
Epiplexity explicitly incorporates computation time in measuring information content. Given a universal prefix Turing machine $U$ and a time-constructible bound $T(n)$, define $\mathcal{P}_T$ as the set of all programs that, in at most $T(n)$ steps, can evaluate probabilities and sample outputs for binary strings of length $n$.
The time-bounded two-part code minimizer is
$$p^* = \operatorname*{arg\,min}_{p \in \mathcal{P}_T} \big[\, \ell(p) - \log p(x) \,\big],$$
where $\ell(p)$ is the length of the description of $p$.
Define:
$$S_T(x) = \ell(p^*), \qquad H_T(x) = -\log p^*(x).$$
$S_T(x)$, the epiplexity, quantifies the minimal program description length (structural content) a $T$-bounded learner must absorb to model $x$. $H_T(x)$, the time-bounded entropy, measures residual unpredictability under this best model. Increasing available computation (raising $T$) can strictly decrease both, as more structure becomes recoverable.
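The two-part objective can be made concrete with a toy model class standing in for $\mathcal{P}_T$. The candidate coders and their description lengths below are illustrative assumptions, not the paper's construction:

```python
import math

def nll_iid(bits, theta):
    """Code length (bits) of the sequence under an i.i.d. Bernoulli(theta) coder."""
    return -sum(math.log2(theta if b == 1 else 1.0 - theta) for b in bits)

def nll_markov(bits, table):
    """Code length under a first-order coder: table[prev] = P(next bit = 1)."""
    total = 1.0  # one bit for the first symbol under a uniform prior
    for prev, cur in zip(bits, bits[1:]):
        p1 = table[prev]
        total += -math.log2(p1 if cur == 1 else 1.0 - p1)
    return total

def candidates():
    """Toy stand-in for P_T: yields (description length ell(p) in bits, coder)."""
    for k in range(1, 16):                      # theta on a 4-bit grid
        yield 4, (lambda bs, t=k / 16: nll_iid(bs, t))
    for a in (0.1, 0.9):                        # 2 x 2 grid of transition tables
        for b in (0.1, 0.9):
            yield 8, (lambda bs, ta=a, tb=b: nll_markov(bs, {0: ta, 1: tb}))

def two_part_code(bits):
    """Return (S, H): description length and residual code length of the minimizer."""
    return min(((desc, coder(bits)) for desc, coder in candidates()),
               key=lambda pair: pair[0] + pair[1])

S, H = two_part_code([i % 2 for i in range(64)])  # structured input: 0101...
print(f"S = {S} bits of structure, H = {H:.1f} bits of residual noise")
```

For the alternating string, the minimizer is the first-order coder: paying 8 description bits buys a residual code length far below the 64 bits a memoryless coder would need, so the structure ends up in $S$ rather than $H$.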
3. Key Properties and Paradox Resolution
Epiplexity exhibits several properties that resolve the paradoxes noted in conventional theory:
- Nonnegativity and boundedness: $0 \le S_T(x)$ and $0 \le H_T(x)$, with $S_T(x) + H_T(x) \le |x| + O(1)$.
- Monotonicity in compute: If $T_1(n) \le T_2(n)$ for all $n$, then $S_{T_2}(x) + H_{T_2}(x) \le S_{T_1}(x) + H_{T_1}(x)$.
- Deterministic transformations may increase time-bounded information: For a cryptographically secure PRG $G$ mapping $n$ seed bits to $N \gg n$ output bits, $H_T(G(s)) \approx N$ for every polynomial time bound $T$, even though the unbounded entropy of $G(s)$ is at most $n$. Thus, PRG output appears random to any poly-time observer, with increased time-bounded entropy but no increase in structural content.
- Order-dependence: For a one-way permutation $f$, modeling the pair in the order $(x, f(x))$ versus $(f(x), x)$ yields very different $S_T$ and $H_T$ values. Predicting chess boards from moves is easy, but the inverse direction (moves from a board) inflates $H_T$, aligning with observed model performance.
- Computational structure creation: Models trained via maximum likelihood under finite compute can invent algorithms and inductive shortcuts not required by the true data generator.
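The PRG property can be illustrated empirically with a crude polynomial-time observer — here zlib's DEFLATE compressor standing in for a bounded learner. The SHA-256 counter construction is an illustrative choice, not the paper's; the point is that output expanded from a tiny seed resists compression while equally long patterned data does not:

```python
import hashlib
import zlib

def prg(seed: bytes, nbytes: int) -> bytes:
    """Counter-mode PRG built from SHA-256 (illustrative construction)."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])

n = 1 << 16
pseudo = prg(b"tiny seed", n)       # 64 KiB expanded from a few seed bytes
patterned = b"abab" * (n // 4)      # 64 KiB with obvious structure

c_pseudo = len(zlib.compress(pseudo, 9))
c_patterned = len(zlib.compress(patterned, 9))
print(f"pseudorandom: {c_pseudo}/{n} bytes after DEFLATE")
print(f"patterned:    {c_patterned}/{n} bytes after DEFLATE")
```

Although the pseudorandom stream has tiny Kolmogorov complexity (program plus seed), the bounded observer cannot exploit that: its time-bounded entropy is essentially the full output length.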
4. Illustrative Examples
Multiple synthetic domains demonstrate epiplexity’s discriminative power:
| Setting | $S_T$ (epiplexity) | $H_T$ (time-bounded entropy) |
|---|---|---|
| ECA Rule 15 (periodic) | low | low |
| ECA Rule 30 (chaotic) | low | high |
| ECA Rule 54 (emergent structure) | growing | - |
| Game of Life, one-step evolution | low | - |
| Game of Life, multi-step | varies, depending on structures | - |
| Masked Markov chain (easy induction) | peaks for $0 < h < 8$ | - |
| Masked Rule 30 (hard induction) | converges | - |
Periodic and trivial evolutions yield low epiplexity, while chaotic or unpredictable processes behave as noise, exhibiting structureless randomness and high $H_T$. Emergent or “inductive” domains require programmatic structure to model efficiently — reflected in growing $S_T$.
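A rough empirical proxy for this contrast: evolve elementary cellular automata and compare how much of their space-time diagrams a generic compressor can squeeze out. Compressed size is only a loose stand-in for time-bounded compressibility, not the paper's estimator:

```python
import zlib

def eca_spacetime(rule: int, width: int = 256, steps: int = 256) -> bytes:
    """Evolve an elementary CA from a single live cell; return the space-time grid."""
    table = [(rule >> i) & 1 for i in range(8)]
    row = [0] * width
    row[width // 2] = 1
    grid = bytearray()
    for _ in range(steps):
        grid += bytes(row)
        # Wrap-around neighborhood: index = left<<2 | center<<1 | right.
        row = [table[(row[i - 1] << 2) | (row[i] << 1) | row[(i + 1) % width]]
               for i in range(width)]
    return bytes(grid)

sizes = {}
for rule in (15, 30):
    raw = eca_spacetime(rule)
    sizes[rule] = len(zlib.compress(raw, 9))
    print(f"rule {rule}: {sizes[rule]} compressed bytes of {len(raw)}")
```

Rule 15's periodic stripes compress far more than Rule 30's chaotic triangle, matching the qualitative low-versus-high split in the table above.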
5. Practical Estimation Schemes
Direct optimization over all $T$-bounded programs is infeasible. In practice, $S_T$ and $H_T$ are estimated via parametric model families trained under compute budgets:
- Prequential Coding (AUC heuristic): Sequentially train a model $q_\theta$ on the data stream $x_{1:n}$, track the per-step log-loss $\ell_t = -\log q_{\theta_{t-1}}(x_t \mid x_{<t})$, then decompose the total code length $\sum_t \ell_t$: the area of the loss curve above its converged value estimates $S_T$, and the converged loss times $n$ estimates $H_T$.
Optimizing the configuration (model size, tokens) under a time constraint traces out the compute-optimal two-part code.
- Requential Coding (Teacher–Student KL): Maintain a sequence of “teacher” models $p_k$; train a “student” $q_k$ on synthetic teacher samples. For each token, the code cost is the teacher–student divergence $D_{\mathrm{KL}}\!\big(p_k(\cdot \mid x_{<t}) \,\|\, q_k(\cdot \mid x_{<t})\big)$. Summing these costs over tokens yields an estimate of $S_T$.
Prequential estimates are computationally cheaper, while requential coding provides a tighter upper bound. The compute-optimal tradeoff is found by sweeping model and data sizes and taking the lower convex hull in (compute, code length) space.
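The prequential scheme can be sketched end-to-end with a tiny online model. The bigram coder, Laplace smoothing, and tail-averaged plateau below are illustrative implementation choices, not the paper's setup:

```python
import math
import random
from collections import defaultdict

def prequential_split(seq, tail=100):
    """Code a symbol stream with an online bigram model (Laplace smoothing) and
    split total code length into structure (area of the loss curve above its
    converged tail value) and entropy (tail value times stream length)."""
    counts = defaultdict(lambda: defaultdict(int))
    alphabet = sorted(set(seq))
    losses, prev = [], seq[0]
    for cur in seq[1:]:
        seen = sum(counts[prev].values())
        p = (counts[prev][cur] + 1) / (seen + len(alphabet))  # predict before updating
        losses.append(-math.log2(p))
        counts[prev][cur] += 1
        prev = cur
    final = sum(losses[-tail:]) / tail                        # converged per-step loss
    s_hat = sum(max(l - final, 0.0) for l in losses)          # AUC above the plateau
    h_hat = final * len(losses)
    return s_hat, h_hat

random.seed(0)
S_cyc, H_cyc = prequential_split([i % 4 for i in range(4000)])                # cycle
S_rnd, H_rnd = prequential_split([random.randrange(2) for _ in range(4000)])  # coin flips
print(f"cycle: S~{S_cyc:.0f} bits, H~{H_cyc:.0f} bits")
print(f"noise: S~{S_rnd:.0f} bits, H~{H_rnd:.0f} bits")
```

The deterministic cycle pays its code length up front while learning the transition table (high $\hat{S}$, near-zero $\hat{H}$); the coin flips never improve past one bit per symbol (near-zero $\hat{S}$, maximal $\hat{H}$).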
6. Empirical Characterization Across Domains
Empirical results under fixed compute budgets (a fixed FLOP budget with up to 5 billion training tokens) reveal:
- OpenWebText (language): large $S_T$ alongside substantial residual $H_T$ (both measured in nats).
- Chess PGN: nontrivial $S_T$ (in nats).
- CIFAR-5M (pixels): $H_T$ dominates; almost all content is unpredictable noise.
Scaling to larger FLOP budgets and 1 trillion tokens, language retains the greatest structural epiplexity, with visual and video data trailing significantly.
Epiplexity correlates with practical performance. For instance, reordering chess data (board-to-moves) results in higher $S_T$ and better zero-shot transfer. Adaptive Data Optimization for LLM pretraining (Jiang et al., 2025) increases prequential epiplexity, yielding superior out-of-distribution generalization on multiple benchmarks.
7. Implications for Data Selection and Learning
Epiplexity inverts the model-centric view typical of Minimum Description Length and related criteria. Rather than minimizing model code for a fixed dataset, it asks which data (under a fixed compute budget) induces the largest reusable structure in a learner:
- Data with higher $S_T$ contains richer, reusable “circuits” (Editor’s term), fostering transfer and generalization.
- Relying solely on in-distribution loss may select data that is merely entropic or redundant.
- Maximizing $S_T$ suggests new strategies for curriculum design, synthetic data generation, or curation, tailored to the concrete computational limits of a learning system.
A plausible implication is that epiplexity quantifies “learning potential” under budget constraints and gives a principled metric for evaluating and selecting training corpora in large-scale machine learning.
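As a minimal sketch of such corpus selection, one can rank candidate datasets by a prequential-style structure score and keep the corpus whose loss curve has the largest area above its plateau. The scoring model and toy corpora are illustrative assumptions, not the paper's pipeline:

```python
import math
import random
from collections import defaultdict

def structure_score(seq):
    """Prequential structure estimate (bits) from an online bigram model:
    the area of the per-step log-loss curve above its converged tail value.
    A crude stand-in for S_T, used here only to rank candidate corpora."""
    counts = defaultdict(lambda: defaultdict(int))
    alphabet = sorted(set(seq))
    losses, prev = [], seq[0]
    for cur in seq[1:]:
        seen = sum(counts[prev].values())
        p = (counts[prev][cur] + 1) / (seen + len(alphabet))
        losses.append(-math.log2(p))
        counts[prev][cur] += 1
        prev = cur
    final = sum(losses[-100:]) / 100
    return sum(max(l - final, 0.0) for l in losses)

random.seed(1)
corpora = {
    "repetitive": [i % 16 for i in range(4000)],               # rich learnable structure
    "coin-flips": [random.randrange(2) for _ in range(4000)],  # pure noise
}
scores = {name: structure_score(seq) for name, seq in corpora.items()}
chosen = max(scores, key=scores.get)
print({k: round(v) for k, v in scores.items()}, "->", chosen)
```

Under an in-distribution-loss criterion, the noisy corpus looks "informative" because it is hard to predict; the structure score instead selects the corpus whose difficulty the learner can actually convert into reusable structure.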
Epiplexity and its associated time-bounded entropy provide a comprehensive framework for measuring information as a resource relative to computational constraints, resolving longstanding limitations of classical theory and aligning data-centric learning with the realities of modern AI system design (Finzi et al., 6 Jan 2026).