
Epiplexity: Computational Information Theory

Updated 7 January 2026
  • Epiplexity is a formal framework that quantifies data’s structural content using time-bounded computation, distinguishing learnable patterns from random noise.
  • It integrates explicit computation limits into information theory to resolve paradoxes related to deterministic transformations and data order.
  • The framework provides actionable insights for data selection and curriculum design, optimizing training processes under fixed compute budgets.

Epiplexity is a formalization of informational content that quantifies what structural knowledge can be extracted from data by computationally bounded learners, distinguishing it from unstructured unpredictability that conventional information theory cannot separate. Unlike Shannon entropy or Kolmogorov complexity, which measure information under the assumption of unbounded computation, epiplexity introduces explicit time constraints, thus aligning information content with the actual capabilities of learning systems. This framework resolves longstanding paradoxes in information theory, guides principled data selection, and provides a rigorous basis for analyzing the relationship between data structure, computational constraints, and learnability (Finzi et al., 6 Jan 2026).

1. Motivation and Conceptual Foundations

Traditional information-theoretic approaches, including Shannon entropy $H(X)$ and Kolmogorov complexity $K(x)$, assume an observer with unbounded computational capacity. This results in three paradoxes in modern learning applications:

  • Paradox 1: Information cannot be increased by deterministic transformations: The data-processing inequality $H(f(X)) \leq H(X)$ and the bound $K(f(x)) \leq K(x) + K(f) + O(1)$ suggest deterministic pipelines cannot introduce new structure, yet practical learning often extracts useful patterns via synthetic procedures, pseudorandom generation, and emergent phenomena in deterministic dynamical systems.
  • Paradox 2: Information is independent of data order and factorization: Both entropy and Kolmogorov complexity are symmetric with respect to order, while neural models, cryptographic constructions, and sequential data exhibit direction-sensitive learnability.
  • Paradox 3: Likelihood modeling is merely distribution matching: The minimizer of $\min_P \mathbb{E}_{X \sim Q}[-\log P(X)]$ is $P = Q$, which treats model learning as trivial matching, inconsistent with the empirical emergence of powerful inductive shortcuts and representations.

All three paradoxes stem from neglecting computational bounds: classical theory treats all decodable structure as equally trivial, even when decoding it would require infeasible computation.
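Paradox 1 can be made tangible with a crude stand-in for a bounded observer. The snippet below is not from the paper; it uses zlib purely as a rough proxy for a poly-time model. A long pseudorandom stream is fully determined by a short seed, so its Kolmogorov complexity is tiny, yet the bounded compressor treats it as nearly incompressible noise:

```python
import random
import zlib

# A deterministic generator: the short seed fully determines the stream,
# so its Kolmogorov complexity is tiny. A bounded observer (zlib, a
# crude poly-time stand-in) nevertheless finds no structure to exploit.
random.seed(42)
pseudo = bytes(random.getrandbits(8) for _ in range(100_000))

# A trivially periodic stream of the same length, for contrast.
structured = bytes(i % 16 for i in range(100_000))

pseudo_len = len(zlib.compress(pseudo, 9))          # close to 100_000
structured_len = len(zlib.compress(structured, 9))  # a few hundred bytes
```

The deterministic pipeline `random.seed` → `getrandbits` has created content that is, to this bounded observer, indistinguishable from fresh entropy.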

2. Formal Definition of Epiplexity

Epiplexity explicitly incorporates computation time into the measurement of information content. Given a universal prefix Turing machine $\mathcal{U}$ and a time-constructible bound $T(n)$, define $\mathcal{P}_T$ as the set of all programs $\mathrm{P}$ that, in at most $T(n)$ steps, can evaluate probabilities and sample outputs for binary strings of length $n$.

The time-bounded two-part code minimizer is

$$\mathrm{P}^* = \arg\min_{\mathrm{P} \in \mathcal{P}_T} \left\{ |\mathrm{P}| + \mathbb{E}_{X}\left[ -\log \mathrm{P}(X) \right] \right\}$$

where $|\mathrm{P}|$ is the length of the description of $\mathrm{P}$.

Define:

$$\boxed{\begin{aligned} \text{Epiplexity:} \quad & S_T(X) := |\mathrm{P}^*|, \\ \text{Time-bounded entropy:} \quad & H_T(X) := \mathbb{E}_X\left[ -\log \mathrm{P}^*(X) \right]. \end{aligned}}$$

$S_T(X)$ quantifies the minimal program description length (structural content) a $T$-bounded learner must absorb to model $X$. $H_T(X)$ measures the residual unpredictability under this best model. Increasing the available computation (raising $T$) can strictly decrease both, as more structure becomes recoverable.
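The two-part code minimization can be sketched on a toy program class. This is my own illustration, not the paper's estimator: the "programs" here are just order-$k$ Markov chains with a quantized parameter table, and the data is an in-memory bit sequence rather than strings under a Turing machine.

```python
import math

def two_part_code(bits, max_order=4, param_bits=16):
    """Crude two-part code over a tiny program class: order-k Markov
    chains on bits. |P| is a quantized probability table (param_bits per
    context); the entropy term is the data's negative log-likelihood."""
    best = None
    for k in range(max_order + 1):
        # Fit context -> [count of 0, count of 1] with Laplace smoothing.
        counts = {}
        for i in range(k, len(bits)):
            c = counts.setdefault(bits[i - k:i], [1, 1])
            c[bits[i]] += 1
        # Negative log-likelihood under the fitted table (in bits).
        nll = 0.0
        for i in range(k, len(bits)):
            c = counts[bits[i - k:i]]
            nll -= math.log2(c[bits[i]] / (c[0] + c[1]))
        desc = (2 ** k) * param_bits  # description length of the table
        if best is None or desc + nll < best[0]:
            best = (desc + nll, k, desc, nll)
    return best  # (total, chosen order, S_T proxy, H_T proxy)

# An alternating stream: almost all of its content is structure.
total, k, s_proxy, h_proxy = two_part_code(tuple(i % 2 for i in range(2000)))
```

On the alternating stream the minimizer selects order 1: a 32-bit "program" absorbs nearly all the content, leaving only a few bits of residual entropy, mirroring the $S_T$ / $H_T$ split.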

3. Key Properties and Paradox Resolution

Epiplexity exhibits several properties that resolve the paradoxes noted in conventional theory:

  • Nonnegativity and boundedness: $0 \leq S_T(X),\ H_T(X)$ and $S_T(X) + H_T(X) \leq n + O(1)$.
  • Monotonicity in compute: if $T' \geq T$, then $S_{T'}(X) + H_{T'}(X) \leq S_T(X) + H_T(X)$.
  • Deterministic transformations may increase epiplexity: for a cryptographically secure PRG $G$ mapping $k$ bits to $n$ bits:

$$H_{\text{poly}}(G(U_k)) - H_{\text{poly}}(U_k) \approx n - k, \qquad S_{\text{poly}}(G(U_k)) = O(1)$$

Thus, PRG output appears random to any poly-time observer, with increased time-bounded entropy but no increase in structural content.

  • Order-dependence: for a one-way permutation $f$, modeling $(X, f(X))$ versus $(f(X), X)$ yields very different $S_T$ and $H_T$ values. Predicting chess boards from move sequences is easy, but the inverse task (inferring moves from a board) inflates $S_T$, aligning with observed model performance.
  • Computational structure creation: Models trained via maximum likelihood under finite compute can invent algorithms and inductive shortcuts not required by the true data generator.
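The order-dependence point can be illustrated with a hash standing in for a one-way permutation (an assumption of this sketch: SHA-256 is not a permutation, and its one-wayness is only conjectured). The forward direction is a single evaluation; the reverse direction forces a bounded observer into brute-force search.

```python
import hashlib
import itertools

def forward(x):
    # Easy direction: modeling f(X) from X is one deterministic evaluation.
    return hashlib.sha256(x).digest()

def invert(y, length):
    # Hard direction: a time-bounded observer can do little better than
    # enumerating preimages, so (f(X), X) costs vastly more to model.
    for cand in itertools.product(range(256), repeat=length):
        x = bytes(cand)
        if forward(x) == y:
            return x
    return None

x = b"ab"
recovered = invert(forward(x), len(x))  # up to 256**2 = 65536 evaluations
```

The same pair of variables carries the same Shannon information in either order; only the computational cost of modeling the conditional, and hence the epiplexity accounting, changes.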

4. Illustrative Examples

Multiple synthetic domains demonstrate epiplexity’s discriminative power:

| Setting | $S_T$ | $H_T$ |
|---|---|---|
| ECA Rule 15 (periodic) | $O(1)$ | $O(1)$ |
| ECA Rule 30 (chaotic) | $O(1)$ | $n$ |
| ECA Rule 54 (emergent structure) | $\Omega(n^\gamma)$, $\gamma > 0$ | $O(n)$ |
| Game of Life, one-step evolution | $O(1)$ | — |
| Game of Life, multi-step ($k \gg n$) | $\gg O(1)$, depending on structures | — |
| Masked Markov chain (easy induction) | peaks for $0 < h < 8$ | — |
| Masked Rule 30 (hard induction) | $\Omega(h)$ | converges to $h$ |

Periodic and trivial evolutions yield low epiplexity, while chaotic processes register as noise: little extractable structure (small $S_T$) but high residual entropy $H_T$. Emergent or “inductive” domains require programmatic structure to model efficiently, reflected in growing $S_T$.
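The ECA rows of the table can be probed with a few lines of simulation. Below, zlib again serves (crudely, as an assumption of this sketch) as a time-bounded observer: Rule 15's periodic orbit compresses to almost nothing, while Rule 30's chaotic field resists compression.

```python
import zlib

def eca_step(state, rule):
    """One step of an elementary cellular automaton, periodic boundary."""
    n = len(state)
    return tuple(
        (rule >> (4 * state[(i - 1) % n] + 2 * state[i] + state[(i + 1) % n])) & 1
        for i in range(n)
    )

def run(rule, width=64, steps=512):
    """Evolve from a single centered 1-cell; return all rows as bytes."""
    state = tuple(int(i == width // 2) for i in range(width))
    rows = []
    for _ in range(steps):
        rows.append(bytes(state))
        state = eca_step(state, rule)
    return b"".join(rows)

periodic = len(zlib.compress(run(15), 9))  # small: trivially structured
chaotic = len(zlib.compress(run(30), 9))   # large: looks like noise
```

Both automata have $O(1)$ Kolmogorov complexity (rule number plus initial condition), so only a time-bounded notion of information separates them, as the $S_T$ / $H_T$ columns above do.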

5. Practical Estimation Schemes

Direct optimization over all $T$-bounded programs is infeasible. In practice, $S_T(X)$ and $H_T(X)$ are estimated via parametric model families under compute budgets:

  • Prequential Coding (AUC heuristic): sequentially train a model on data $Z_1, \ldots, Z_M$, track the per-step log-loss $\ell_i$, then compute

$$\widehat{S}_{\text{preq}} = \sum_{i=1}^M (\ell_i - \ell_M), \qquad \widehat{H}_{\text{preq}} = M \ell_M$$

Optimizing $(N, D)$ (model size, tokens) under a time constraint traces out the compute-optimal two-part code.

  • Requential Coding (teacher–student KL): maintain a sequence of “teacher” models $P^{\text{t}}_i$ and train “student” models $P^{\text{s}}_i$ on synthetic teacher samples. For each token, the code cost is $\mathrm{KL}(P^{\text{t}}_i \Vert P^{\text{s}}_i) + O(1)$; summing yields $|\mathrm{P}_{\text{req}}| \approx \sum_{i=1}^M \mathrm{KL}(P^{\text{t}}_i \Vert P^{\text{s}}_i)$.

Prequential estimates are computationally cheaper, while requential coding provides a tighter upper bound. The compute-optimal tradeoff is found by sweeping $(N, D)$ and taking the lower convex hull in $(\text{code length},\ \text{compute})$ space.
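The prequential scheme can be sketched end to end. This is a toy of my own, not the paper's setup: the learner is an online smoothed-Bernoulli model on a 0/1 stream, and chunk-averaged losses stand in for the per-step $\ell_i$.

```python
import math

def prequential_estimates(stream, chunks=10):
    """Predict-then-update coding of a 0/1 stream with a smoothed
    Bernoulli model; returns (S_hat, H_hat) per the AUC heuristic:
    S ~ area between the loss curve and its final value, H ~ M * l_M."""
    M = len(stream)
    size = M // chunks
    counts = [1, 1]  # Laplace-smoothed online counts
    losses = []      # mean log-loss per chunk, in bits
    for c in range(chunks):
        nll = 0.0
        for x in stream[c * size:(c + 1) * size]:
            nll -= math.log2(counts[x] / (counts[0] + counts[1]))
            counts[x] += 1  # update only after predicting
        losses.append(nll / size)
    l_M = losses[-1]
    S_hat = sum(l - l_M for l in losses) * size
    H_hat = M * l_M
    return S_hat, H_hat

# A 90/10 stream: little structure (one parameter), large residual entropy.
stream = tuple(0 if i % 10 == 0 else 1 for i in range(5000))
S_hat, H_hat = prequential_estimates(stream)
```

The small $\widehat{S}$ (the model needs to absorb only one parameter) against the large $\widehat{H}$ (per-symbol entropy near $H(0.9) \approx 0.47$ bits) reproduces the structure/noise split the estimator is meant to expose.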

6. Empirical Characterization Across Domains

Empirical results under fixed compute budgets ($\sim 6 \times 10^{18}$ FLOPs, up to 5 billion tokens) reveal:

  • OpenWebText (language): $S_T \approx 10^7$ nats, $H_T \approx 10^{10}$ nats.
  • Chess PGN: $S_T \approx 10^6$ nats.
  • CIFAR-5M (pixels): $S_T \approx 10^4$ nats; almost all content is unpredictable noise.

Scaling to budgets of $10^{25}$ FLOPs and 1 trillion tokens, language retains the greatest structural epiplexity, with visual and video data trailing significantly.

Epiplexity correlates with practical performance. For instance, reordering chess data (board-to-moves) results in higher $S_T$ and better zero-shot transfer. Adaptive Data Optimization for LLM pretraining (Jiang et al., 2025) increases prequential epiplexity, yielding superior out-of-distribution generalization on multiple benchmarks.

7. Implications for Data Selection and Learning

Epiplexity inverts the model-centric view typical of Minimum Description Length and related criteria. Rather than minimizing model code for a fixed dataset, it asks which data (under a fixed compute budget) induces the largest reusable structure in a learner:

  • Data with higher $S_T$ contains richer, reusable “circuits” (Editor’s term), fostering transfer and generalization.
  • Relying solely on in-distribution loss may select data that is merely entropic or redundant.
  • Maximizing $S_T$ suggests new strategies for curriculum design, synthetic data generation, or curation, tailored to the concrete computational limits of a learning system.

A plausible implication is that epiplexity quantifies “learning potential” under budget constraints and gives a principled metric for evaluating and selecting training corpora in large-scale machine learning.


Epiplexity and its associated time-bounded entropy provide a comprehensive framework for measuring information as a resource relative to computational constraints, resolving longstanding limitations of classical theory and aligning data-centric learning with the realities of modern AI system design (Finzi et al., 6 Jan 2026).
