Information-Theoretic Foundations
- Information-Theoretic Foundations is a rigorous framework that defines information through axiomatic, logical, and Bayesian methods.
- It employs mathematical tools like entropy, KL divergence, and mutual information to quantify uncertainty and transformation in diverse systems.
- The framework underpins applications in communication, machine learning, and physics, guiding model selection and efficient information processing.
Information-theoretic foundations form the conceptual and mathematical core of information theory, addressing the precise definition, properties, and mechanisms of information itself. These foundations underlie virtually all modern information-centric disciplines, including communication theory, machine learning, statistical inference, coding, physics, and the life sciences. Foundational research formalizes what "information" is, derives its key mathematical expressions, and elucidates how it manifests in diverse systems using rigorous, often axiomatic, frameworks. Several complementary paradigms exist: classical axiomatic approaches, logical/partition-based views, and information-theoretic treatments grounded in Bayesian and operational perspectives.
1. Ontological and Axiomatic Foundations
Burgin’s general theory of information (0808.0768) introduces ontological principles that establish information as a system-relative, transformative capacity rather than an absolute or purely syntactic quantity. The central claims are as follows:
- Locality Principle (O1): Information is always defined relative to a given receiver; it is not an intrinsic property of a message or carrier but a function of its effect on the recipient’s infological system.
- Transformation Principle (O2): Information for a receiver R is the capacity to cause change in R. Information is thus defined analogously to energy as "capacity to do work": it is "potential for change" in structural, cognitive, or other infological terms.
- Carrier, Representation, and Interaction (O3–O5): Information always has a representation or carrier, and an interaction is required for information transmission or reception.
- Dynamic and Multiplicity Principles (O6–O7): Information is accepted only when actual change occurs, and the same carrier can convey different information to the same receiver depending on its state.
A mathematical unification based on these principles draws an analogy between information and thermodynamic entropy, as in $I = k \ln N$, where $N$ is the number of distinguishable states or units of information (akin to the Boltzmann–Planck entropy formula $S = k \ln W$) and $k$ is a proportionality constant. This parametric form allows modeling information in cognitive, physical, or material contexts.
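As a minimal numerical sketch of this parametric form (an illustration, not part of Burgin’s formalism), the snippet below evaluates $I = k \ln N$, with $k = 1$ giving nats and $k = 1/\ln 2$ giving bits.

```python
import math

def information_capacity(n_states: int, k: float = 1.0) -> float:
    """Information of a carrier with n_states distinguishable states: I = k * ln(N)."""
    return k * math.log(n_states)

for n in (2, 8, 1024):
    nats = information_capacity(n)                      # k = 1 -> nats
    bits = information_capacity(n, k=1 / math.log(2))   # k = 1/ln 2 -> bits
    print(f"N = {n:5d}: {nats:7.3f} nats = {bits:7.3f} bits")
```

The choice of $k$ fixes the unit of information, just as the Boltzmann constant fixes the unit of thermodynamic entropy.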
2. Logical and Partition-Based Foundations
Ellerman’s partition logic view (Ellerman, 2013) reconstructs information-theoretic quantities using the duality between subset logic (classical probability) and partition logic:
- Distinction ("dit") as Atom of Information: While subsets are characterized by elements, partitions are characterized by distinctions: ordered pairs of elements in different blocks.
- Logical Entropy: Defined as the normalized count of distinctions in a partition $\pi$ of an $n$-element set: $h(\pi) = |\mathrm{dit}(\pi)|/n^2 = 1 - \sum_{B \in \pi} (|B|/n)^2$.
- Compound Quantities via Set Operations: Join entropy, mutual information, and conditional entropy are constructed by analogues of Venn diagram set operations on dit-sets.
- Relationship to Shannon Entropy: Logical entropy leads to a "dit-to-bit" conversion, with the Shannon entropy $H(p) = \sum_i p_i \log_2(1/p_i)$ interpreted as the mean number of binary partitions needed to achieve all distinctions. The logical framework generalizes to joint and conditional entropies, extending the foundational underpinnings of Shannon's original axiomatization.
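To make the dit/bit contrast concrete, the following sketch (block sizes are an arbitrary assumed example) computes the logical entropy of a partition and the Shannon entropy of its block-size distribution.

```python
import math

def logical_entropy(block_sizes):
    """h(pi) = fraction of ordered pairs that are distinctions = 1 - sum((|B|/n)^2)."""
    n = sum(block_sizes)
    return 1.0 - sum((b / n) ** 2 for b in block_sizes)

def shannon_entropy(block_sizes):
    """H = sum p_B * log2(1/p_B) over the block-size distribution of the partition."""
    n = sum(block_sizes)
    return sum((b / n) * math.log2(n / b) for b in block_sizes)

partition = [4, 2, 1, 1]  # block sizes of a partition of an 8-element set (assumed)
print("logical entropy:", logical_entropy(partition))  # probability a random ordered pair is distinguished
print("Shannon entropy:", shannon_entropy(partition))  # average number of binary splits, in bits
```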
3. Divergence, Entropy, and Mutual Information
The standard mathematical foundation (Chodrow, 2017, Stone, 2018) treats the Kullback-Leibler (KL) divergence as primitive: $D_{\mathrm{KL}}(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$, which measures the inefficiency or "surprise" incurred when using distribution $q$ in place of the true $p$. Entropy arises as the divergence from uniformity: $H(p) = \log |\mathcal{X}| - D_{\mathrm{KL}}(p \| u)$ for the uniform distribution $u$ on the alphabet $\mathcal{X}$. Key conceptual tools arising from this perspective include the following (a numerical sketch follows the list):
- Mutual Information: $I(X;Y) = D_{\mathrm{KL}}(p_{X,Y} \| p_X p_Y) = H(X) - H(X \mid Y)$, which quantifies dependence, or the reduction in uncertainty about $X$ after observing $Y$.
- Data Processing Inequality: If $X \to Y \to Z$ is a Markov chain, then $I(X;Z) \le I(X;Y)$, expressing that transformations or "garbling" cannot increase information content about the original source (mirroring the second law of thermodynamics for entropy).
- Chernoff Bounds and Gibbs’ Inequality: These results tie KL divergence to exponential rates in large-deviation probabilities and ensure nonnegativity of information measures.
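The sketch below (a toy joint distribution and channel assumed for illustration, not taken from the cited works) computes entropy, KL divergence, and mutual information for a pair of binary variables, verifies the entropy-from-uniformity identity, and checks the data processing inequality numerically.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H(p) = -sum p log2 p."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def kl(p, q):
    """KL divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

def mutual_information(pxy):
    """I(X;Y) = D( p(x,y) || p(x) p(y) )."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    return kl(pxy.ravel(), (px * py).ravel())

# Assumed joint distribution p(x, y) over two binary variables.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])

# Entropy as divergence from uniformity: H(p) = log|X| - D(p || u).
px = pxy.sum(axis=1)
u = np.full_like(px, 1.0 / px.size)
assert abs(entropy(px) - (np.log2(px.size) - kl(px, u))) < 1e-12

# Data processing: Z is a garbled version of Y via a channel p(z | y).
channel = np.array([[0.9, 0.1],
                    [0.2, 0.8]])   # rows indexed by y, columns by z
pxz = pxy @ channel                # p(x, z) = sum_y p(x, y) p(z | y)
print("I(X;Y) =", mutual_information(pxy))
print("I(X;Z) =", mutual_information(pxz), " (<= I(X;Y), as the DPI requires)")
```

The channel here is an arbitrary stochastic matrix, so the final comparison is the generic "garbling" statement rather than a special case.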
4. Bayesian and Rational Belief Foundations
Modern work (Duersch et al., 2019, Jeon et al., 17 Jul 2024) frames information as quantifying the evolution of rational belief, fully compatible with Bayesian updating:
- Generalized Information Measures: Any "reasonable" information measure between a prior $q$ and a posterior $p$, evaluated under a "view" distribution $r$, takes the form of an expected log-likelihood ratio, $I(q \to p; r) = \mathbb{E}_{r}\!\left[\log \frac{p(x)}{q(x)}\right]$, with canonical choices of the view (e.g., $r = p$ recovers the KL divergence $D_{\mathrm{KL}}(p \| q)$).
- Information as Expected Belief Change: Entropy is the expected information gain upon realization. The framework encompasses classical measures (Shannon entropy, cross-entropy, mutual information) as special cases of expected log likelihood ratios under rational updating.
- Implications for Learning: In Bayesian frameworks, the average log-loss decomposes into irreducible error plus an "excess loss" term that equals the per-timestep information gain about latent parameters. This leads to mutual information–based error bounds for learning, meta-learning, and misspecified models, with rate–distortion theory quantifying the limits of compressing or encoding parameters for given task accuracy (Jeon et al., 17 Jul 2024).
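A minimal sketch (an assumed discretized coin-bias example, not from the cited papers) illustrates information as expected belief change: the gain from one observation is the KL divergence from prior to posterior, and its average over the prior predictive equals the mutual information between the parameter and the observation.

```python
import numpy as np

# Discretized parameter theta (coin bias) with a uniform prior -- assumed toy setup.
theta = np.linspace(0.01, 0.99, 99)
prior = np.full_like(theta, 1.0 / theta.size)

def posterior(prior, theta, y):
    """Bayes update of the belief over theta after one Bernoulli observation y in {0, 1}."""
    like = theta if y == 1 else 1.0 - theta
    post = prior * like
    return post / post.sum()

def kl(p, q):
    """Information gained in moving from belief q to belief p, in bits."""
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

# Prior predictive P(y) and per-outcome information gains.
pred = np.array([(prior * (1.0 - theta)).sum(), (prior * theta).sum()])
gains = np.array([kl(posterior(prior, theta, y), prior) for y in (0, 1)])
print("gain if y = 0:", gains[0], "bits")
print("gain if y = 1:", gains[1], "bits")
print("expected gain = I(theta; y):", float(pred @ gains), "bits")
```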
5. Resource, Operational, and Thermodynamic Approaches
Foundational work also integrates information theory with physics and computation:
- Resource Theories in Thermodynamics: Abstract frameworks (e.g., general probabilistic theories) define information flow and entropy for arbitrary physical theories, extending concepts such as diagonalization and majorization beyond quantum mechanics (Scandolo, 2019).
- Constructor Theory and Work Extraction: The ability to extract work is shown to require distinguishability—i.e., the information-theoretic capacity to differentiate system states—in a manner that generalizes the second law and connects it to information-theoretic tasks (Marletto, 2020).
- Connections to Computation: The requirements for work extraction (e.g., von Neumann’s universal constructor) are rooted in the information content and the ability to perform distinguishing, copying, and permutation tasks.
6. Cross-Disciplinary Implications and Unification
The information-theoretic foundations yield a suite of unifying insights:
- Unified Definition: Information is best viewed as the capacity to effect transformations—whether in states of knowledge, physical substrates, or abstract infological systems. Both axiomatic and operational approaches express this via structural, probabilistic, or logical relationships.
- Measurement and Model Selection: Information theory underlies principled inference, maximum entropy methods, and nonparametric estimation, allowing model quality and goodness-of-fit to be assessed via absolute information criteria (e.g., KL divergence from optimum information estimators) (Toda, 2011); a minimal sketch follows this list.
- Applications and Limitations: Foundational principles constrain practical schemes for storage (e.g., DNA data storage’s indexing overhead (Shomorony et al., 2022)), coding, and generalization (e.g., mutual information bounds in quantum learning (Caro et al., 2023)). They also highlight phenomena such as the tradeoff between estimation error and misspecification, the limits of code performance, and the scalability of neural models.
- Logical and Rate-Distortion Dualities: Partition-based and rate–distortion perspectives extend foundational analysis to tasks ranging from cryptography to meta-learning, providing tight theoretical limits for information compression and transmission.
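As a hedged illustration of information-based model comparison (a toy setup with assumed counts and candidate models, not the estimator of Toda, 2011), the sketch below scores two candidate distributions by their KL divergence from the empirical distribution of observed categorical data.

```python
import numpy as np

def kl(p, q):
    """D(p || q) in bits; q must be positive wherever p is."""
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

# Empirical distribution over four symbols, from assumed counts.
counts = np.array([50, 30, 15, 5])
empirical = counts / counts.sum()

# Two assumed candidate models over the same alphabet.
candidates = {
    "uniform": np.full(4, 0.25),
    "skewed":  np.array([0.50, 0.25, 0.15, 0.10]),
}

for name, model in candidates.items():
    print(f"{name:8s} D(empirical || model) = {kl(empirical, model):.4f} bits")
# The candidate with the smaller divergence from the empirical distribution fits better.
```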
7. Conclusion
A rigorous, encompassing definition of information theory’s foundations emerges from the synthesis of axiomatic principles, logical partition theory, divergence and mutual information, Bayesian belief updates, and resource-theoretic and operational considerations. The interplay between transformation, distinguishability, and uncertainty not only clarifies core concepts but also governs the mathematical mechanisms underlying learning, inference, communication, and physical processes. As contemporary research expands into quantum, biological, and engineered systems, these foundations provide both the theoretical limits and the operational tools necessary for an integrated understanding of information and its role across scientific disciplines.