
Partial Information Decomposition

Updated 11 April 2026
  • Partial Information Decomposition is an advanced information theory framework that partitions mutual information into redundancy, unique, and synergistic contributions.
  • It builds on the Williams–Beer redundancy lattice with measures like I_min, BROJA/PID, and order-based PIDs to assign distinct informational atoms.
  • PID has been extended to multivariate, continuous, time-series, and quantum settings, informing applications in neuroscience, machine learning, and multimodal fusion.

Partial Information Decomposition (PID) is an advanced information-theoretic framework for resolving the mutual information that multiple source variables collectively have about a target variable into atoms corresponding to redundant, unique, and synergistic informational modes. PID generalizes Shannon theory beyond bivariate measures, aiming to quantify and interpret higher-order informational relationships in multivariate systems (Liardi et al., 3 Mar 2026, Kolchinsky, 2019).

1. Foundational Concepts and the Williams–Beer Lattice

Given sources $X_1, \ldots, X_n$ and a target $T$, the mutual information $I(X_1, \ldots, X_n; T)$ is decomposed as

$$I(X_1, \ldots, X_n; T) = \sum_{\alpha \in \mathcal{A}} I_\partial(\alpha; T)$$

where $\mathcal{A}$ is the set of all antichains of subsets of the sources (the redundancy lattice). For $n = 2$, the four standard PID atoms are:

  • Redundancy: information about $T$ present in both $X_1$ and $X_2$
  • Unique information: information about $T$ present only in $X_1$ or only in $X_2$
  • Synergy: information about $T$ present only in the joint observation of $X_1$ and $X_2$

The essential constraints on the atoms in the bivariate case are

$$I(X_1; T) = R + U_1, \qquad I(X_2; T) = R + U_2, \qquad I(X_1, X_2; T) = R + U_1 + U_2 + S,$$

where $R$, $U_1$, $U_2$, and $S$ denote the redundant, unique, and synergistic atoms. The Möbius inversion on the redundancy lattice provides a uniquely specified assignment of PID atoms once suitable functionals for redundancy are given (Gutknecht et al., 2020, Liardi et al., 3 Mar 2026).
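The bivariate accounting can be checked numerically. A minimal pure-Python sketch (the AND gate with uniform binary inputs is an illustrative choice, not an example from the cited papers) computes the three Shannon quantities and shows that, once a redundancy value R is fixed, the remaining atoms follow by Möbius inversion:

```python
from itertools import product
from math import log2

# Joint distribution of (x1, x2, t) for t = AND(x1, x2), uniform inputs.
p = {(x1, x2, x1 & x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

def mi(xs, ys):
    """Mutual information I(X;Y) in bits between groupings of the triple."""
    px, py, pxy = {}, {}, {}
    for (x1, x2, t), pr in p.items():
        v = {"x1": x1, "x2": x2, "t": t}
        kx = tuple(v[n] for n in xs)
        ky = tuple(v[n] for n in ys)
        px[kx] = px.get(kx, 0) + pr
        py[ky] = py.get(ky, 0) + pr
        pxy[kx, ky] = pxy.get((kx, ky), 0) + pr
    return sum(pr * log2(pr / (px[kx] * py[ky]))
               for (kx, ky), pr in pxy.items() if pr > 0)

i1 = mi(("x1",), ("t",))        # I(X1;T)  ~ 0.3113 bits
i2 = mi(("x2",), ("t",))        # I(X2;T)  ~ 0.3113 bits
i12 = mi(("x1", "x2"), ("t",))  # I(X1,X2;T) ~ 0.8113 bits

def atoms(R):
    """Given a redundancy value R, the other atoms follow by Mobius inversion."""
    u1, u2 = i1 - R, i2 - R
    s = i12 - i1 - i2 + R
    return u1, u2, s

# The three accounting constraints hold for any choice of R:
for R in (0.0, i1):
    u1, u2, s = atoms(R)
    assert abs((R + u1) - i1) < 1e-12
    assert abs((R + u2) - i2) < 1e-12
    assert abs((R + u1 + u2 + s) - i12) < 1e-12
```

The constraints thus leave exactly one degree of freedom, and choosing a redundancy functional is what pins the decomposition down.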

2. Formal Redundancy Measures and Operational Interpretations

Several concrete redundancy measures instantiate the abstract PID architecture:

  • Williams–Beer minimum information ($I_{\min}$): The average minimal specific information per target outcome, i.e., $I_{\min}(T; X_1, \ldots, X_n) = \sum_t p(t) \min_i I(T{=}t; X_i)$, where $I(T{=}t; X_i)$ is the specific information source $X_i$ carries about outcome $t$ (Liardi et al., 3 Mar 2026, Gutknecht et al., 2020).
  • BROJA PID based on marginal constraints (Bertschinger et al., 2014): Redundancy and unique information are defined via constrained optimizations fixing the source–target marginals. The unique information of $X_1$ can be stated as:

$$UI(T; X_1 \setminus X_2) = \min_{Q \in \Delta_P} I_Q(X_1; T \mid X_2)$$

where $\Delta_P$ is the set of joint distributions $Q$ matching the true pairwise marginals $P(X_1, T)$ and $P(X_2, T)$ (Zhao et al., 6 Oct 2025, Wibral et al., 2015).

  • Order-based PIDs (Kolchinsky): By abstracting “informational subset” relations as preorders (e.g., Blackwell, less-noisy, more-capable orders between channels), one defines redundancy as the maximal information common to all sources under the relevant ordering (Kolchinsky, 2019, Gomes et al., 2023):

$$I_\cap(X_1, \ldots, X_n; T) = \max_{Q:\, Q \preceq X_i \ \forall i} I(Q; T)$$

This approach provides a decision-theoretic, operational interpretation and extends to arbitrary numbers of sources (Gomes et al., 2023).

Operationally, Blackwell-based PIDs connect unique information with optimal performance in arbitrary statistical decision problems; redundancy quantifies the maximum information extractable from all sources without preference, and synergy captures the excess gained only by using sources jointly (Kolchinsky, 2019).
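As a concrete instance, the Williams–Beer $I_{\min}$ can be computed exactly for the AND gate with uniform binary inputs (an illustrative toy system; a pure-Python sketch, not code from the cited papers):

```python
from itertools import product
from math import log2

# Joint distribution of (x1, x2, t) for t = AND(x1, x2), uniform inputs.
p = {(x1, x2, x1 & x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

pt = {}
for (x1, x2, t), pr in p.items():
    pt[t] = pt.get(t, 0) + pr

def specific_info(t, i):
    """Williams-Beer specific information I(T=t; X_i) in bits."""
    pxt, px = {}, {}
    for (a, b, tt), pr in p.items():
        x = (a, b)[i]
        px[x] = px.get(x, 0) + pr
        if tt == t:
            pxt[x] = pxt.get(x, 0) + pr
    # sum_x p(x|t) * log2( p(t|x) / p(t) )
    return sum((pxt[x] / pt[t]) * log2((pxt[x] / px[x]) / pt[t])
               for x in pxt)

# I_min: average over target outcomes of the weakest source's specific info.
i_min = sum(pt[t] * min(specific_info(t, 0), specific_info(t, 1))
            for t in pt)

print(round(i_min, 4))  # 0.3113
```

By symmetry, $I_{\min}$ here equals $I(X_1;T) = I(X_2;T) \approx 0.311$ bits, leaving zero unique information and $0.5$ bits of synergy out of the $\approx 0.811$ bits of joint mutual information.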

3. Extensions: Multivariate, Continuous, Mixed, and Time-Series PID

PID has been generalized beyond discrete bivariate settings:

  • n-source PID and antichain lattices: For $n$ sources, the exponential growth of the redundancy lattice (governed by the Dedekind numbers) creates severe definitional and computational challenges, and several proposed PIDs are inconsistent or underdetermined for n > 2 (Matthias et al., 18 Dec 2025, Pica et al., 2017, Liardi et al., 3 Mar 2026). Notably, no lattice-based decomposition yields a globally nonnegative, chain-rule-consistent PID for all subsets beyond n = 2 (Matthias et al., 18 Dec 2025).
  • Mixed discrete–continuous and continuous-variable PID: Measure-theoretic PIDs based on shared-exclusion functionals accommodate arbitrary combinations of variable types, using local Radon–Nikodym derivatives over conditioning events (Schick-Poland et al., 2021, Barà et al., 2024).
  • Gaussian PID (GPID) and normalizing flows: For jointly Gaussian sources/targets, the GPID leverages closed-form optimization on joint covariance matrices, with generalizations to arbitrary continuous distributions via invertible, information-preserving normalizing flows (Flow-PID) (Zhao et al., 6 Oct 2025, Venkatesh et al., 2021). Deficiency-based approaches yield efficient high-dimensional algorithms while preserving major PID axioms (Venkatesh et al., 2021).
  • Rate-based PID for stochastic processes: The Partial Information Rate Decomposition (PIRD) extends PID from i.i.d. settings to stationary time series, decomposing mutual information rates over spectral representations—key for capturing dynamical high-order effects in networked systems (Faes et al., 6 Feb 2025).
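For intuition about the Gaussian case, the NumPy sketch below computes the Gaussian mutual-information terms from a joint covariance matrix; the linear system T = X1 + X2 + noise is a hypothetical choice, and the minimum-mutual-information (MMI) redundancy is used only as a simple baseline, not the closed-form GPID of the cited work:

```python
import numpy as np

# Joint covariance of (X1, X2, T) for T = X1 + X2 + N(0,1),
# with X1, X2 independent standard normals (illustrative choice).
cov = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0],
                [1.0, 1.0, 3.0]])

def gauss_mi(cov, xs, ys):
    """I(X;Y) in bits for jointly Gaussian blocks, via log-determinants."""
    det = lambda idx: np.linalg.det(cov[np.ix_(idx, idx)])
    return 0.5 * np.log2(det(xs) * det(ys) / det(xs + ys))

i1 = gauss_mi(cov, [0], [2])      # I(X1;T)     = 0.5*log2(3/2)
i2 = gauss_mi(cov, [1], [2])      # I(X2;T)     = 0.5*log2(3/2)
i12 = gauss_mi(cov, [0, 1], [2])  # I(X1,X2;T)  = 0.5*log2(3)

# MMI baseline: redundancy = the least-informative source's MI.
red = min(i1, i2)
unique1, unique2 = i1 - red, i2 - red
synergy = i12 - red - unique1 - unique2  # 0.5 bits for this system
```

The additive-noise structure makes the half-bit of synergy explicit: only the pair (X1, X2) pins down the signal component of T.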

4. Axiomatic Analysis and Inconsistency Results

The development of PID measures is tightly constrained by axiomatic desiderata derived from information theory and logic:

| Axiom/Property | Informal meaning |
| --- | --- |
| Self-redundancy (SR) | The redundancy of a single source equals its mutual information with the target |
| Symmetry (S) | Redundancy is invariant under permutation of the sources |
| Monotonicity (M) | Adding sources cannot increase redundancy |
| Subset/deterministic equality | Adding a deterministic copy of a source does not change redundancy |
| Local/global positivity (LP/GP) | All atoms are nonnegative |
| Identity property (ID) | Redundancy of two sources about their joint copy target equals the sources' mutual information |
| Target chain rule (TCR) | Redundancy satisfies the same chain rule as Shannon mutual information |
| Invariance (REI) | PID is unchanged under bijective reparameterization of the variables |

Key impossibility theorems demonstrate fundamental tradeoffs: No PID can simultaneously satisfy local positivity, target chain rule, and invariance under invertible transformations for all n. For n ≥ 3, any lattice-based PID must sacrifice at least one major axiom (e.g., nonnegativity, chain rule, or invariance) (Matthias et al., 18 Dec 2025, Liardi et al., 3 Mar 2026). Even classical properties like the identity axiom, when combined with nonnegativity, exclude some prominent PIDs.
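The force of the identity property is visible on the standard two-bit copy counterexample: with independent uniform bits and target T = (X1, X2), the Williams–Beer I_min reports a full bit of redundancy even though I(X1; X2) = 0, precisely the behavior the identity axiom is designed to exclude. A pure-Python check (a sketch, not code from the cited papers):

```python
from itertools import product
from math import log2

# COPY gate: T = (X1, X2) for independent uniform bits.
p = {(x1, x2, (x1, x2)): 0.25 for x1, x2 in product((0, 1), repeat=2)}

pt = {}
for (x1, x2, t), pr in p.items():
    pt[t] = pt.get(t, 0) + pr

def specific_info(t, i):
    """Williams-Beer specific information I(T=t; X_i) in bits."""
    pxt, px = {}, {}
    for (a, b, tt), pr in p.items():
        x = (a, b)[i]
        px[x] = px.get(x, 0) + pr
        if tt == t:
            pxt[x] = pxt.get(x, 0) + pr
    return sum((pxt[x] / pt[t]) * log2((pxt[x] / px[x]) / pt[t])
               for x in pxt)

i_min = sum(pt[t] * min(specific_info(t, 0), specific_info(t, 1)) for t in pt)
# The sources are independent, yet I_min reports a full bit of "redundancy".
print(i_min)  # 1.0
```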

5. Methodological Innovations: Parthood, Logic, and Alternative Decompositions

Recent work re-derives PID via fundamental part–whole (mereological) relations rather than redundancy functionals, yielding a fine-grained view of information atoms indexed by monotonic Boolean functions (Dedekind lattice). This approach is isomorphic to both the redundancy lattice and a logic-based lattice where each atom corresponds to a logical statement about source realizations. These “three worlds” (parthood, antichains, logic) permit generalizations and systematic construction of alternative PIDs beyond redundancy-focused schemes (Gutknecht et al., 2020).
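For small n, the atoms of these isomorphic lattices can be counted by brute force: they correspond to nonempty antichains of nonempty source subsets (equivalently, nonconstant monotone Boolean functions), giving 4 atoms for n = 2 and 18 for n = 3, the Dedekind numbers minus two. A short enumeration sketch:

```python
from itertools import combinations

def pid_atoms(n):
    """Count antichains of nonempty subsets of n sources (PID lattice nodes)."""
    subsets = [frozenset(c) for r in range(1, n + 1)
               for c in combinations(range(n), r)]
    count = 0
    # Enumerate every nonempty family of subsets; keep the antichains,
    # i.e. families in which no member strictly contains another.
    for bits in range(1, 2 ** len(subsets)):
        family = [s for i, s in enumerate(subsets) if bits >> i & 1]
        if all(not (a < b) and not (b < a)
               for a, b in combinations(family, 2)):
            count += 1
    return count

print(pid_atoms(2), pid_atoms(3))  # 4 18
```

The doubly exponential growth of this count is exactly the computational obstacle noted for n-source PIDs above.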

Additionally, this perspective allows alternative decompositions:

  • Unique-information and strong/moderate synergy-based PIDs, built by selecting different parthood criteria for atom assignment rather than enforcing redundancy as the primary partitioning.
  • Fourier analysis for Boolean gates clarifies that, for uniform inputs, PID atoms correspond directly to squared Fourier coefficients—first-order terms encode unique, higher-order encode synergy, bias encodes redundancy (Makkeh et al., 2020).
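This correspondence can be made concrete for two-input gates over uniform ±1-valued inputs, where each Fourier coefficient is the expectation of the gate against a parity character: XOR places all weight on the second-order (synergistic) coefficient, while AND spreads weight over bias, first-order, and second-order terms. A small sketch (the ±1 encoding is a standard convention assumed here):

```python
from itertools import product

def fourier(f):
    """Fourier coefficients of f: {-1,1}^2 -> R against the characters
    1, x1, x2, x1*x2, under the uniform input distribution."""
    chars = [lambda x1, x2: 1, lambda x1, x2: x1,
             lambda x1, x2: x2, lambda x1, x2: x1 * x2]
    inputs = list(product((-1, 1), repeat=2))
    return [sum(f(x1, x2) * c(x1, x2) for x1, x2 in inputs) / 4
            for c in chars]

xor = lambda x1, x2: x1 * x2                       # parity in +/-1 encoding
and_ = lambda x1, x2: (x1 + x2 + x1 * x2 - 1) / 2  # AND in +/-1 encoding

print(fourier(xor))   # [0.0, 0.0, 0.0, 1.0]  all weight on x1*x2 (synergy)
print(fourier(and_))  # [-0.5, 0.5, 0.5, 0.5] bias, unique, and synergy terms
```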

6. Applications and Empirical Findings

PID has found diverse empirical usage:

  • Neuroscience: Differential decoding of synergistic vs. unique coding in populations, e.g., “coding with synergy” and the specification of neural goal functions (Wibral et al., 2015). PID quantifies and distinguishes redundancy in input integration, unique influences, and emergent synergy.
  • Machine Learning/Feature Selection: PID-based metrics (e.g., PID of Features) provide nuanced interpretability in model selection, by distinguishing unique relevance, redundancy, and synergistic effects across features (Barà et al., 2024).
  • Text and summarization: PID reveals that multi-document summaries predominantly draw from union and unique information as the number of sources grows, with synergy being negligible in practice (Mascarell et al., 2024).
  • Multimodal fusion and model benchmarking: Flow-PID quantifies fusion mechanisms across modalities, guiding model choice and highlighting genuinely synergistic interactions (Zhao et al., 6 Oct 2025).

Empirically, synergy is often rare in human-generated artifacts, while redundancy decreases and unique information increases with system complexity. Synthetic tasks engineered for synergy (e.g., XOR-like Boolean ensembles) validate the ability of advanced estimators to detect high-order information.
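The XOR benchmark is easy to verify directly: each source alone is independent of the target, so the entire bit of target information must sit in the synergy atom under any PID satisfying self-redundancy. A pure-Python check:

```python
from itertools import product
from math import log2

# XOR gate with uniform binary inputs.
p = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

def mi(xs, ys):
    """Mutual information in bits between variable groupings of the triple."""
    px, py, pxy = {}, {}, {}
    for (x1, x2, t), pr in p.items():
        v = {"x1": x1, "x2": x2, "t": t}
        kx = tuple(v[n] for n in xs)
        ky = tuple(v[n] for n in ys)
        px[kx] = px.get(kx, 0) + pr
        py[ky] = py.get(ky, 0) + pr
        pxy[kx, ky] = pxy.get((kx, ky), 0) + pr
    return sum(pr * log2(pr / (px[kx] * py[ky]))
               for (kx, ky), pr in pxy.items() if pr > 0)

# Single sources carry nothing; the pair carries a full bit: pure synergy.
print(mi(("x1",), ("t",)), mi(("x1", "x2"), ("t",)))  # 0.0 1.0
```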

7. Quantum PID and Extensions Beyond Shannon Theory

Partial Information Decomposition has been generalized to quantum information settings, using von Neumann entropy and quantum conditional operators to define unique, redundant, and synergistic information while maintaining invariance under local unitaries and extending the operational meaning of redundancy to the quantum setting. Quantum PID distinguishes itself from classical tri-information by achieving a stricter refinement of correlations, especially in quantum scrambling and information-monogamy contexts (Enk, 2023). Further, the abstract algebraic-axiomatic PID framework applies in non-Shannonian domains such as algorithmic information and quantum channels (Kolchinsky, 2019).


In summary, Partial Information Decomposition is a flexible and deeply-axiomatized generalization of mutual information, enabling rigorous dissection of multivariate dependency structure. Despite the proliferation of proposals and a landscape marked by fundamental incompatibilities among desirable properties, recent advances provide both efficient algorithms and a unifying logic-theoretic underpinning, informing empirical practice across the sciences and engineering (Liardi et al., 3 Mar 2026, Matthias et al., 18 Dec 2025, Kolchinsky, 2019).
