Unique Information Decomposition
- Unique information decomposition is a framework that partitions mutual information into unique, redundant, and synergistic components to reveal how data from multiple predictors contributes to a target.
- In the bivariate case, unique information is defined through a constrained optimization that minimizes conditional mutual information over all joint distributions sharing the observed pairwise marginals, and it has a decision-theoretic interpretation.
- The framework supports efficient algorithmic computation and highlights challenges in extending to multivariate cases, impacting fields like neuroscience, communication theory, and machine learning.
Unique information decomposition is a framework for dissecting the mutual information between a set of predictor (source) variables and a target variable into constituent components—specifically, into unique, shared (redundant), and synergistic (complementary) parts. This structural disaggregation addresses fundamental questions about how information is distributed, lost, or synergistically generated among multiple variables, with profound implications for neuroscience, cryptography, machine learning, communication theory, and the foundations of information theory. Theoretical and algorithmic advances have been largely centered on the bivariate case (two predictors) due to deep challenges in extending to higher-order decompositions.
1. Formal Framework: Bivariate Decomposition
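Consider three discrete random variables $S$ (the target), $X$, and $Y$ (the predictors) with joint law $P$. The unique information of $X$ about $S$ with respect to $Y$, denoted $UI(S; X \setminus Y)$, is defined as the minimum conditional mutual information over the set $\Delta_P$ of joint distributions that preserve the observed $(S, X)$ and $(S, Y)$ pairwise marginals:
$$UI(S; X \setminus Y) = \min_{Q \in \Delta_P} I_Q(S; X \mid Y),$$
where
$$\Delta_P = \{\, Q \in \Delta : Q(S = s, X = x) = P(S = s, X = x),\ Q(S = s, Y = y) = P(S = s, Y = y) \ \text{for all } s, x, y \,\}$$
and $\Delta$ is the set of all joint distributions of $(S, X, Y)$.
Analogously, shared (redundant) and complementary (synergistic) information are defined by:
$$SI(S; X, Y) = \max_{Q \in \Delta_P} CoI_Q(S; X; Y), \qquad CI(S; X, Y) = I(S; (X, Y)) - \min_{Q \in \Delta_P} I_Q(S; (X, Y)),$$
where $CoI_Q(S; X; Y) = I_Q(S; X) - I_Q(S; X \mid Y)$ denotes coinformation, and $I$ denotes mutual information.
The decomposition holds:
$$I(S; (X, Y)) = SI(S; X, Y) + UI(S; X \setminus Y) + UI(S; Y \setminus X) + CI(S; X, Y),$$
with the corresponding consistency equations for each marginal: $I(S; X) = SI(S; X, Y) + UI(S; X \setminus Y)$ and $I(S; Y) = SI(S; X, Y) + UI(S; Y \setminus X)$.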
This framework, due to Bertschinger et al. (Bertschinger et al., 2013, Rauh et al., 2014), is both operationally and structurally motivated.
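For very small alphabets the definitions can be evaluated by brute force. The following is a minimal sketch, assuming a target $S$ and predictors $X$, $Y$ that are all binary, so that the set of joint distributions with the observed $(S, X)$ and $(S, Y)$ marginals is a two-parameter family that can be scanned on a grid. The names `broja_binary`, `cond_mi`, `mi`, and the grid resolution `n` are illustrative choices, not taken from the cited papers, and a grid search only approximates the minimum that a convex solver would find.

```python
import numpy as np

def mi(p):
    """Mutual information in bits of a 2-D joint pmf p[a, b]."""
    p = np.asarray(p, dtype=float)
    prod = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / prod[mask])))

def cond_mi(q):
    """I_Q(S;X|Y) in bits for arrays q[..., s, x, y], batched over leading axes."""
    q_sy = q.sum(axis=-2, keepdims=True)        # Q(s, y)
    q_xy = q.sum(axis=-3, keepdims=True)        # Q(x, y)
    q_y = q.sum(axis=(-3, -2), keepdims=True)   # Q(y)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(q > 0, q * np.log2(q * q_y / (q_sy * q_xy)), 0.0)
    return terms.sum(axis=(-3, -2, -1))

def broja_binary(p, n=201):
    """Grid-search sketch of the decomposition for binary S, X, Y; p[s, x, y]."""
    p = np.asarray(p, dtype=float)
    slices = []
    for s in range(2):
        ps, a, b = p[s].sum(), p[s, 1, :].sum(), p[s, :, 1].sum()
        # t = Q(s, x=1, y=1); its feasible range keeps all four cells nonnegative
        t = np.linspace(max(0.0, a + b - ps), min(a, b), n)
        # 2x2 slice Q(s, x, y) with the same (S,X) and (S,Y) marginals as p
        slices.append(np.stack([np.stack([ps - a - b + t, b - t], axis=-1),
                                np.stack([a - t, t], axis=-1)], axis=-2))
    q = np.empty((n, n, 2, 2, 2))
    q[:, :, 0] = slices[0][:, None]   # first grid axis parametrizes s = 0
    q[:, :, 1] = slices[1][None, :]   # second grid axis parametrizes s = 1
    q = np.clip(q, 0.0, None)         # remove tiny negative rounding errors
    ui_x = float(cond_mi(q).min())                       # min I_Q(S;X|Y)
    ui_y = float(cond_mi(np.swapaxes(q, -1, -2)).min())  # min I_Q(S;Y|X)
    si = mi(p.sum(axis=2)) - ui_x                        # I(S;X) = SI + UI_X
    ci = mi(p.reshape(2, 4)) - si - ui_x - ui_y          # remainder of I(S;(X,Y))
    return {"UI_X": ui_x, "UI_Y": ui_y, "SI": si, "CI": ci}

# XOR of two independent uniform bits: p[s, x, y] = 1/4 when s == x ^ y
xor = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        xor[x ^ y, x, y] = 0.25
print(broja_binary(xor))
```

For the XOR distribution the sketch should report unique and shared information close to zero and complementary information close to one bit. The grid approach does not scale beyond toy cases; larger alphabets call for the convex optimization methods discussed in Section 3.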
2. Key Properties and Interpretation
The unique, shared, and synergistic information quantities exhibit several essential properties (Rauh et al., 2014):
- Nonnegativity: All four quantities ($UI(S; X \setminus Y)$, $UI(S; Y \setminus X)$, $SI(S; X, Y)$, $CI(S; X, Y)$) are nonnegative, where $S$ is the target and $X$, $Y$ are the predictors.
- Marginal-invariance: Unique and shared information depend only on the observed pairwise marginals $P_{SX}$ and $P_{SY}$.
- Monotonicity: Unique information is monotonic in each argument: enlarging $Y$ (adding predictors) cannot increase the unique information of $X$, while enlarging $S$ or $X$ cannot decrease it.
- Special cases:
  - For $S = X \oplus Y$ (XOR of independent bits): $UI = SI = 0$, $CI = 1$ bit (pure synergy).
  - When $X$ determines $S$ and $Y$ is independent: all information is unique to $X$, with $SI = CI = 0$.
Operationally, the unique information quantifies the advantage held in a decision problem where an agent has access to $X$ in place of $Y$, or vice versa, formalized via the decision-theoretic Blackwell order (Bertschinger et al., 2013).
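The XOR special case can be checked by hand. The short derivation below is a sketch that assumes uniform independent bits $X$, $Y$ with target $S = X \oplus Y$, and uses only nonnegativity together with the consistency equations $I(S; X) = SI + UI(S; X \setminus Y)$ and $I(S; Y) = SI + UI(S; Y \setminus X)$:

$$I(S; X) = I(S; Y) = 0 \;\Longrightarrow\; SI = UI(S; X \setminus Y) = UI(S; Y \setminus X) = 0,$$

$$CI = I(S; (X, Y)) - SI - UI(S; X \setminus Y) - UI(S; Y \setminus X) = H(S) = 1 \text{ bit}.$$

Each predictor alone is independent of $S$, so all three non-synergistic terms are squeezed to zero, and the full bit carried jointly by $(X, Y)$ must be complementary information.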
3. Algorithmic Computation and Numerical Methods
The unique information involves a constrained optimization (minimization of $I_Q(S; X \mid Y)$) over the polytope $\Delta_P$, which is typically high-dimensional (Banerjee et al., 2017). Convex optimization methods, including alternating divergence minimization and generalized iterative scaling, have been developed for efficient, convergent computation. The alternating scheme is guaranteed to converge to a global optimum and is reported to be substantially faster than general-purpose solvers.
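One way to see why an alternating scheme applies is the following sketch. It writes $S$ for the target, $X$ and $Y$ for the predictors, $\Delta_P$ for the set of joint distributions with the observed $(S, X)$ and $(S, Y)$ marginals, and $D(\cdot \,\|\, \cdot)$ for the Kullback-Leibler divergence. The formulation may differ in detail from the one in Banerjee et al. (2017). For any $Q \in \Delta_P$, whose $(S, Y)$ marginal equals $P_{SY}$,

$$D\big(Q \,\|\, P_{SY}\, r_{X \mid Y}\big) = I_Q(S; X \mid Y) + \sum_{y} Q(y)\, D\big(Q_{X \mid Y = y} \,\|\, r_{X \mid Y = y}\big) \;\ge\; I_Q(S; X \mid Y),$$

with equality when $r_{X \mid Y} = Q_{X \mid Y}$. The unique information is therefore a double minimization of a divergence between two convex sets:

$$UI(S; X \setminus Y) = \min_{Q \in \Delta_P} \; \min_{r_{X \mid Y}} \; D\big(Q \,\|\, P_{SY}\, r_{X \mid Y}\big).$$

Alternating the two minimizations gives the iteration

$$Q^{(i+1)} = \arg\min_{Q \in \Delta_P} D\big(Q \,\|\, R^{(i)}\big), \qquad R^{(i+1)}(s, x, y) = P_{SY}(s, y)\, Q^{(i+1)}(x \mid y).$$

The second step is available in closed form, while the first is a projection onto a set defined by linear marginal constraints, which is the kind of problem iterative scaling is suited to.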
For continuous or parametric distributions (e.g., Gaussian, Poisson, multinomial), analytic solutions or adaptations exist when the structure of the marginal polytope allows one unique information term to vanish, massively simplifying the decomposition (Goswami et al., 2023).
4. Theoretical Limitations and Multivariate Decomposition
A central result shows that any attempt to extend the bivariate SI/UI/CI decomposition to more than two predictors via the Williams–Beer partial information lattice leads to an incompatibility: no redundancy function $I_\cap$ can simultaneously satisfy nonnegativity, monotonicity, symmetry, and the identity axiom for $k \ge 3$ predictors (Rauh et al., 2014).
Theorem (Rauh et al.): For $k \ge 3$ predictors, any redundancy function $I_\cap$ designed to generalize bivariate SI/UI/CI with nonnegative atomic decomposition must violate at least one natural axiom.
This precludes a nonnegative information decomposition situated on the PI-lattice for $k \ge 3$. The precise combinatorial structure enabling a consistent multivariate unique information decomposition remains an open problem.
Nevertheless, pairwise ("partitioned") unique information can still be meaningfully defined, and nonnegativity inequalities among these pairwise quantities, of the kind stated in Rauh et al. (2014), are conjectured to hold, offering a possible route to reconstructing atomic structure in larger systems (Rauh et al., 2014).
5. Operational and Decision-Theoretic Foundations
The decision-theoretic underpinning of unique information is rooted in the Blackwell order: a predictor $X$ carries no unique information about the target $S$ relative to $Y$ if and only if $X$ is a garbling (randomized copy) of $Y$ in the sense of statistical decision theory (Bertschinger et al., 2013, Rauh et al., 2014). More specifically, unique information vanishes precisely when no decision-maker observing $X$ can improve upon optimal choices based solely on $Y$ for any loss function.
The structure of the optimization set $\Delta_P$ ensures that $UI(S; X \setminus Y)$, $UI(S; Y \setminus X)$, and $SI(S; X, Y)$ depend only on the two relevant decision problems defined by the pairwise marginals, and that synergy is forced to zero for some $Q \in \Delta_P$, capturing the channel degradation structure (Rauh et al., 2014).
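The garbling condition can be tested numerically in simple cases. The sketch below represents each predictor by its channel matrix, with rows indexed by the target value (for example `k_x[s, x]` holding $P(x \mid s)$), and asks whether `k_x = k_y @ lam` for some row-stochastic matrix `lam`. It assumes that `k_y` is square and invertible, so that `lam` is unique and only its nonnegativity needs checking; the general case requires a linear feasibility program instead. The helper names `garbling_matrix`, `is_garbling`, and `bsc` are illustrative.

```python
import numpy as np

def garbling_matrix(k_x, k_y):
    """Solve k_x = k_y @ lam for lam, assuming k_y is square and invertible."""
    return np.linalg.solve(k_y, k_x)

def is_garbling(k_x, k_y, tol=1e-9):
    """True if the channel k_x is a garbling of k_y (under the assumption above).

    Rows of lam sum to one automatically when both channels are row-stochastic
    and k_y is invertible, so nonnegativity is the only remaining condition.
    """
    lam = garbling_matrix(k_x, k_y)
    return bool(np.all(lam >= -tol))

def bsc(eps):
    """Binary symmetric channel with crossover probability eps."""
    return np.array([[1.0 - eps, eps], [eps, 1.0 - eps]])

k_y = bsc(0.1)   # less noisy observation of the target
k_x = bsc(0.2)   # noisier observation of the target

print(is_garbling(k_x, k_y))  # is the noisier channel a degraded version of the cleaner one?
print(is_garbling(k_y, k_x))  # and the reverse direction?
```

With these two binary symmetric channels, the noisier observation is obtained from the cleaner one by a further symmetric channel with crossover probability 0.125, so the first check succeeds and the second fails. Under the criterion above, the noisier predictor then carries no unique information relative to the cleaner one.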
6. Examples and Interpretation of Decomposition
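The following table summarizes several archetypal examples, with target $S$ and predictors $X$, $Y$ (independent uniform bits in the XOR and AND rows):
| Distribution | $SI$ | $UI$ (both) | $CI$ | Key Feature |
|---|---|---|---|---|
| $S = (X, Y)$ (COPY) | $I(X; Y)$ | $H(X \mid Y)$, $H(Y \mid X)$ | $0$ | All redundancy/unique |
| $S = X \oplus Y$ (XOR) | $0$ | $0$ | $1$ bit | Pure synergy |
| $S = X \wedge Y$ (AND) | $\approx 0.311$ bits | $0$ | $0.5$ bits | Partial redundancy/synergy |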
These confirm the interpretive design: UI captures only information exclusive to a given predictor, SI the redundancy, and CI the higher-order synergy.
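The AND example can be worked through by hand. The following sketch assumes independent uniform bits $X$, $Y$ and target $S = X \wedge Y$, and uses the garbling criterion from Section 5. Since $P(S = 1) = 1/4$,

$$H(S) = -\tfrac{1}{4} \log_2 \tfrac{1}{4} - \tfrac{3}{4} \log_2 \tfrac{3}{4} \approx 0.811 \text{ bits}, \qquad H(S \mid X) = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 1 = 0.5 \text{ bits},$$

so $I(S; X) = I(S; Y) \approx 0.311$ bits and $I(S; (X, Y)) = H(S) \approx 0.811$ bits. The conditional distributions of $X$ given $S$ and of $Y$ given $S$ coincide, so each predictor is trivially a garbling of the other and both unique information terms vanish. The remaining terms follow from the consistency equations:

$$SI = I(S; X) \approx 0.311 \text{ bits}, \qquad CI = I(S; (X, Y)) - SI \approx 0.811 - 0.311 = 0.5 \text{ bits}.$$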
7. Practical Considerations and Open Directions
Practically, the unique information framework is robust in the bivariate setting, with efficient algorithms and a clear operational meaning. However, current limitations include:
- Non-uniqueness of solutions: For certain marginal cardinalities or conditional independence cases, the optimizer $Q^\ast \in \Delta_P$ for $UI(S; X \setminus Y)$ is not unique, resulting in ambiguity in the attribution of unique information (Rauh et al., 2019).
- Incomplete multivariate extensions: There is no general nonnegative decomposition into unique, redundant, and synergistic parts for more than two predictors based on the PI-lattice. Structuring the appropriate combinatorial or geometric foundation for the multivariate case remains unresolved (Rauh et al., 2014).
- Application to practice: Despite these theoretical obstacles, bivariate unique information can be computed for arbitrary partitions in multivariate systems, providing a partial picture of information flow in complex systems.
Current research seeks to overcome these barriers via richer combinatorial structures, and to establish new nonnegative atomic decompositions compatible with natural axioms for systems with three or more predictors (Rauh et al., 2014).
References:
- "Quantifying unique information" (Bertschinger et al., 2013)
- "Reconsidering unique information: Towards a multivariate information decomposition" (Rauh et al., 2014)
- "Computing the Unique Information" (Banerjee et al., 2017)
- "Properties of Unique Information" (Rauh et al., 2019)