O-Information in Complex Systems
- O-information is a multivariate measure that quantifies the net balance between redundancy (overlapping information) and synergy (emergent patterns) in systems with three or more variables.
- It is mathematically defined as the difference between total correlation and dual total correlation, allowing for efficient scaling and clear detection of higher-order dependencies.
- Applications span neuroscience, network physiology, and machine learning, with extensions like local, dynamic, and structured O-information enhancing targeted analysis.
O-information is a multivariate information-theoretic functional that quantifies the net balance between redundancy and synergy in systems of three or more random variables. Unlike classical measures such as mutual information or total correlation that are sensitive chiefly to lower-order (pairwise or all-to-all) dependencies, O-information rigorously characterizes whether a system’s higher-order dependencies are dominated by overlapping (redundant) information or by emergent, jointly-encoded (synergistic) structure. The O-information is symmetric, scales efficiently with system size, and applies to both static and dynamical data. It is now a central tool for the analysis of high-order interactions across neuroscience, network physiology, complex systems, and machine learning.
1. Formal Definition and Mathematical Foundations
Let $X^n = (X_1, \dots, X_n)$ be an $n$-dimensional discrete or continuous random vector, where each $X_i$ is defined on a finite or continuous alphabet. The core formulation of O-information is as the difference of two classical multivariate information measures:
- Total correlation (TC) (a.k.a. multi-information): $TC(X^n) = \sum_{i=1}^{n} H(X_i) - H(X^n)$
- Dual total correlation (DTC) (a.k.a. binding entropy): $DTC(X^n) = H(X^n) - \sum_{i=1}^{n} H(X_i \mid X^n_{-i})$
where $X^n_{-i}$ denotes the vector $X^n$ with $X_i$ removed.
- O-information (Rosas et al., 2019): $\Omega(X^n) = TC(X^n) - DTC(X^n) = (n-2)\,H(X^n) + \sum_{i=1}^{n} \left[ H(X_i) - H(X^n_{-i}) \right]$
Alternative formulations and key algebraic decompositions include:
- For $n = 3$, $\Omega$ reduces to the interaction information: $\Omega(X_1, X_2, X_3) = I(X_1; X_2) - I(X_1; X_2 \mid X_3)$.
- For general $n$, O-information admits a recursive expansion in terms of lower-order mutual informations: $\Omega(X^n) = \Omega(X^{n-1}) + (2-n)\, I(X_n; X^{n-1}) + \sum_{j=1}^{n-1} I(X_n; X^{n-1}_{-j})$,
which holds for any permutation of the indices.
These forms highlight that O-information measures departures from pure pairwise-decomposable structure—vanishing for systems composed solely of independent pairs or trees, and amplifying in the presence of nontrivial multipartite dependencies (Rosas et al., 2019, Varley, 12 Jan 2026).
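As a concrete check of these definitions, the closed form $\Omega = (n-2)H(X^n) + \sum_i [H(X_i) - H(X^n_{-i})]$ can be evaluated exactly on small discrete systems. A minimal sketch (helper names are illustrative, not from any library):

```python
# Exact O-information of a discrete joint distribution, given as a
# dict mapping outcome tuples to probabilities. All names are illustrative.
from math import log2

def entropy(pmf):
    """Shannon entropy (bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(pmf, keep):
    """Marginal pmf over the coordinate indices in `keep`."""
    out = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def o_information(pmf, n):
    """Omega = (n-2) H(X) + sum_i [H(X_i) - H(X_{-i})]."""
    omega = (n - 2) * entropy(pmf)
    for i in range(n):
        omega += entropy(marginal(pmf, [i]))
        omega -= entropy(marginal(pmf, [j for j in range(n) if j != i]))
    return omega

# Redundant system: three identical copies of one fair bit.
copy3 = {(b, b, b): 0.5 for b in (0, 1)}
# Synergistic system: X3 = XOR(X1, X2).
xor3 = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
print(o_information(copy3, 3))  # prints 1.0 (redundancy-dominated)
print(o_information(xor3, 3))   # prints -1.0 (synergy-dominated)
```

The two toy systems recover the archetypes discussed below: copying yields $\Omega = +1$ bit, the 3-bit XOR yields $\Omega = -1$ bit.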
2. Operational Interpretation: Redundancy vs. Synergy
O-information provides a signed, scalar diagnostic of a system’s organization:
- $\Omega > 0$: Redundancy-dominated. Several variables encode overlapping information, and collective constraints (i.e., the same “bits” appearing repeatedly) dominate. This occurs in systems with strong copy-like or degenerate structure (e.g., all variables are identical).
- $\Omega < 0$: Synergy-dominated. Information is encoded only at the global, joint level; high-order, emergent patterns are present that are invisible in any subset. Archetypal examples include the $n$-bit XOR (parity), where global structure is maximal but no pairwise marginals are informative.
- $\Omega = 0$: Balance, or exclusively pairwise dependencies (e.g., Gaussian Markov trees); redundancy and synergy exactly cancel (Rosas et al., 2019, Varley, 12 Jan 2026).
Sign and magnitude reflect the “order” of dominant interdependencies, as formalized via the $\Omega_k$ measures (Varley, 12 Jan 2026). O-information discriminates whether interactions of order strictly above or below two predominate.
3. Key Analytical Properties and Extensions
O-information exhibits several fundamental properties (Rosas et al., 2019, Varley, 12 Jan 2026):
| Property | Description |
|---|---|
| Symmetry | Invariant under permutations of variables |
| Pairwise triviality | $\Omega(X_1, X_2) = 0$ for any joint distribution |
| Additivity | For independent subsystems, $\Omega(X, Y) = \Omega(X) + \Omega(Y)$ |
| Extremal values | $\lvert\Omega\rvert \le (n-2)\log d$ for $n$ variables of $d$ states; bounds are tight |
| Zero for pairwise-only systems | Holds for all tree-structured dependencies or independent pairs |
| Scaling and decomposability | Admits recursive expansion via lower-order mutual informations; efficient for large $n$ |
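Two of these properties (pairwise triviality and additivity) can be verified numerically on toy distributions. A self-contained sketch with illustrative helper names:

```python
# Numerical check of pairwise triviality and additivity of O-information
# on small discrete distributions. Helper names are illustrative.
from math import log2

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(pmf, keep):
    out = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def o_information(pmf, n):
    omega = (n - 2) * entropy(pmf)
    for i in range(n):
        omega += entropy(marginal(pmf, [i]))
        omega -= entropy(marginal(pmf, [j for j in range(n) if j != i]))
    return omega

# Pairwise triviality: any 2-variable distribution has Omega = 0.
corr2 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
assert abs(o_information(corr2, 2)) < 1e-12

# Additivity: product of two independent XOR triplets gives (-1) + (-1) = -2.
xor3 = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
prod6 = {x + y: p * q for x, p in xor3.items() for y, q in xor3.items()}
assert abs(o_information(prod6, 6) + 2.0) < 1e-9
```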
Extensions include:
- Local and gradient O-information: Pointwise or variable/pair-specific contributions, such as local O-information (Scagliarini et al., 2021), and discrete gradients that localize redundancy/synergy structure (Scagliarini et al., 2022).
- Structured O-information: Integration over variable groups, focusing on between-group redundancy or synergy while ignoring within-group effects; crucial for modular networks (Pascual-Marqui et al., 11 Jul 2025).
- Dynamic/dynamical O-information: Generalization to multivariate time series using entropy rates and transfer entropy expansions (Mijatovic et al., 2024, Stramaglia et al., 2020).
- Hierarchy via $\Omega_k$: The O-information sits within the $\Omega_k$ spectrum of whole-minus-parts measures; this extends quantitative redundancy/synergy detection to any order (Varley, 12 Jan 2026).
4. Estimation Methods and Practical Algorithms
Calculating O-information for real data requires estimation of joint entropies and/or total correlations on the observed multivariate distribution. The core algorithmic steps are:
- Estimate the joint entropy $H(X^n)$ and all needed marginal entropies $H(X_i)$ and $H(X^n_{-i})$.
- Compute TC and DTC (or plug the entropies directly into the formula for $\Omega$).
- For "leave-one-out" versions or gradients, work with $\Omega(X^n_{-i})$ as needed.
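The steps above can be sketched as a plug-in (frequency-count) estimator operating directly on samples. Names are illustrative; note that plug-in entropy estimates carry negative bias at small sample sizes:

```python
# Plug-in (frequency-count) estimate of O-information from data samples.
# Suitable for moderate n and small finite alphabets; names are illustrative.
from collections import Counter
from math import log2

def plugin_entropy(rows):
    """Plug-in Shannon entropy (bits) of a list of outcome tuples."""
    counts = Counter(rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def plugin_o_information(samples):
    """Estimate Omega = (n-2) H(X) + sum_i [H(X_i) - H(X_{-i})] from samples."""
    n = len(samples[0])
    omega = (n - 2) * plugin_entropy([tuple(s) for s in samples])
    for i in range(n):
        omega += plugin_entropy([(s[i],) for s in samples])
        omega -= plugin_entropy([s[:i] + s[i + 1:] for s in samples])
    return omega

# Synthetic data: balanced samples from the 3-bit XOR distribution.
samples = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)] * 100
print(plugin_o_information(samples))  # prints -1.0 (synergy-dominated)
```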
Estimation approaches:
- Discrete (frequency-count, plug-in) estimators, appropriate for moderate $n$ and finite alphabets (Scagliarini et al., 2021).
- K-nearest-neighbor, kernel, or parametric estimators for continuous data (Pomarico et al., 31 Jul 2025).
- Gaussian assumption: closed-form in terms of log-determinants of covariance and precision matrices (Pascual-Marqui et al., 11 Jul 2025).
- Neural score-based estimation (S$\Omega$I) enables application to high-dimensional continuous data without explicit density modeling (Bounoua et al., 2024).
O-information is computationally tractable for moderate $n$ (linear in $n$, provided entropy and conditional-entropy estimation is feasible), contrasting favorably with the exponential scaling of full partial information decomposition (PID) (Bounoua et al., 2024).
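Under the Gaussian assumption mentioned above, $\Omega$ reduces to a combination of covariance log-determinants, since the $\tfrac{k}{2}\log(2\pi e)$ entropy constants cancel in the whole-minus-parts sum. A minimal numpy sketch with illustrative names:

```python
# Closed-form Gaussian O-information via covariance log-determinants.
# The 2*pi*e entropy constants cancel in Omega, so only log-dets remain.
import numpy as np

def gaussian_o_information(cov):
    """Omega (bits) of a multivariate Gaussian with covariance `cov`."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    omega = (n - 2) * np.linalg.slogdet(cov)[1]
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        omega += np.log(cov[i, i])  # log-det of the 1x1 marginal
        omega -= np.linalg.slogdet(cov[np.ix_(rest, rest)])[1]
    return omega / (2 * np.log(2))  # nats -> bits

# Independent variables: diagonal covariance gives Omega = 0.
print(gaussian_o_information(np.diag([1.0, 2.0, 3.0])))
# Strong equicorrelation (a shared factor) gives Omega > 0 (redundancy).
rho = 0.9
cov = np.full((4, 4), rho) + np.diag([1 - rho] * 4)
print(gaussian_o_information(cov))
```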
5. Relationships to Other High-Order Information Measures
O-information is situated at the intersection of several classical and modern multivariate information metrics (Rosas et al., 2019, Varley, 12 Jan 2026):
- Total correlation (TC, multi-information): Quantifies overall constraint and redundancy; always nonnegative.
- Dual total correlation (DTC): Associated with synergy; largest when only joint configurations contain nontrivial information.
- Tononi–Sporns–Edelman (TSE) complexity: Sensitive to dependency strength but not to redundancy/synergy balance—TSE conflates the two, while O-information distinguishes them.
- Partial information decomposition (PID): Decomposes mutual information into redundant, unique, and synergistic atoms; O-information is fully symmetric and target-agnostic, providing a faster, coarser redundancy-synergy index that scales to high $n$.
- $\Omega_k$ and related families: O-information sits within a hierarchy of whole-minus-parts statistics that distinguish progressively higher-order dependency structures (Varley, 12 Jan 2026).
6. Example Applications and Empirical Results
Neuroscience and Physiology: O-information has been applied to fMRI, EEG, and large-scale spike train datasets to reveal redundant and synergistic brain subsystems, particularly in studies of conscious perception and neural integration (Bounoua et al., 2024, Stramaglia et al., 2020, Scagliarini et al., 2021, Mijatovic et al., 2024).
Music Analysis: Comparative studies of Bach's four-voice chorales and Corelli's string trios demonstrated that Bach’s music is synergy-dominated (negative O-information), indicating high-order constraints not captured by pairwise analysis, while Corelli’s texture is redundancy-dominated (positive O-information) due to explicit voice doubling (Rosas et al., 2019, Scagliarini et al., 2021).
Machine Learning and Quantum-Inspired Architectures: O-information tracked the emergence of generalization (grokking) in tensor-network classifiers, coinciding with transitions in entanglement entropy and test accuracy (Pomarico et al., 31 Jul 2025).
Multivariate Time Series and Networks: The dynamic (rate) formulation enables quantification of high-order interactions in physiological and synthetic networks, distinguishing reconfigurations in redundancy and synergy under experimental perturbations (Mijatovic et al., 2024).
Complex Systems and Statistical Mechanics: In spin systems, O-information signals the presence of higher-order couplings, transitioning from near zero for pairwise-only interactions to large negative values as $k$-body ($k > 2$) interactions grow (Rosas et al., 2019).
7. Limitations, Current Generalizations, and Future Research
Limitations:
- O-information is a global summary and does not yield a full partial information decomposition (PID); it cannot isolate unique contributions or mixed modes beyond redundancy and synergy (Rosas et al., 2019, Varley, 12 Jan 2026).
- Nonparametric estimation is challenged by the curse of dimensionality for high $n$ and continuous variables. Score-based and Gaussian methods mitigate but do not eliminate such issues (Bounoua et al., 2024, Pascual-Marqui et al., 11 Jul 2025).
- For structured systems with strong within-group redundancy and between-group synergy, classical O-information may underestimate between-group synergy; structured O-information generalizations address this (Pascual-Marqui et al., 11 Jul 2025).
Current Developments:
- Score-based neural estimators enable O-information evaluation for higher-dimensional, non-discretized, and continuous spaces (Bounoua et al., 2024).
- Local and gradient O-information provide fine-grained diagnosis of high-order structure at the pattern and variable level (Scagliarini et al., 2022, Scagliarini et al., 2021).
- Structured and dynamic extensions permit dissection of multiscale, modular, and time-evolving systems (Pascual-Marqui et al., 11 Jul 2025, Mijatovic et al., 2024, Stramaglia et al., 2020).
Future Directions:
- Integration of O-information with causal, dynamical, and PID-style decompositions.
- Scalability to larger $n$ via network sparsity, efficient bias-corrected estimators, or exploitation of system-specific structure.
- Application to ecological, genetic, and highly modular data with group‐wise formulations.
In summary, O-information provides a principled, scalable, and interpretable scalar diagnostic for high-order information structure in multivariate systems, facilitating quantitative discrimination between redundancy- and synergy-dominated regimes, localizing high-order informational circuits, and advancing the theoretical and applied analysis of complex networks (Rosas et al., 2019, Varley, 12 Jan 2026, Bounoua et al., 2024, Scagliarini et al., 2022, Pascual-Marqui et al., 11 Jul 2025, Scagliarini et al., 2021, Mijatovic et al., 2024, Pomarico et al., 31 Jul 2025, Stramaglia et al., 2020).