Neural Information Squeezer (NIS)
- Neural Information Squeezer (NIS) is a machine learning framework that identifies causal emergence via learned coarse-graining of high-dimensional Markovian systems.
- It employs a three-module neural architecture—an encoder with invertible transformations, a macro-dynamics learner, and a decoder—to balance information retention with accurate microstate reconstruction.
- Empirical results in systems like oscillators, Markov chains, and Boolean networks validate NIS's ability to expose multiscale causal structures by maximizing effective information.
Neural Information Squeezer (NIS) is a general machine learning framework for identifying causal emergence through learned coarse-graining of Markovian dynamical systems. The framework employs neural network parameterizations to discover optimal coarse-graining strategies and low-dimensional macro-state dynamics directly from time-series data, maximizing effective information (EI) at the macro level subject to accurate reconstruction of the microdynamics (Zhang et al., 2022). Its architecture explicitly separates information-preserving transformations from information-dropping projections, enabling rigorous analysis of information retention and causal structure across scales.
1. Framework Objective and Problem Setting
NIS addresses the detection and quantification of causal emergence—the phenomenon where a suitable coarse-grained representation of a Markovian system exhibits stronger causal connections than the microscopic description. Given a dynamical system with microstates $x_t \in \mathbb{R}^p$ and a transition law $x_{t+1} = f(x_t, \xi_t)$, the framework seeks: (a) a differentiable coarse-graining map $\phi: \mathbb{R}^p \to \mathbb{R}^q$ with $q < p$, (b) a Markovian macro-dynamics $\hat{f}: \mathbb{R}^q \to \mathbb{R}^q$, and (c) a decoder $\phi^\dagger: \mathbb{R}^q \to \mathbb{R}^p$ that reconstructs microstates from macro ones. The aim is to maximize the effective information of $\hat{f}$ in the macro dynamics while maintaining high-fidelity prediction of $x_{t+1}$.
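In symbols, the learning problem can be stated as a constrained optimization (Zhang et al., 2022):

$$
\max_{\phi,\,\hat{f},\,\phi^\dagger} \; \mathrm{EI}\big(\hat{f}\big) \quad \text{s.t.} \quad \big\|\phi^\dagger\big(\hat{f}(\phi(x_t))\big) - x_{t+1}\big\| < \epsilon \;\; \text{for all } t,
$$

where $\epsilon$ is a small reconstruction tolerance.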
2. Architecture and Information-Preserving Mapping
The NIS architecture comprises three core neural modules:
- Encoder ($\phi$): Implements a two-stage transformation $\phi = \chi_q \circ \psi_\alpha$, where
  - $\psi_\alpha: \mathbb{R}^p \to \mathbb{R}^p$ is an invertible bijection parameterized by stacked RealNVP coupling layers;
  - $\chi_q$ projects onto the first $q$ coordinates, dropping the remaining $p - q$ dimensions.
- Macro-dynamics Learner ($f_\beta$): Models drift in the macro space as $\hat{y}_{t+1} = y_t + f_\beta(y_t) + \varepsilon$, with $\varepsilon \sim \mathcal{N}(0, \Sigma)$, so that $p(\hat{y}_{t+1} \mid y_t)$ is Gaussian.
- Decoder ($\phi^\dagger$): Inverts $\psi_\alpha$, reconstructing microstates as $\hat{x}_{t+1} = \psi_\alpha^{-1}\big(\hat{y}_{t+1} \oplus z\big)$, where $\oplus$ denotes the concatenation of $\hat{y}_{t+1}$ with Gaussian noise $z \sim \mathcal{N}(0, I_{p-q})$ filling the $p - q$ dropped dimensions.
By explicitly separating information conversion (via the bijective $\psi_\alpha$) from information dropping (via the projection $\chi_q$), NIS enables precise control over the width $q$ of the retained information channel and tracks the information loss analytically; a minimal architectural sketch follows.
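To make the module structure concrete, here is a minimal PyTorch sketch of the three modules. This is an illustrative reconstruction, not the authors' released code; the class and parameter names (`AffineCoupling`, `NIS`, `hidden`, `n_layers`) are our own, and the coupling layers assume an even micro-dimension.

```python
# Illustrative PyTorch reconstruction of the NIS modules (not the
# authors' released code); names like AffineCoupling/NIS are our own.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One RealNVP coupling layer: rescales and shifts half of the
    coordinates conditioned on the other half; exactly invertible."""
    def __init__(self, dim, hidden=64, flip=False):
        super().__init__()
        assert dim % 2 == 0, "this sketch assumes an even dimension"
        self.d, self.flip = dim // 2, flip
        self.net = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * self.d),
        )

    def forward(self, x, inverse=False):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        if self.flip:
            x1, x2 = x2, x1
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)  # bounded log-scale for numerical stability
        x2 = (x2 - t) * torch.exp(-s) if inverse else x2 * torch.exp(s) + t
        if self.flip:
            x1, x2 = x2, x1
        return torch.cat([x1, x2], dim=-1)

class NIS(nn.Module):
    """Encoder psi_alpha (stacked couplings), projection chi_q,
    dynamics learner f_beta, and decoder psi_alpha^{-1}."""
    def __init__(self, p, q, n_layers=4, hidden=64):
        super().__init__()
        self.p, self.q = p, q
        self.psi = nn.ModuleList(
            AffineCoupling(p, hidden, flip=bool(i % 2))
            for i in range(n_layers))
        self.f_beta = nn.Sequential(
            nn.Linear(q, hidden), nn.ReLU(), nn.Linear(hidden, q))

    def encode(self, x):
        for layer in self.psi:
            x = layer(x)
        return x[:, :self.q]  # chi_q: keep the first q coordinates

    def decode(self, y):
        z = torch.randn(y.shape[0], self.p - self.q)  # Gaussian filler
        x = torch.cat([y, z], dim=-1)
        for layer in reversed(self.psi):
            x = layer(x, inverse=True)
        return x

    def forward(self, x_t):
        y_t = self.encode(x_t)
        y_next = y_t + self.f_beta(y_t)  # deterministic macro drift
        return self.decode(y_next)       # noise enters via filler dims

model = NIS(p=4, q=2)
x_t, x_next = torch.randn(32, 4), torch.randn(32, 4)
loss = (model(x_t) - x_next).abs().mean()  # l = 1 reconstruction loss
loss.backward()
```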
3. Effective Information and Causal Emergence Calculation
Effective information (EI) of a stochastic map $f$ in NIS is defined as the mutual information between a uniform intervention on the input and the resulting output:

$$\mathrm{EI}_L(f) = I\big(\tilde{y}_t;\, \tilde{y}_{t+1}\big), \qquad \tilde{y}_t \sim \mathcal{U}\big([-L, L]^q\big).$$

For macro-dynamics of the form $y_{t+1} = f(y_t) + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \Sigma)$, an analytical approximation holds:

$$\mathrm{EI}_L(f) \approx \ln \frac{(2L)^q}{(2\pi e)^{q/2}\,\det(\Sigma)^{1/2}} + \mathbb{E}_{y \sim \mathcal{U}([-L,L]^q)}\left[\ln \left|\det \frac{\partial f}{\partial y}\right|\right].$$

Because $\mathrm{EI}_L$ diverges as $L \to \infty$, the dimension-averaged effective information $\mathrm{dEI} = \mathrm{EI}_L / q$ and the dimension-averaged causal emergence $\Delta\mathcal{J} = \mathrm{dEI}(f_{\text{macro}}) - \mathrm{dEI}(f_{\text{micro}})$ provide meaningful comparative metrics.
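A short numerical sketch of this approximation under the Gaussian-noise assumption; `dimension_averaged_ei` and its arguments are illustrative names, and the Jacobian expectation is estimated by Monte Carlo sampling.

```python
# Minimal NumPy sketch of the analytic EI approximation under the
# Gaussian-noise assumption; all names here are illustrative.
import numpy as np

def dimension_averaged_ei(jac, Sigma, q, L=1.0, n_samples=10_000, seed=0):
    """Approximate EI_L(f)/q for y' = f(y) + eps, eps ~ N(0, Sigma),
    under the uniform intervention y ~ U([-L, L]^q).

    jac: callable returning the q x q Jacobian of f at a point y.
    """
    rng = np.random.default_rng(seed)
    ys = rng.uniform(-L, L, size=(n_samples, q))
    # Monte Carlo estimate of E_{y~U}[ ln |det df/dy| ]
    log_det = np.mean([np.log(abs(np.linalg.det(jac(y)))) for y in ys])
    ei = (q * np.log(2 * L)
          - 0.5 * (q * np.log(2 * np.pi * np.e)
                   + np.log(np.linalg.det(Sigma)))
          + log_det)
    return ei / q

# Example: linear contraction y' = 0.5 y + eps in q = 2 dimensions.
A = 0.5 * np.eye(2)
print(dimension_averaged_ei(lambda y: A, Sigma=0.01 * np.eye(2), q=2))
```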
4. Training Objectives and Information-Bottleneck Regime
NIS training proceeds in two stages:
- Stage 1 (Reconstruction): For a fixed $q$, optimize the conditional log-likelihood of observed transitions by maximizing
  $$\sum_t -\big\| x_{t+1} - \hat{x}_{t+1} \big\|_l,$$
  where $l = 1$ or $2$, corresponding to Laplace or Gaussian noise respectively. This fits the encoder and macro-dynamics to ensure accurate prediction.
- Stage 2 (Scale Search): Sweep over the macro dimension $q$, retrain, and evaluate $\Delta\mathcal{J}$; select the $q$ maximizing the emergent EI.
No explicit EI regularization term is required; the architecture's bottleneck structure naturally creates a trade-off between information retention and channel width. Once trained, the mutual informations $I(y_t; \hat{y}_{t+1})$ and $I(x_t; \hat{x}_{t+1})$ converge to $I(x_t; x_{t+1})$ for any $q$; the mapping $\psi_\alpha$ dynamically allocates useful information per coordinate by rescaling its outputs. A compressed sketch of both stages follows.
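The sketch below compresses the two stages into two functions, reusing the `NIS` module from the Section 2 sketch; data handling, convergence checks, and the EI evaluator (`eval_dEI`, e.g. the estimator from Section 3) are left to the caller and are assumptions of this illustration.

```python
# Compressed sketch of the two training stages, reusing the NIS module
# from Section 2's sketch; data handling and stopping criteria elided.
import torch

def train_stage1(model, data, epochs=100, lr=1e-3, l=1):
    """Stage 1: fit encoder + macro-dynamics by minimizing the l-norm
    prediction error (l = 1 Laplace, l = 2 Gaussian)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x_t, x_next in data:  # batches of (x_t, x_{t+1}) pairs
            loss = (model(x_t) - x_next).abs().pow(l).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

def stage2_scale_search(p, data, candidate_qs, eval_dEI):
    """Stage 2: retrain per macro dimension q; keep the q with the
    largest dimension-averaged EI (eval_dEI supplied by the user)."""
    scores = {q: eval_dEI(train_stage1(NIS(p=p, q=q), data))
              for q in candidate_qs}
    return max(scores, key=scores.get), scores
```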
5. Empirical Demonstrations of Causal Emergence
NIS successfully identifies causal emergence in diverse systems:
| System | Microstates & Encoding | Optimal $q$ | Coarse-Graining Effect | Causal Emergence |
|---|---|---|---|---|
| Spring oscillator | Noisy measurements $x = (z + \xi,\, z - \xi) \in \mathbb{R}^4$ of the state $z = (\text{position}, \text{velocity})$ | 2 | Recovers $(z, v)$; matches the known physics | $\Delta\mathcal{J}$ peaks at $q = 2$ |
| 8-state Markov chain | One-hot in $\mathbb{R}^8$ | 1 | Groups states $1$–$7$ into one macro-state, isolates state $8$ | Macro EI $>$ micro EI |
| Boolean network | 4 bits, 16 states | 1 | Clusters 16 microstates into 4 macro groups | Matches Hoel et al. (2013) |
In each case, NIS recovers known optimal coarse-grainings and demonstrates positive causal emergence ($\Delta\mathcal{J} > 0$) (Zhang et al., 2022).
6. Limitations and Assumptions
Several practical and theoretical limitations arise:
- RealNVP-based invertible networks for $\psi_\alpha$ are challenging to scale to high-dimensional microstate spaces; stability during training is a concern.
- Macro transition noise is assumed to be Gaussian (or Laplace); more flexible likelihood models, e.g., normalizing flows on the macro transition $p(y_{t+1} \mid y_t)$, are not yet implemented.
- The coarse-graining map is a black-box invertible/projection composite; improving interpretability by imposing sparsity or explicit variable grouping is an open direction.
- Extensions to continuous-time dynamics (stochastic differential equations) and mappings from micro-trajectories to macro-trajectories remain unexplored.
- There is no general closed-form criterion for when causal emergence is guaranteed from microdynamics alone; current methodology relies on empirical computation.
7. Extensions and Future Perspectives
Future work may address scaling the invertible map to higher dimensionalities, incorporating richer probabilistic macro-dynamics, and enforcing interpretable structure in the coarse-grainer. A plausible implication is that, with such enhancements, NIS could systematically uncover multiscale causal structures in complex nonlinear systems from data alone. Establishing theoretical guarantees for causal emergence under broader classes of micro-dynamics remains an open research question (Zhang et al., 2022).