Neural Information Squeezer (NIS)

Updated 7 January 2026
  • Neural Information Squeezer (NIS) is a machine learning framework that identifies causal emergence via learned coarse-graining of high-dimensional Markovian systems.
  • It employs a three-module neural architecture—an encoder with invertible transformations, a macro-dynamics learner, and a decoder—to balance information retention with accurate microstate reconstruction.
  • Empirical results in systems like oscillators, Markov chains, and Boolean networks validate NIS's ability to expose multiscale causal structures by maximizing effective information.

Neural Information Squeezer (NIS) is a general machine learning framework for identifying causal emergence through learned coarse-graining of Markovian dynamical systems. The framework employs neural network parameterizations to discover optimal coarse-graining strategies and low-dimensional macro-state dynamics directly from time-series data, maximizing effective information (EI) at the macro level subject to accurate reconstruction of the microdynamics (Zhang et al., 2022). Its architecture explicitly separates information-preserving transformations from information-dropping projections, enabling rigorous analysis of information retention and causal structure across scales.

1. Framework Objective and Problem Setting

NIS addresses the detection and quantification of causal emergence: the phenomenon where a suitable coarse-grained representation of a Markovian system exhibits stronger causal connections than the microscopic description. Given a dynamical system with microstates $X_t \in \mathbb{R}^p$ and a transition law $P(x_{t+1} \mid x_t)$, the framework seeks: (a) a differentiable coarse-graining map $\varphi : \mathbb{R}^p \to \mathbb{R}^q$, (b) a Markovian macro-dynamics $f : \mathbb{R}^q \to \mathbb{R}^q$, and (c) a decoder $\varphi^\dagger$ that reconstructs microstates from macro-states. The aim is to maximize the effective information of the macro-dynamics while maintaining high-fidelity prediction of $X_{t+1}$.
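
To fix ideas, the sketch below wires the three sought components together on random data. The linear stand-ins `W`, `A`, and `V` are purely illustrative placeholders for the learned maps, not part of NIS itself.

```python
import numpy as np

# Illustrative wiring of the three components NIS must learn; the linear
# stand-ins W, A, V are placeholders, not the actual learned maps.
rng = np.random.default_rng(0)
p, q = 8, 2                            # micro- and macro-dimensionality

W = rng.standard_normal((q, p))        # stands in for the encoder phi
A = 0.1 * rng.standard_normal((q, q))  # stands in for the macro drift f
V = rng.standard_normal((p, q))        # stands in for the decoder phi^dagger

def phi(x):          # coarse-graining map: R^p -> R^q
    return W @ x

def f(z):            # Markovian macro-dynamics drift: R^q -> R^q
    return A @ z

def phi_dagger(z):   # decoder: R^q -> R^p
    return V @ z

x_t = rng.standard_normal(p)
z_t = phi(x_t)                          # macro-state Z_t
x_hat_next = phi_dagger(z_t + f(z_t))   # predicted micro-state X_{t+1}
print(x_hat_next.shape)                 # -> (8,)
```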

2. Architecture and Information-Preserving Mapping

The NIS architecture comprises three core neural modules:

  1. Encoder ($\varphi_n$): Implements a two-stage transformation $\varphi_n = \chi \circ \psi$, where
    • $\psi_\alpha : \mathbb{R}^p \to \mathbb{R}^p$ is a bijection parameterized by stacked RealNVP coupling layers.
    • $\chi_{p \to q} : \mathbb{R}^p \to \mathbb{R}^q$ projects onto the first $q$ coordinates, dropping the remaining $p - q$ dimensions.
  2. Macro-dynamics Learner ($f_\beta$): Models the drift in macro space as

$$Z_{t+1} = Z_t + f_\beta(Z_t) + \xi', \quad \xi' \sim \mathcal{N}(0, \Sigma)$$

so that $P(Z_{t+1} \mid Z_t)$ is Gaussian.

  3. Decoder ($\varphi^\dagger$): Inverts $\psi_\alpha$, reconstructing microstates as

$$\varphi^\dagger(z) = \psi_\alpha^{-1}(z \oplus \zeta), \quad \zeta \sim \mathcal{N}(0, I_{p-q})$$

where $z \oplus \zeta$ denotes the concatenation of $z$ with Gaussian noise filling the $p - q$ dropped dimensions.

By explicitly separating information conversion (via the bijective $\psi$) from information dropping (via the projection $\chi$), NIS gives precise control over the retained channel width $q$ and tracks the information loss analytically.
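
A minimal PyTorch sketch of this three-module design follows. The depth (two coupling layers), hidden widths, and class names are our illustrative choices rather than the paper's exact hyperparameters, and the micro-dimension $p$ is assumed even so the coupling halves align.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One RealNVP-style coupling layer: an invertible map on R^p that
    rescales and shifts one half of x conditioned on the other half."""
    def __init__(self, p: int, swap: bool, hidden: int = 64):
        super().__init__()
        assert p % 2 == 0, "sketch assumes an even micro-dimension"
        self.d, self.swap = p // 2, swap
        self.net = nn.Sequential(nn.Linear(self.d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * self.d))

    def _split(self, x):
        a, b = x[:, :self.d], x[:, self.d:]
        return (b, a) if self.swap else (a, b)

    def _merge(self, a, b):
        return torch.cat([b, a], 1) if self.swap else torch.cat([a, b], 1)

    def forward(self, x):
        a, b = self._split(x)              # a conditions, b is transformed
        s, t = self.net(a).chunk(2, dim=1)
        return self._merge(a, b * torch.exp(s) + t)

    def inverse(self, y):
        a, b = self._split(y)
        s, t = self.net(a).chunk(2, dim=1)
        return self._merge(a, (b - t) * torch.exp(-s))

class NIS(nn.Module):
    """Encoder (psi then projection chi), macro drift f_beta, and decoder."""
    def __init__(self, p: int, q: int, hidden: int = 64):
        super().__init__()
        self.p, self.q = p, q
        self.psi = nn.ModuleList([AffineCoupling(p, swap=False, hidden=hidden),
                                  AffineCoupling(p, swap=True, hidden=hidden)])
        self.f_beta = nn.Sequential(nn.Linear(q, hidden), nn.ReLU(),
                                    nn.Linear(hidden, q))

    def encode(self, x):          # phi = chi o psi: keep the first q dims
        for layer in self.psi:
            x = layer(x)
        return x[:, :self.q]

    def decode(self, z):          # phi^dagger: pad dropped dims with noise
        zeta = torch.randn(z.shape[0], self.p - self.q, device=z.device)
        h = torch.cat([z, zeta], dim=1)
        for layer in reversed(list(self.psi)):
            h = layer.inverse(h)
        return h

    def forward(self, x_t):       # one predicted micro-step
        z_t = self.encode(x_t)
        return self.decode(z_t + self.f_beta(z_t))

model = NIS(p=8, q=2)
x_batch = torch.randn(16, 8)
print(model(x_batch).shape)       # torch.Size([16, 8])
```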

3. Effective Information and Causal Emergence Calculation

Effective information (EI) of a stochastic map $F : Z \to Z'$ in NIS is defined as the mutual information between a uniform intervention on $Z$ and the resulting $Z'$:

$$EI(F) = I\left[\text{Do}\big(Z \sim U(-L, L)^q\big);\, Z'\right]$$

For macro-dynamics $F(Z) \approx \mathcal{N}(Z + f_\beta(Z), \Sigma)$, an analytical approximation holds:

$$EI(F) \approx -\frac{1}{2}\left[q + q\ln(2\pi) + \sum_{i=1}^q \ln \sigma_i^2\right] + q\ln(2L) + \mathbb{E}_{Z \sim U}\ln\left|\det \frac{\partial f_\beta(Z)}{\partial Z}\right|$$

Because the $q\ln(2L)$ term grows without bound in $L$, raw EI values are not comparable across scales; dividing by dimension turns this term into a common $\ln(2L)$ offset, so the dimension-averaged effective information $dEI(F) = EI(F)/q$ and the dimension-averaged causal emergence $dCE = dEI(\text{macro}) - dEI(\text{micro})$ provide meaningful comparative metrics, with the offset cancelling in $dCE$.
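
The closed form above translates directly into code. In the sketch below, the Jacobian expectation is estimated by Monte Carlo over the uniform intervention; the callable `f_jac` (returning $\partial f_\beta/\partial Z$ at a point) and the linear example drift are our illustrative assumptions.

```python
import numpy as np

def dEI(f_jac, sigma2, q, L, n=1000, seed=0):
    """Dimension-averaged EI from the Gaussian closed form; the expectation
    of ln|det df/dZ| is estimated over Z ~ U(-L, L)^q."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-L, L, size=(n, q))
    exp_logdet = np.mean([np.linalg.slogdet(f_jac(z))[1] for z in Z])
    ei = (-0.5 * (q + q * np.log(2 * np.pi) + np.sum(np.log(sigma2)))
          + q * np.log(2 * L) + exp_logdet)
    return ei / q

# Example: a linear drift f_beta(z) = A z has the constant Jacobian A;
# dCE would then be dEI(macro) - dEI(micro) with each scale's fitted values.
A = np.array([[0.9, -0.2], [0.1, 0.8]])
print(dEI(lambda z: A, sigma2=np.array([1e-2, 1e-2]), q=2, L=1.0))
```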

4. Training Objectives and Information-Bottleneck Regime

NIS training proceeds in two stages:

  • Stage 1 (Reconstruction): For a fixed $q$, optimize the conditional log-likelihood of observed transitions by maximizing

$$\mathcal{L}_1(\alpha, \beta) = \sum_t \ln P(X_{t+1} = \hat{Y}_{t+1} \mid X_t) \approx -\sum_t \|X_{t+1} - \hat{Y}_{t+1}\|_l$$

where $l = 1$ or $l = 2$, corresponding to Laplace or Gaussian observation noise respectively. This stage fits the encoder and macro-dynamics so that the macro model predicts the microdynamics accurately.

  • Stage 2 (Scale Search): Sweep $q$ over $1, \ldots, p-1$, retrain at each scale, and evaluate $dCE$; select the $q$ that maximizes the emergent effective information.

No explicit EI regularization term is required; the architecture's bottleneck structure naturally creates a trade-off between information retention and channel width. Once trained, the mutual informations $I(X_t; \hat{Y}_{t+1})$ and $I(Z_t; Z_{t+1})$ converge to $I(X_t; X_{t+1})$ for any $q$; the mapping dynamically allocates useful information per coordinate by scaling $|\det \partial \psi|$.
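
Concretely, the two stages reduce to a short loop. The sketch below reuses the `NIS` module from Section 2 with the $l = 1$ loss; the epoch count, learning rate, and the `dCE` helper referenced in the Stage 2 comment are illustrative assumptions.

```python
import torch

def train_stage1(model, X, epochs=200, lr=1e-3):
    """Stage 1: fit encoder + macro-dynamics by minimizing the l=1
    reconstruction error on consecutive micro-states (rows of X)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = (X[1:] - model(X[:-1])).abs().sum(dim=1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Stage 2: sweep the macro-dimension q and keep the scale with the largest
# causal emergence; dCE(model) would combine the dEI estimates of Section 3.
# X = torch.tensor(trajectory, dtype=torch.float32)   # shape (T, p)
# best_q = max(range(1, p), key=lambda q: dCE(train_stage1(NIS(p, q), X)))
```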

5. Empirical Demonstrations of Causal Emergence

NIS successfully identifies causal emergence in diverse systems:

| System | Microstates & Encoding | Optimal $q$ | Coarse-Graining Effect | Causal Emergence |
|---|---|---|---|---|
| Spring oscillator | $X = (z, v) \in \mathbb{R}^2$ | 2 | Recovers $(z, v)$; $f_\beta$ matches the physics | $dCE(q)$ peaks at $q = 2$ |
| 8-state Markov chain | One-hot in $\mathbb{R}^8$ | 1 | Groups states $\{1, \ldots, 7\} \to 0$, $8 \to 1$ | Macro EI $\gg$ micro EI |
| Boolean network | 4 bits, 16 states | 1 | Clusters the 16 microstates into 4 macro groups | Matches the Hoel et al. example |

In each case, NIS recovers known optimal coarse-grainings and demonstrates positive causal emergence ($dCE > 0$) (Zhang et al., 2022).
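
As a concrete check on the Markov-chain row above, the snippet below computes Hoel-style EI for a discrete transition matrix under a uniform intervention. The specific micro TPM (seven uniformly mixing states plus one self-mapping state) is our illustrative guess at the setup, not necessarily the paper's exact matrix.

```python
import numpy as np

def ei_discrete(P):
    """EI of a TPM under Do(uniform): average KL divergence of each row
    from the mean effect distribution, in bits."""
    n = P.shape[0]
    p_bar = P.mean(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log2(P / p_bar), 0.0)
    return terms.sum() / n

micro = np.zeros((8, 8))
micro[:7, :7] = 1 / 7     # states 1..7 mix uniformly among themselves
micro[7, 7] = 1.0         # state 8 maps to itself
macro = np.eye(2)         # coarse-graining {1..7} -> 0, {8} -> 1
print(ei_discrete(micro)) # ~0.54 bits at the micro scale
print(ei_discrete(macro)) # 1.0 bit at the macro scale
```

The macro chain is deterministic, so its per-state EI exceeds that of the noisy micro chain, illustrating the emergence reported in the table.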

6. Limitations and Assumptions

Several practical and theoretical limitations arise:

  • RealNVP-based invertible networks for $\psi$ are challenging to scale to high-dimensional microstate spaces, and training stability is a concern.
  • Macro transition noise is assumed to be Gaussian (or Laplace); more flexible likelihood models, e.g., normalizing flows on $Z_{t+1}$, are not yet implemented.
  • The coarse-graining map $\varphi$ is a black-box invertible/projection composite; improving interpretability by imposing sparsity or explicit variable grouping is an open direction.
  • Extensions to continuous-time dynamics (stochastic differential equations) and mappings from whole trajectories to macro-trajectories remain unexplored.
  • There is no general closed-form criterion guaranteeing causal emergence ($dCE > 0$) from the microdynamics alone; the current methodology relies on empirical $dEI$ computation.

7. Extensions and Future Perspectives

Future work may address scaling the invertible map to higher dimensionalities, incorporating richer probabilistic macro-dynamics, and enforcing interpretable structure in the coarse-grainer. A plausible implication is that, with such enhancements, NIS could systematically uncover multiscale causal structures in complex nonlinear systems from data alone. Establishing theoretical guarantees for causal emergence under broader classes of micro-dynamics remains an open research question (Zhang et al., 2022).

References

  1. Zhang, J., & Liu, K. (2022). Neural Information Squeezer for Causal Emergence. arXiv:2201.12538.
