Neural Information Squeezer (NIS)

Updated 7 January 2026
  • Neural Information Squeezer (NIS) is a machine learning framework that identifies causal emergence via learned coarse-graining of high-dimensional Markovian systems.
  • It employs a three-module neural architecture—an encoder with invertible transformations, a macro-dynamics learner, and a decoder—to balance information retention with accurate microstate reconstruction.
  • Empirical results in systems like oscillators, Markov chains, and Boolean networks validate NIS's ability to expose multiscale causal structures by maximizing effective information.

Neural Information Squeezer (NIS) is a general machine learning framework for identifying causal emergence through learned coarse-graining of Markovian dynamical systems. The framework employs neural network parameterizations to discover optimal coarse-graining strategies and low-dimensional macro-state dynamics directly from time-series data, maximizing effective information (EI) at the macro level subject to accurate reconstruction of the microdynamics (Zhang et al., 2022). Its architecture explicitly separates information-preserving transformations from information-dropping projections, enabling rigorous analysis of information retention and causal structure across scales.

1. Framework Objective and Problem Setting

NIS addresses the detection and quantification of causal emergence: the phenomenon where a suitable coarse-grained representation of a Markovian system exhibits stronger causal connections than the microscopic description. Given a dynamical system with microstates $X_t \in \mathbb{R}^p$ and a transition law $P(x_{t+1} \mid x_t)$, the framework seeks: (a) a differentiable coarse-graining map $\varphi : \mathbb{R}^p \to \mathbb{R}^q$, (b) a Markovian macro-dynamics $f : \mathbb{R}^q \to \mathbb{R}^q$, and (c) a decoder $\varphi^\dagger$ that reconstructs microstates from macro-states. The aim is to maximize the effective information of the macro-dynamics while maintaining high-fidelity prediction of $X_{t+1}$.
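
To fix ideas, the sketch below wires the three sought components together on random data. The linear stand-ins `W`, `A`, and `V` are purely illustrative placeholders for the learned maps, not part of NIS itself.

```python
import numpy as np

# Illustrative wiring of the three components NIS must learn; the linear
# stand-ins W, A, V are placeholders, not the actual learned maps.
rng = np.random.default_rng(0)
p, q = 8, 2                            # micro- and macro-dimensionality

W = rng.standard_normal((q, p))        # stands in for the encoder phi
A = 0.1 * rng.standard_normal((q, q))  # stands in for the macro drift f
V = rng.standard_normal((p, q))        # stands in for the decoder phi^dagger

def phi(x):          # coarse-graining map: R^p -> R^q
    return W @ x

def f(z):            # Markovian macro-dynamics drift: R^q -> R^q
    return A @ z

def phi_dagger(z):   # decoder: R^q -> R^p
    return V @ z

x_t = rng.standard_normal(p)
z_t = phi(x_t)                          # macro-state Z_t
x_hat_next = phi_dagger(z_t + f(z_t))   # predicted micro-state X_{t+1}
print(x_hat_next.shape)                 # -> (8,)
```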

2. Architecture and Information-Preserving Mapping

The NIS architecture comprises three core neural modules:

  1. Encoder ($\varphi_n$): Implements a two-stage transformation $\varphi_n = \chi \circ \psi$, where
    • $\psi_\alpha : \mathbb{R}^p \to \mathbb{R}^p$ is a bijection parameterized by stacked RealNVP coupling layers.
    • $\chi_{p \to q} : \mathbb{R}^p \to \mathbb{R}^q$ projects onto the first $q$ coordinates, dropping the remaining $p - q$ dimensions.
  2. Macro-dynamics Learner ($f_\beta$): Models the drift in macro space as

$$Z_{t+1} = Z_t + f_\beta(Z_t) + \xi', \quad \xi' \sim \mathcal{N}(0, \Sigma)$$

so that $P(Z_{t+1} \mid Z_t)$ is Gaussian.

  3. Decoder ($\varphi^\dagger$): Inverts $\psi_\alpha$, reconstructing microstates as

$$\varphi^\dagger(z) = \psi_\alpha^{-1}(z \oplus \zeta), \quad \zeta \sim \mathcal{N}(0, I_{p-q})$$

where $z \oplus \zeta$ denotes the concatenation of $z$ with Gaussian noise filling the $p - q$ dropped dimensions.

By explicitly separating information conversion (via the bijective $\psi$) from information dropping (via the projection $\chi$), NIS gives precise control over the retained channel width $q$ and tracks the information loss analytically.
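
A minimal PyTorch sketch of this three-module design follows. The depth (two coupling layers), hidden widths, and class names are our illustrative choices rather than the paper's exact hyperparameters, and the micro-dimension $p$ is assumed even so the coupling halves align.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One RealNVP-style coupling layer: an invertible map on R^p that
    rescales and shifts one half of x conditioned on the other half."""
    def __init__(self, p: int, swap: bool, hidden: int = 64):
        super().__init__()
        assert p % 2 == 0, "sketch assumes an even micro-dimension"
        self.d, self.swap = p // 2, swap
        self.net = nn.Sequential(nn.Linear(self.d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * self.d))

    def _split(self, x):
        a, b = x[:, :self.d], x[:, self.d:]
        return (b, a) if self.swap else (a, b)

    def _merge(self, a, b):
        return torch.cat([b, a], 1) if self.swap else torch.cat([a, b], 1)

    def forward(self, x):
        a, b = self._split(x)              # a conditions, b is transformed
        s, t = self.net(a).chunk(2, dim=1)
        return self._merge(a, b * torch.exp(s) + t)

    def inverse(self, y):
        a, b = self._split(y)
        s, t = self.net(a).chunk(2, dim=1)
        return self._merge(a, (b - t) * torch.exp(-s))

class NIS(nn.Module):
    """Encoder (psi then projection chi), macro drift f_beta, and decoder."""
    def __init__(self, p: int, q: int, hidden: int = 64):
        super().__init__()
        self.p, self.q = p, q
        self.psi = nn.ModuleList([AffineCoupling(p, swap=False, hidden=hidden),
                                  AffineCoupling(p, swap=True, hidden=hidden)])
        self.f_beta = nn.Sequential(nn.Linear(q, hidden), nn.ReLU(),
                                    nn.Linear(hidden, q))

    def encode(self, x):          # phi = chi o psi: keep the first q dims
        for layer in self.psi:
            x = layer(x)
        return x[:, :self.q]

    def decode(self, z):          # phi^dagger: pad dropped dims with noise
        zeta = torch.randn(z.shape[0], self.p - self.q, device=z.device)
        h = torch.cat([z, zeta], dim=1)
        for layer in reversed(list(self.psi)):
            h = layer.inverse(h)
        return h

    def forward(self, x_t):       # one predicted micro-step
        z_t = self.encode(x_t)
        return self.decode(z_t + self.f_beta(z_t))

model = NIS(p=8, q=2)
x_batch = torch.randn(16, 8)
print(model(x_batch).shape)       # torch.Size([16, 8])
```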

3. Effective Information and Causal Emergence Calculation

Effective information (EI) of a stochastic map $F : Z \to Z'$ in NIS is defined as the mutual information between a uniform intervention on $Z$ and the resulting $Z'$:

$$EI(F) = I\left[\text{Do}\big(Z \sim U(-L, L)^q\big);\, Z'\right]$$

For macro-dynamics $F(Z) \approx \mathcal{N}(Z + f_\beta(Z), \Sigma)$, an analytical approximation holds:

$$EI(F) \approx -\frac{1}{2}\left[q + q\ln(2\pi) + \sum_{i=1}^q \ln \sigma_i^2\right] + q\ln(2L) + \mathbb{E}_{Z \sim U}\ln\left|\det \frac{\partial f_\beta(Z)}{\partial Z}\right|$$

Because the $q\ln(2L)$ term grows without bound in $L$, raw EI values are not comparable across scales; dividing by dimension turns this term into a common $\ln(2L)$ offset, so the dimension-averaged effective information $dEI(F) = EI(F)/q$ and the dimension-averaged causal emergence $dCE = dEI(\text{macro}) - dEI(\text{micro})$ provide meaningful comparative metrics, with the offset cancelling in $dCE$.
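
The closed form above translates directly into code. In the sketch below, the Jacobian expectation is estimated by Monte Carlo over the uniform intervention; the callable `f_jac` (returning $\partial f_\beta/\partial Z$ at a point) and the linear example drift are our illustrative assumptions.

```python
import numpy as np

def dEI(f_jac, sigma2, q, L, n=1000, seed=0):
    """Dimension-averaged EI from the Gaussian closed form; the expectation
    of ln|det df/dZ| is estimated over Z ~ U(-L, L)^q."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-L, L, size=(n, q))
    exp_logdet = np.mean([np.linalg.slogdet(f_jac(z))[1] for z in Z])
    ei = (-0.5 * (q + q * np.log(2 * np.pi) + np.sum(np.log(sigma2)))
          + q * np.log(2 * L) + exp_logdet)
    return ei / q

# Example: a linear drift f_beta(z) = A z has the constant Jacobian A;
# dCE would then be dEI(macro) - dEI(micro) with each scale's fitted values.
A = np.array([[0.9, -0.2], [0.1, 0.8]])
print(dEI(lambda z: A, sigma2=np.array([1e-2, 1e-2]), q=2, L=1.0))
```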

4. Training Objectives and Information-Bottleneck Regime

NIS training proceeds in two stages:

  • Stage 1 (Reconstruction): For a fixed $q$, optimize the conditional log-likelihood of observed transitions by maximizing

$$\mathcal{L}_1(\alpha, \beta) = \sum_t \ln P(X_{t+1} = \hat{Y}_{t+1} \mid X_t) \approx -\sum_t \|X_{t+1} - \hat{Y}_{t+1}\|_l$$

where $l = 1$ or $l = 2$, corresponding to Laplace or Gaussian observation noise respectively. This stage fits the encoder and macro-dynamics so that the macro model predicts the microdynamics accurately.

  • Stage 2 (Scale Search): Sweep $q$ over $1, \ldots, p-1$, retrain at each scale, and evaluate $dCE$; select the $q$ that maximizes the emergent effective information.

No explicit EI regularization term is required; the architecture's bottleneck structure naturally creates a trade-off between information retention and channel width. Once trained, the mutual informations $I(X_t; \hat{Y}_{t+1})$ and $I(Z_t; Z_{t+1})$ converge to $I(X_t; X_{t+1})$ for any $q$; the mapping dynamically allocates useful information per coordinate by scaling $|\det \partial \psi|$.
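
Concretely, the two stages reduce to a short loop. The sketch below reuses the `NIS` module from Section 2 with the $l = 1$ loss; the epoch count, learning rate, and the `dCE` helper referenced in the Stage 2 comment are illustrative assumptions.

```python
import torch

def train_stage1(model, X, epochs=200, lr=1e-3):
    """Stage 1: fit encoder + macro-dynamics by minimizing the l=1
    reconstruction error on consecutive micro-states (rows of X)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = (X[1:] - model(X[:-1])).abs().sum(dim=1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Stage 2: sweep the macro-dimension q and keep the scale with the largest
# causal emergence; dCE(model) would combine the dEI estimates of Section 3.
# X = torch.tensor(trajectory, dtype=torch.float32)   # shape (T, p)
# best_q = max(range(1, p), key=lambda q: dCE(train_stage1(NIS(p, q), X)))
```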

5. Empirical Demonstrations of Causal Emergence

NIS successfully identifies causal emergence in diverse systems:

| System | Microstates & Encoding | Optimal $q$ | Coarse-Graining Effect | Causal Emergence |
|---|---|---|---|---|
| Spring oscillator | $X = (z, v) \in \mathbb{R}^2$ | 2 | Recovers $(z, v)$; $f_\beta$ matches the physics | $dCE(q)$ peaks at $q = 2$ |
| 8-state Markov chain | One-hot in $\mathbb{R}^8$ | 1 | Groups states $\{1, \ldots, 7\} \to 0$, $8 \to 1$ | Macro EI $\gg$ micro EI |
| Boolean network | 4 bits, 16 states | 1 | Clusters the 16 microstates into 4 macro groups | Matches the Hoel et al. example |

In each case, NIS recovers known optimal coarse-grainings and demonstrates positive causal emergence ($dCE > 0$) (Zhang et al., 2022).
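
As a concrete check on the Markov-chain row above, the snippet below computes Hoel-style EI for a discrete transition matrix under a uniform intervention. The specific micro TPM (seven uniformly mixing states plus one self-mapping state) is our illustrative guess at the setup, not necessarily the paper's exact matrix.

```python
import numpy as np

def ei_discrete(P):
    """EI of a TPM under Do(uniform): average KL divergence of each row
    from the mean effect distribution, in bits."""
    n = P.shape[0]
    p_bar = P.mean(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log2(P / p_bar), 0.0)
    return terms.sum() / n

micro = np.zeros((8, 8))
micro[:7, :7] = 1 / 7     # states 1..7 mix uniformly among themselves
micro[7, 7] = 1.0         # state 8 maps to itself
macro = np.eye(2)         # coarse-graining {1..7} -> 0, {8} -> 1
print(ei_discrete(micro)) # ~0.54 bits at the micro scale
print(ei_discrete(macro)) # 1.0 bit at the macro scale
```

The macro chain is deterministic, so its per-state EI exceeds that of the noisy micro chain, illustrating the emergence reported in the table.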

6. Limitations and Assumptions

Several practical and theoretical limitations arise:

  • RealNVP-based invertible networks for $\psi$ are challenging to scale to high-dimensional microstate spaces, and training stability is a concern.
  • Macro transition noise is assumed to be Gaussian (or Laplace); more flexible likelihood models, e.g., normalizing flows on $Z_{t+1}$, are not yet implemented.
  • The coarse-graining map $\varphi$ is a black-box invertible/projection composite; improving interpretability by imposing sparsity or explicit variable grouping is an open direction.
  • Extensions to continuous-time dynamics (stochastic differential equations) and mappings from whole trajectories to macro-trajectories remain unexplored.
  • There is no general closed-form criterion guaranteeing causal emergence ($dCE > 0$) from the microdynamics alone; the current methodology relies on empirical $dEI$ computation.

7. Extensions and Future Perspectives

Future work may address scaling the invertible map to higher dimensionalities, incorporating richer probabilistic macro-dynamics, and enforcing interpretable structure in the coarse-grainer. A plausible implication is that, with such enhancements, NIS could systematically uncover multiscale causal structures in complex nonlinear systems from data alone. Establishing theoretical guarantees for causal emergence under broader classes of micro-dynamics remains an open research question (Zhang et al., 2022).

References

  1. Zhang, J., & Liu, K. (2022). Neural Information Squeezer for Causal Emergence. arXiv:2201.12538.
