Hidden-State Propagation: Mechanisms & Models

Updated 26 November 2025
  • Hidden-state propagation is the process by which latent variables are updated and transmitted across time, layers, or computational stages in various models.
  • Techniques range from deterministic recurrences in classical RNNs and LSTMs to stochastic filtering and message passing in probabilistic and graphical models.
  • Recent advances, including dense hidden connections and information-bottleneck methods, enhance model stability and interpretability and improve robustness to vanishing and exploding gradients.

Hidden-state propagation refers to the mechanisms by which latent variables or activations (the "hidden states") are updated, transmitted, or approximated across time, layers, or computational stages in probabilistic models, dynamical systems, and neural networks. The precise method of propagation governs the expressivity and stability of learning, the tractability of inference, and the interpretability of model internals. Modern approaches encompass deterministic recurrences (as in classical RNNs), stochastic filtering, structured message passing, explicit uncertainty quantification, and information-theoretic measures of the complexity of hidden-state evolution.

1. Hidden-State Propagation Mechanisms in Recurrent and State Space Models

Hidden-state propagation in neural architectures defines the evolution of internal memory vectors in response to incoming data. Classical recurrent neural networks (RNNs), long short-term memory units (LSTMs), and state space models (SSMs) instantiate distinct propagation rules:

  • Classical RNNs: The update

h_t = f(h_{t-1}, x_t)

defines the hidden state as a function (typically an affine map followed by a nonlinearity) of the previous state and the new input (Ebrahimi et al., 27 May 2025).

  • Shuffling RNNs: The SRNN propagates hidden state via permutation and addition:

h_t = \sigma(W_p h_{t-1} + b(x_t)), \quad o_t = s(h_t)

where W_p is an orthogonal permutation matrix (a cyclic shift) and b(x_t) is a gated MLP (Rotman et al., 2020).

  • State Space Models (SSM):

h_t = A h_{t-1} + B x_t

where A and B are system matrices (possibly time- or input-dependent) (He et al., 26 Feb 2024).

DenseSSM augments SSMs by directly integrating shallower-layer hidden states into the current layer's hidden state through selective projection and gating. This lets deeper layers access fine-grained earlier representations, improving both information retention and model accuracy (He et al., 26 Feb 2024).
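As a concrete illustration of these update rules, the following NumPy sketch runs the linear SSM recurrence h_t = A h_{t-1} + B x_t and adds a DenseSSM-style injection of a shallower layer's states. The projection-plus-sigmoid-gate fusion, the matrix shapes, and the initialization are illustrative assumptions, not the published formulation.

```python
# Sketch: linear SSM recurrence plus a DenseSSM-style dense hidden connection.
# The fusion rule (projection + sigmoid gate, additive merge) is an assumption.
import numpy as np

rng = np.random.default_rng(0)
d_state, d_in, T = 8, 4, 16

A = 0.9 * np.eye(d_state)                                 # state transition matrix
B = rng.normal(scale=0.1, size=(d_state, d_in))           # input matrix
W_proj = rng.normal(scale=0.1, size=(d_state, d_state))   # projects the shallow state
W_gate = rng.normal(scale=0.1, size=(d_state, d_state))   # gate parameters

def ssm_scan(x, shallow_states=None):
    """Run h_t = A h_{t-1} + B x_t over a sequence x of shape (T, d_in)."""
    h = np.zeros(d_state)
    hist = []
    for t in range(x.shape[0]):
        h = A @ h + B @ x[t]
        if shallow_states is not None:
            # Dense hidden connection: selectively inject the shallower
            # layer's state at the same time step (illustrative fusion).
            s = shallow_states[t]
            gate = 1.0 / (1.0 + np.exp(-(W_gate @ s)))     # sigmoid gate
            h = h + gate * (W_proj @ s)
        hist.append(h.copy())
    return np.stack(hist)

x = rng.normal(size=(T, d_in))
layer1 = ssm_scan(x)                          # shallow layer
layer2 = ssm_scan(x, shallow_states=layer1)   # deeper layer with dense connection
print(layer2.shape)                           # (16, 8)
```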

2. Theoretical Properties and Hierarchies of Hidden-State Update Rules

Theoretical analysis of hidden-state propagation mechanisms centers on their capacity for information retention, expressive power, and gradient stability.

  • Bi-linear RNNs implement updates of the form:

h_t = A_{x_t} h_{t-1}

where A_{x_t} is an input-dependent matrix, parameterized by a 3-way tensor W:

(A_{x_t})_{i,j} = \sum_{k=1}^D W_{i,j,k} x_{t,k}

This structure enables simulation of arbitrary finite-state machines (FSMs), with constrained forms (block-diagonal, orthogonal, etc.) mapping to group-theoretic hierarchies (e.g., abelian groups, parity) (Ebrahimi et al., 27 May 2025). A minimal einsum sketch of this update appears after this list.

  • Stability and Gradient Propagation: In SRNNs, the orthogonality of W_p and the boundedness of the nonlinearities ensure that:

\left\| \frac{\partial h_t}{\partial b_k} \right\| \leq \sum_{i=1}^t \left\| \frac{\partial b(x_i)}{\partial b_k} \right\|

precluding both vanishing and exploding gradients (no exponential scaling with sequence length) (Rotman et al., 2020).

  • Persistent Hidden-State Semantics: Removing affine transforms (e.g., in PRU/PRU^+) ensures that each dimension of the hidden state retains consistent semantic meaning over time, rather than undergoing arbitrary rotations/reflections as in standard LSTMs. Explicit feedforward augmentation restores nonlinearity without sacrificing semantic persistence (Choi, 2018).
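The bi-linear update referenced above can be written compactly as an einsum over the 3-way tensor W. The NumPy sketch below is a minimal illustration under assumed shapes and random initialization, not a faithful reproduction of the cited construction.

```python
# Sketch: bi-linear RNN update h_t = A_{x_t} h_{t-1},
# with (A_{x_t})_{ij} = sum_k W_{ijk} x_{t,k}. Shapes and values are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_hidden, d_in, T = 6, 3, 10
W = rng.normal(scale=0.1, size=(d_hidden, d_hidden, d_in))   # 3-way tensor

def bilinear_rnn(x, h0):
    h = h0
    for t in range(x.shape[0]):
        A_t = np.einsum("ijk,k->ij", W, x[t])   # input-dependent transition matrix
        h = A_t @ h                             # purely multiplicative state update
    return h

x = rng.normal(size=(T, d_in))
h_final = bilinear_rnn(x, h0=np.ones(d_hidden))
print(h_final.shape)   # (6,)
```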

3. Hidden-State Propagation in Probabilistic and Filtering Models

In latent variable models and state-space filtering, hidden-state propagation formalizes Bayesian updating:

  • Kalman Filter: For a linear-Gaussian state-space model, the hidden-state estimator propagates according to:

\hat{x}_{k|k-1} = A_{\theta_0} \hat{x}_{k-1|k-1} + u_k(\theta_0)

with analytic propagation of parameter bias via:

\delta_k = (I - K_k C_{\theta_0}) A_{\theta_0} \delta_{k-1} + M_k \Delta\theta + o(\|\Delta\theta\|)

quantifying how parameter uncertainty or initial errors are transmitted into state estimates (Kolei, 2013). A minimal predict/update sketch appears after this list.

  • Continuous Particle Filtering LSTM: The CPF-LSTM maintains a particle approximation to the hidden-state distribution, propagating particles through stochastic, differentiable updates and summarizing via the empirical mean for downstream tasks, with end-to-end differentiability ensured via smooth reparameterization steps (Li, 2022).
  • Robust HMM Filtering: Filtering in continuous-time finite-state HMMs with unknown, time-varying parameters relies on rough-path differential equations (RDEs) for hidden-state distribution evolution, with explicit Lipschitz continuity bounds on propagation of path and parameter uncertainty (Allan, 2020).
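For concreteness, the sketch below implements a standard linear-Gaussian Kalman predict and update step, which is the propagation rule described in the Kalman filter item above. The bias recursion for δ_k is not reproduced, and the toy constant-velocity model, noise covariances, and observations are assumptions.

```python
# Sketch: one predict/update step of a linear-Gaussian Kalman filter.
# Predict: x_hat_{k|k-1} = A x_hat_{k-1|k-1} + u_k; Update: fold in observation y.
import numpy as np

def kalman_step(x_hat, P, y, A, C, Q, R, u=None):
    # Predict: propagate the hidden-state estimate and its covariance.
    x_pred = A @ x_hat + (u if u is not None else 0.0)
    P_pred = A @ P @ A.T + Q
    # Update: correct the prediction with the new observation y.
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)                    # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(P.shape[0]) - K @ C) @ P_pred
    return x_new, P_new

# Toy 2-D constant-velocity example (assumed for illustration).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[0.1]])
x_hat, P = np.zeros(2), np.eye(2)
for y in [1.0, 2.1, 2.9, 4.2]:
    x_hat, P = kalman_step(x_hat, P, np.array([y]), A, C, Q, R)
print(x_hat)   # filtered position/velocity estimate
```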

4. Message Passing, Memory, and Information Retention Strategies

Emergent advances target explicit propagation of hidden-state information across time and across batches or layers, addressing limitations of standard sequential models.

  • Message Propagation Through Time (MPTT): MPTT introduces key-map and state-map memory modules outside of the computational graph, synchronizing initial hidden states for subsequences via asynchronous message passing and learned filtering policies (read, write, propagate). This achieves robust training under random shuffling while preserving long temporal dependencies, outperforming traditional RNN mini-batch and stateful training strategies in real-world sequence modeling tasks (Xu et al., 2023). A simplified state-store sketch follows this list.
  • Dense Hidden Connections (DenseSSM): By fusing selectively projected and gated shallower hidden states into current deeper layers, DenseSSM architectures counteract information decay in stacked SSMs (e.g., Mamba, RetNet), leading to enhanced reasoning and language modeling performance with minimal additional computational overhead (He et al., 26 Feb 2024).
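The following sketch captures only the core idea of carrying hidden states across shuffled subsequences through an external store. The dictionary-based memory, zero-initialization fallback, and overwrite policy are simplifying assumptions; the actual MPTT key-map/state-map modules and learned filtering policies are considerably richer.

```python
# Sketch: an external hidden-state store for shuffled subsequence training.
# The store lives outside the training graph; read/write policies are assumptions.
import numpy as np

class StateMemory:
    def __init__(self, d_hidden):
        self.d_hidden = d_hidden
        self.store = {}                        # sequence id -> last hidden state

    def read(self, seq_id):
        # Propagate: reuse the stored state if one exists, else start from zeros.
        return self.store.get(seq_id, np.zeros(self.d_hidden))

    def write(self, seq_id, h):
        # Write policy: detach-and-overwrite (no gradient flows through the store).
        self.store[seq_id] = np.array(h, copy=True)

memory = StateMemory(d_hidden=4)
h0 = memory.read(seq_id=7)       # zeros the first time sequence 7 is seen
h_last = h0 + 1.0                # stand-in for running an RNN over one chunk
memory.write(seq_id=7, h=h_last)
print(memory.read(seq_id=7))     # the next chunk of sequence 7 starts from h_last
```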

5. Hidden-State Propagation in Probabilistic Graphical Models

The concept of hidden-state propagation generalizes to inference in graphical models with latent variables, where belief propagation and related algorithms exploit the structure of interactions:

  • Kinetic Ising Model with Hidden Spins: An efficient EM workflow combines belief propagation (BP) for marginal inference on hidden spins with susceptibility propagation (SusP) for computing cross-covariances necessary for coupling updates. The absence of hidden–hidden couplings allows conditional independence, enabling replica tricks and exact BP-style message passing for propagating hidden-state beliefs (Battistin et al., 2014).
  • HMMs with Unobservable (ε-) Transitions: For probabilistic systems with hidden states and silent transitions, forward–backward (α–β) propagation is defined via ε-closure matrices satisfying

E^a = Q^a + Q^{\epsilon} E^a,

ensuring correct traversal of all possible hidden paths through null transitions. This approach admits robust Viterbi-style decoding and parameter learning for models with arbitrarily deep hidden stochastic structure (Bernemann et al., 2022).
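Assuming the spectral radius of Q^ε is below one, the closure equation has the closed form E^a = (I - Q^ε)^{-1} Q^a, which accounts for arbitrarily long runs of silent transitions before the observable symbol a. The short sketch below verifies this fixed point on toy matrices (the matrices themselves are assumptions).

```python
# Sketch: solving the epsilon-closure relation E^a = Q^a + Q^eps E^a.
import numpy as np

Q_eps = np.array([[0.0, 0.3],      # silent (epsilon) transition probabilities
                  [0.1, 0.0]])
Q_a   = np.array([[0.5, 0.2],      # transitions that emit symbol a
                  [0.4, 0.5]])

E_a = np.linalg.solve(np.eye(2) - Q_eps, Q_a)   # closed-form fixed point
assert np.allclose(E_a, Q_a + Q_eps @ E_a)      # satisfies the closure equation
print(E_a)
```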

6. Information-Theoretic and Complexity Perspectives

Recent work quantifies the "interestingness" or computational richness of hidden-state updates through information-theoretic metrics:

  • Prediction-of-Hidden-States (PHi) Bottleneck: By augmenting neural sequence models with a learned predictive prior over hidden states, the per-step KL divergence D_{\mathrm{KL}}[q(z_t \mid h_t) \parallel p(z_t \mid z_{<t})] quantifies the novel information introduced at each computation step. This "hidden-state description length" aligns with task complexity, mathematical problem difficulty, and the correctness of reasoning chains, and it is more sensitive than next-token loss to nontrivial in-context computation (Herrmann et al., 17 Mar 2025).
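As one illustration of how the per-step score might be computed, the sketch below evaluates the KL divergence between diagonal-Gaussian posterior and prior parameters at each step. The Gaussian parameterization and the stand-in values are assumptions about one possible instantiation of the bottleneck, not the published architecture.

```python
# Sketch: per-step KL(q(z_t|h_t) || p(z_t|z_<t)) for diagonal Gaussians.
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    # Closed-form KL between diagonal Gaussians, summed over latent dimensions.
    return 0.5 * np.sum(np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

T, d_z = 5, 3
rng = np.random.default_rng(0)
mu_q, var_q = rng.normal(size=(T, d_z)), np.full((T, d_z), 0.5)   # posterior per step
mu_p, var_p = np.zeros((T, d_z)), np.ones((T, d_z))               # predictive prior (stand-in)

per_step_kl = np.array([gaussian_kl(mu_q[t], var_q[t], mu_p[t], var_p[t]) for t in range(T)])
print(per_step_kl)         # novel information introduced at each computation step
print(per_step_kl.sum())   # total "hidden-state description length" for the sequence
```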

7. Inference of Hidden-State Topology and Memory in Projected Dynamical Systems

For projected or marginalized Markov processes, hidden-state propagation can be studied by empirical analysis of observed transition statistics:

  • Markov-State Holography: By constructing history-conditioned histograms of observable transition probabilities, one identifies the fingerprint of hidden states and quantifies local memory duration through the convergence of these distributions. Analysis of the limiting shapes and rates provides a data-driven method to reconstruct or refine Markov-state models consistent with all observed transition histories, revealing latent structure without prior assumptions on hidden-state topology (Zhao et al., 14 Mar 2025).
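A minimal version of the history-conditioning idea is sketched below: it counts observable transitions conditioned on progressively longer histories, so that a dependence of the conditional statistics on history signals hidden structure. The toy trajectory and the simple counting scheme are assumptions for illustration, not the cited method.

```python
# Sketch: history-conditioned transition statistics from an observed trajectory.
from collections import Counter, defaultdict

def history_conditioned(obs, k=1):
    """Estimate P(next symbol | current symbol and previous k symbols)."""
    cond = defaultdict(Counter)
    for i in range(k, len(obs) - 1):
        history = tuple(obs[i - k:i + 1])   # previous k symbols plus the current one
        cond[history][obs[i + 1]] += 1
    # Normalize counts into conditional transition probabilities.
    return {h: {s: c / sum(ctr.values()) for s, c in ctr.items()} for h, ctr in cond.items()}

obs = "ABABBABAABABBA"                      # stand-in for a projected trajectory
print(history_conditioned(obs, k=0))        # memoryless: condition on current symbol only
print(history_conditioned(obs, k=1))        # condition on one extra step of history
```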

These frameworks collectively establish hidden-state propagation as a unifying theme across sequential modeling, control, graphical inference, and complexity analysis. The details of propagation—linear, orthogonal, bi-linear, particle-distributed, or information-bottlenecked—govern not only learning efficiency and task performance but also model interpretability and robustness to uncertainty.
