Hierarchical Gaussian Filters

Updated 21 February 2026
  • Hierarchical Gaussian Filters are hierarchical Bayesian models that employ recursive inference and message passing to estimate hidden states, their volatility, and the associated uncertainty.
  • They use variational Bayes and nonlinear coupling techniques to update predictions and compute precision-weighted errors in discrete trial steps.
  • Their modular architecture supports flexible implementations and computational phenotyping by dissociating value and volatility updates.

A Hierarchical Gaussian Filter (HGF) is a class of hierarchical Bayesian models used to describe the inference and learning processes underlying perception and adaptation. Originally developed in cognitive neuroscience, the HGF formalizes how agents recursively infer hidden states, their volatility, and uncertainty in generative environments. The generalized HGF expands the classical formulation to allow for nonlinear coupling between levels, enabling both volatility-driven and mean-driven learning, and supporting fully modular and flexible computational architectures for empirical data analysis (Weber et al., 2023).

1. Hierarchical Generative Structure

The generative model in the generalized HGF defines a hierarchy of hidden states, indexed by level $i=1,\ldots,N$ and discrete trial $k$:

  • $x_1(k)$: first-level hidden state
  • $x_2(k)$: second-level hidden state
  • $\ldots$
  • $x_N(k)$: top-level hidden state

Each state evolves according to a Gaussian random walk, whose drift (mean increment) and volatility (variance) are conditional on the previous states and parent nodes. For a child node $a$ at time $k$, with value parents $\mathcal P(a)$ and volatility parents $\mathcal V(a)$, the transition is:

$$x_a(k) \sim \mathcal N\!\left( x_a(k-1) + t(k)\Bigl[p_a + \sum_{i\in \mathcal P(a)}\alpha_{i,a}\, g\bigl(x_i(k-1)\bigr)\Bigr],\; t(k)\exp\Bigl[w_a + \sum_{j\in \mathcal V(a)}\kappa_{j,a}\, x_j(k-1)\Bigr]\right)$$

where $p_a$ (tonic drift) and $w_a$ (tonic log-volatility) are node parameters, $\alpha_{i,a}$ and $\kappa_{j,a}$ control the strength of value or volatility coupling, and $g(\cdot)$ is a twice-differentiable, possibly nonlinear coupling function.

This structure accommodates a spectrum of models:

  • Pure volatility coupling (classical HGF): $g(x)=0$
  • Pure value coupling (predictive coding): $w_a$ constant, no volatility parents
  • Arbitrary additive parent combinations
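The transition above can be simulated directly for the simplest case: a two-level chain in which $x_2$ is a volatility parent of $x_1$ and there are no value parents (the classical-HGF case). A minimal sketch with hypothetical parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a two-level chain: x2 is a volatility
# parent of x1; no value parents, so the drift term reduces to p_a.
p1, w1 = 0.0, -2.0      # tonic drift and tonic log-volatility of x1
p2, w2 = 0.0, -4.0      # tonic drift and tonic log-volatility of x2
kappa = 1.0             # volatility-coupling strength kappa_{2,1}
T, t = 200, 1.0         # number of trials and constant step size t(k)

x1 = np.zeros(T)
x2 = np.zeros(T)
for k in range(1, T):
    # Top level: plain Gaussian random walk with tonic log-volatility w2
    x2[k] = rng.normal(x2[k - 1] + t * p2, np.sqrt(t * np.exp(w2)))
    # Child level: variance modulated by the volatility parent's previous state
    var1 = t * np.exp(w1 + kappa * x2[k - 1])
    x1[k] = rng.normal(x1[k - 1] + t * p1, np.sqrt(var1))
```

Raising or lowering `kappa` makes the child's step-to-step variability more or less sensitive to excursions of the parent, which is exactly the volatility-coupling mechanism the transition equation encodes.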

2. Variational Inference and Update Equations

Inference in the HGF proceeds via variational Bayes, employing a mean-field Gaussian approximation for each hidden node: $x_i(k)\sim\mathcal N(\mu_i(k),\,\Pi_i^{-1}(k))$, with $\mu_i$ and $\Pi_i$ denoting mean and precision, respectively. Each node executes three generic steps on each trial:

  1. Prediction using priors and parents
  2. Update based on current evidence
  3. Computation of prediction error (PE) for propagation

For a child $a$ with value parent $b$, the PE is

$$\delta_a(k) = x_a(k) - \Bigl[\mu_a(k-1) + p_a + \sum_{i\in \mathcal P(a)}\alpha_{i,a}\, g\bigl(\mu_i(k-1)\bigr)\Bigr]$$

Parent $b$ updates via

$$\begin{aligned} \Pi_b(k) &= \Pi_b(k-1) + \Pi_a(k)\,\bigl[g'(\mu_b(k-1))\bigr]^2 \\ \mu_b(k) &= \mu_b(k-1) + \Pi_b^{-1}(k)\,\Pi_a(k)\, g'(\mu_b(k-1))\,\delta_a(k) \end{aligned}$$

where $g'(x)$ is the derivative of the coupling function.
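These one-step value-coupling updates can be sketched directly, assuming a single value parent $b$ and passing the coupling function and its derivative in explicitly (hypothetical values):

```python
def value_parent_update(mu_b, pi_b, mu_a_prev, pi_a, x_a, p_a, alpha, g, g_prime):
    """One trial's update of value parent b from child a (single-parent case)."""
    # Child's prediction error, evaluated through the coupling function g
    delta_a = x_a - (mu_a_prev + p_a + alpha * g(mu_b))
    # Precision gain scales with the squared slope of the coupling function
    pi_b_new = pi_b + pi_a * g_prime(mu_b) ** 2
    # Precision-weighted, slope-gated mean update
    mu_b_new = mu_b + (pi_a / pi_b_new) * g_prime(mu_b) * delta_a
    return mu_b_new, pi_b_new, delta_a

# Linear coupling g(x) = x recovers the standard precision-weighted update
mu_b, pi_b, delta = value_parent_update(
    mu_b=1.0, pi_b=1.0, mu_a_prev=0.0, pi_a=1.0,
    x_a=2.0, p_a=0.0, alpha=1.0, g=lambda x: x, g_prime=lambda x: 1.0,
)
# delta = 1.0, pi_b = 2.0, mu_b = 1.5
```

The parent's learning rate is the ratio $\Pi_a/\Pi_b$: a precise child and an uncertain parent yield large belief updates, and vice versa.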

For volatility parents, the volatility PE is $\varepsilon_a(k) = \ln\Pi_a(k) - \ln\hat\Pi_a(k)$, and

$$\Upsilon_a(k) = \hat\Pi_a(k)\,\Pi_a(k), \qquad \hat\Pi_a(k) = \exp\bigl(w_a + \kappa_{\tilde a,a}\, x_{\tilde a}(k-1)\bigr)$$

Updating the volatility parent $\tilde a$ gives:

$$\begin{aligned} \Pi_{\tilde a}(k) &= \Pi_{\tilde a}(k-1) + \tfrac{1}{2}\kappa_{\tilde a,a}^2\,\Upsilon_a(k) - \tfrac{1}{2}\kappa_{\tilde a,a}^2\,\Pi_a(k) \\ \mu_{\tilde a}(k) &= \mu_{\tilde a}(k-1) + \Pi_{\tilde a}^{-1}(k)\,\kappa_{\tilde a,a}\,\Upsilon_a(k)\,\varepsilon_a(k) \end{aligned}$$

These analytic one-step updates encode hierarchical precision-weighted prediction error processing, critical both for computational modeling and interpretation of agent learning.
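Taking the volatility-coupling expressions above at face value (with the step size absorbed into $w_a$), a single volatility-parent update can be sketched as:

```python
import math

def volatility_parent_update(mu_ta, pi_ta, pi_a, w_a, kappa):
    """One trial's update of volatility parent ~a from child a, following the
    expressions above; mu_ta is the parent's previous posterior mean."""
    pi_hat_a = math.exp(w_a + kappa * mu_ta)      # child's predicted precision
    eps_a = math.log(pi_a) - math.log(pi_hat_a)   # volatility prediction error
    ups_a = pi_hat_a * pi_a
    # Precision update: (1/2) kappa^2 (Upsilon - Pi_a), combining both terms
    pi_ta_new = pi_ta + 0.5 * kappa**2 * (ups_a - pi_a)
    # Mean update: precision-weighted volatility prediction error
    mu_ta_new = mu_ta + (kappa * ups_a * eps_a) / pi_ta_new
    return mu_ta_new, pi_ta_new, eps_a
```

When the child turns out more precise than predicted ($\varepsilon_a > 0$), the volatility parent's mean is pushed downward or upward depending on the sign of $\kappa$, adjusting the predicted log-volatility for the next trial.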

3. Nonlinear Coupling Extensions

The generalized HGF incorporates arbitrary twice-differentiable coupling functions $g(x)$ for node-to-node interactions beyond standard linear forms.

  • Value coupling: $g(x)$ shifts the mean of child nodes (e.g., $g(x)=\max(0,x)$, a ReLU; $g'(x)$ then gates learning so that updates occur only above threshold).
  • Rate–state (volatility) coupling: $g(x)$ directly modulates the log-volatility of descendants.

For nonlinear coupling, the derivatives $g'(x)$ and $g''(x)$ enter into the update equations, adapting learning rates and the message passing architecture accordingly. For instance, nonlinear gating introduces context-dependent “switching,” so that learning is active only under specific parental state conditions.
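The switching behavior can be made concrete: with a ReLU coupling, $g'(\mu_b) = 0$ whenever the parent's mean is at or below zero, so both the precision and mean updates vanish. A minimal illustration (hypothetical values):

```python
def gated_update(mu_b, pi_b, pi_a, delta_a):
    """Value-parent update with ReLU coupling g(x) = max(0, x):
    g'(mu_b) gates learning on the sign of the parent's mean."""
    gp = 1.0 if mu_b > 0.0 else 0.0          # g'(x) for the ReLU
    pi_b_new = pi_b + pi_a * gp**2
    mu_b_new = mu_b + (pi_a / pi_b_new) * gp * delta_a
    return mu_b_new, pi_b_new

# Below threshold the belief is frozen; above it, the full update applies.
assert gated_update(-0.5, 1.0, 1.0, 2.0) == (-0.5, 1.0)
assert gated_update(0.5, 1.0, 1.0, 2.0) == (1.5, 2.0)
```

The same mechanism with smoother coupling functions (e.g., a sigmoid) yields graded rather than all-or-nothing gating, since $g'$ then varies continuously with the parent's state.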

4. Relation to Predictive Coding and Message Passing

Both HGF and predictive coding instantiate hierarchical Bayesian belief updating with precision-weighted prediction errors; however, architectural and inferential differences exist:

| | Predictive Coding | Hierarchical Gaussian Filter |
|---|---|---|
| Temporal mode | Continuous-time | Discrete one-step trial updates |
| PEs | Value prediction errors | Both value ($\delta_a$) and volatility ($\varepsilon_a$) PEs |
| Dynamics | Means only, or with gain | Means and precisions explicitly tracked at all levels |
| Hierarchy of errors | Single, upward | Dual (value and volatility), modular message passing |
| Nonlinearity | Rare, implicit | Explicit through $g$, $g'$, $g''$ |

The generalized HGF extends predictive coding frameworks by introducing explicit volatility/precision updates, parallel error signals, and modular node-based architectures. Neurobiologically, this predicts an additional ascending information stream dedicated to transmitting dynamically updated precision estimates (Weber et al., 2023).

5. Unique Model Properties and Behavioral Implications

The generalized HGF posits several distinct predictions about learning and behavior:

  • Agents concurrently infer hidden state, its volatility, and possibly observation noise, supporting three-way arbitration not accessible in single-level learning models.
  • Nonlinear gating through $g'$ implements context-sensitive switches in learning (e.g., blocking updates below state-dependent thresholds).
  • Shared parents for multiple lower-level nodes facilitate the imposition of global versus local volatility estimates, which predicts cross-modal or cross-channel learning rate correlations.
  • The architecture dissociates value PEs from volatility PEs, predicting separable EEG/MEG/fMRI observables for each error type.
  • Individual variability in coupling parameters $(\alpha, \kappa)$ predicts subject-specific learning profiles, providing a basis for computational phenotyping in clinical populations, including autism and schizophrenia.

6. Modular Implementation and Practical Computation

The generalized HGF is organized as a node-based architecture:

  • Node: Encapsulates one hidden state $x_a$ with its posterior $(\mu_a, \Pi_a)$, its parenthood structure, and its coupling type (value, volatility, or noise).
  • Architecture construction: Define the directed graph of nodes, assign parameters $\{p_a, w_a, \alpha_{i,a}, \kappa_{j,a}\}$, and specify initial priors.
  • Run loop: Iterate over trials; each node updates its predictions and posteriors and sends PE messages along its incoming and outgoing edges.
  • Fitting: Parameters are typically estimated by variational Bayes or MCMC, maximizing model evidence over the full agent and response model.

This modular, message-passing structure supports dynamic addition/removal of levels, flexible switching of nonlinear couplings, and easy introduction of noise-learning nodes without re-derivation of update rules. Example open-source implementations include pyhgf (Python), HierarchicalGaussianFiltering.jl (Julia), and prospective inclusion in the TAPAS toolbox (Weber et al., 2023).
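As an illustration of the node abstraction (a hypothetical minimal design, not the API of any of these packages), each node carries its Gaussian posterior and a list of parents, and a run loop sweeps the graph once per trial with linear value coupling $g(x)=x$:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One hidden state with Gaussian posterior (mu, pi) and its value parents."""
    mu: float = 0.0
    pi: float = 1.0
    value_parents: List["Node"] = field(default_factory=list)

def run(input_node, observations, pi_u=10.0):
    """Per trial: assimilate the observation into the input node, then pass
    the value PE upward (linear coupling, so g'(x) = 1)."""
    for y in observations:
        delta = y - input_node.mu                     # value prediction error
        input_node.pi += pi_u                         # precision update
        input_node.mu += (pi_u / input_node.pi) * delta
        for parent in input_node.value_parents:
            parent.pi += input_node.pi                # [g'(mu)]^2 = 1
            parent.mu += (input_node.pi / parent.pi) * delta

x2 = Node()
x1 = Node(value_parents=[x2])
run(x1, [1.0] * 50)   # beliefs at both levels converge toward the constant input
```

Because nodes only exchange PE messages along edges, adding a level or swapping a coupling function changes the graph, not the update rules, which is the modularity the section describes.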

7. References and Historical Development

The conceptual foundations of the HGF trace to Mathys et al. (2011, 2014). The generalized HGF and its formal message-passing architecture were established by Weber et al. (2023). Related computational frameworks include predictive coding (Friston, 2005; Rao & Ballard, 1999). The generalized HGF unifies and extends both volatility-driven (classical HGF) and mean-driven (predictive coding) approaches, while providing a parameterized, modular, and computationally tractable framework for hierarchical Bayesian inference in complex environments.
