Discrete-State Active Inference
- Active inference is a probabilistic framework that uses variational free energy minimization to perform perception, action, and learning with discrete state and observation sets.
- It unifies Bayesian filtering, optimal control, and exploration by evaluating candidate policies through expected free energy and message-passing algorithms.
- Parameter learning is achieved via Dirichlet–Categorical updates that adapt model parameters in nonstationary environments, supporting scalable and hierarchical implementations.
Active inference in discrete state spaces is a rigorous inferential framework for perception, action, and learning, grounded in the minimization of variational free energy within probabilistic graphical models. In discrete settings, both states and observations are finite sets, generative models factorize over categorical transitions and emissions, and planning is formalized via expected free energy functionals over policies. This unifies Bayesian filtering, optimal control, and exploration under a single objective, accommodating parameter learning via conjugate Bayesian updates and scaling to models with structured, hierarchical, or learned representations.
1. Discrete Generative Models: Parametric Structure and Factorization
The generative model underlying discrete-state active inference is most commonly instantiated as a partially observable Markov decision process (POMDP), factorized as:

$$P(o_{1:T}, s_{1:T}, \pi) = P(\pi)\, P(s_1) \prod_{\tau=1}^{T} P(o_\tau \mid s_\tau) \prod_{\tau=2}^{T} P(s_\tau \mid s_{\tau-1}, \pi)$$

Here, $s_\tau$ are discrete hidden states, $o_\tau$ are observations, actions $u_\tau$ index controllable transitions, and $\pi = (u_1, u_2, \dots)$ is a policy—a sequence of planned actions (Sajid et al., 2019, Oostrum et al., 11 Jun 2024). The model is specified by:
- Observation likelihood matrices: $\mathbf{A}$, with $A_{ij} = P(o_\tau = i \mid s_\tau = j)$.
- Transition matrices: $\mathbf{B}_u$, with $(B_u)_{ij} = P(s_\tau = i \mid s_{\tau-1} = j, u_{\tau-1} = u)$, one for each action.
- Initial state prior: $\mathbf{D}$, with $D_i = P(s_1 = i)$.
- Prior over policies: $P(\pi)$, often uniform or a softmax over (expected) free energy/preference.
- Preference (goal) encoding via $\mathbf{C}$, a log-prior over outcomes ($P(o_\tau) \propto \exp(C_{o_\tau})$) that determines outcome utility.
This structure readily supports factorial states, multiple observation modalities, and hierarchical compositions (Heins et al., 2022, Costa et al., 2020).
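To make the parametric structure concrete, the following minimal sketch instantiates $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ for a hypothetical two-state, two-observation, two-action problem (numpy only; the dimensions and numerical values are illustrative assumptions, not taken from the cited papers):

```python
import numpy as np

num_states, num_obs, num_actions = 2, 2, 2

# A[o, s] = P(o_t = o | s_t = s): each column is a categorical distribution over outcomes.
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])

# B[u][s_next, s] = P(s_{t+1} = s_next | s_t = s, u_t = u): one transition matrix per action.
B = np.stack([np.array([[1.0, 1.0],    # action 0: move to (or stay in) state 0
                        [0.0, 0.0]]),
              np.array([[0.0, 0.0],    # action 1: move to (or stay in) state 1
                        [1.0, 1.0]])])

# D[s] = P(s_1 = s): prior over the initial state.
D = np.array([0.5, 0.5])

# C[o]: log-preferences over outcomes; here observation 1 is preferred.
C = np.array([0.0, 2.0])
```

Each column of A and of every B[u] is a categorical distribution; this column-stochastic convention is assumed in the update equations below.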
2. Variational Inference, Free Energy, and Belief Updating
Active inference formalizes state estimation and policy selection as variational inference, introducing a mean-field variational posterior:

$$Q(s_{1:T}, \pi) = Q(\pi) \prod_{\tau=1}^{T} Q(s_\tau \mid \pi)$$

The variational free energy functional is

$$F[Q] = \mathbb{E}_{Q(s_{1:T}, \pi)}\!\left[\ln Q(s_{1:T}, \pi) - \ln P(o_{1:T}, s_{1:T}, \pi)\right] = D_{\mathrm{KL}}\!\left[Q(s_{1:T}, \pi)\,\|\,P(s_{1:T}, \pi \mid o_{1:T})\right] - \ln P(o_{1:T})$$

Minimizing $F$ corresponds to approximate Bayesian inference over latent states and policy, producing a tractable estimate of the marginal posterior (Oostrum et al., 11 Jun 2024, Sajid et al., 2019). The factorization yields concrete update equations:

$$Q(\pi) = \sigma\!\left(-F(\pi) - G(\pi)\right), \qquad Q(s_\tau \mid \pi) = \sigma\!\left(\ln \mathbf{A}^{\top} o_\tau + \ln\!\left(\mathbf{B}_{\pi_{\tau-1}} Q(s_{\tau-1} \mid \pi)\right) + \ln\!\left(\mathbf{B}_{\pi_\tau}^{\top} Q(s_{\tau+1} \mid \pi)\right)\right)$$

where $G(\pi)$ is the expected free energy for planning (see below), $F(\pi)$ is the free energy conditioned on policy $\pi$, $o_\tau$ denotes the one-hot encoded observation, $\sigma$ is the softmax function, and the update equations are solved via forward–backward message-passing or coordinate ascent (Oostrum et al., 11 Jun 2024, Costa et al., 2020, Champion et al., 2021).
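For intuition, a minimal numpy sketch of a single-step, filtering-style state update under one action is shown below; it combines the likelihood message for the current observation with the forward transition message, omitting the backward (smoothing) message and policy bookkeeping for brevity (function and variable names are illustrative):

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def update_state_belief(A, B_u, qs_prev, obs_idx):
    """One-step variational update: q(s_t) ∝ exp(ln A[o_t, :] + ln(B_u @ q(s_{t-1})))."""
    log_likelihood = np.log(A[obs_idx, :] + 1e-16)   # evidence from the current observation
    log_prior = np.log(B_u @ qs_prev + 1e-16)        # prediction from the previous belief
    return softmax(log_likelihood + log_prior)

# Example using the illustrative model from Section 1:
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B0 = np.array([[1.0, 1.0], [0.0, 0.0]])
D = np.array([0.5, 0.5])
qs = update_state_belief(A, B0, D, obs_idx=1)        # posterior belief over the two states
```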
3. Expected Free Energy and Policy Selection
Planning and action selection are cast as Bayesian model comparison across candidate policies, evaluated by the expected free energy (EFE):

$$G(\pi) = \sum_{\tau > t} G(\pi, \tau), \qquad G(\pi, \tau) = \mathbb{E}_{Q(o_\tau, s_\tau \mid \pi)}\!\left[\ln Q(s_\tau \mid \pi) - \ln P(o_\tau, s_\tau \mid \pi)\right],$$

where $Q(o_\tau, s_\tau \mid \pi) = P(o_\tau \mid s_\tau)\, Q(s_\tau \mid \pi)$ is the predictive distribution over future outcomes and states.
This admits multiple, interchangeable decompositions:
- Epistemic–pragmatic decomposition:
$$G(\pi, \tau) = -\,\mathbb{E}_{Q(o_\tau \mid \pi)}\!\left[D_{\mathrm{KL}}\!\left[Q(s_\tau \mid o_\tau, \pi)\,\|\,Q(s_\tau \mid \pi)\right]\right] \;-\; \mathbb{E}_{Q(o_\tau \mid \pi)}\!\left[\ln P(o_\tau)\right]$$
where the first (epistemic) term incentivizes policies that maximize information gain (uncertainty reduction), and the second (utility) term aligns predicted outcomes with preferences (Sajid et al., 2019, Prakki, 30 Sep 2024).
- Risk–ambiguity decomposition:
$$G(\pi, \tau) = D_{\mathrm{KL}}\!\left[Q(o_\tau \mid \pi)\,\|\,P(o_\tau)\right] \;+\; \mathbb{E}_{Q(s_\tau \mid \pi)}\!\left[\mathrm{H}\!\left[P(o_\tau \mid s_\tau)\right]\right]$$
Optimal policies are selected via:
$$\pi^* = \arg\max_{\pi} Q(\pi), \qquad Q(\pi) = \sigma\!\left(-F(\pi) - \gamma\, G(\pi)\right),$$
with $\gamma$ a policy precision (inverse temperature) parameter.
This mechanism unifies exploration and exploitation: maximal information gain and preference satisfaction are balanced intrinsically (Costa et al., 2020, Sajid et al., 2019).
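The risk–ambiguity form lends itself to direct vectorized evaluation. The sketch below scores each action as a one-step policy and forms the policy posterior; treating the preferences as a softmax of $\mathbf{C}$ and restricting to a one-step horizon are simplifying assumptions for illustration:

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def expected_free_energy(A, B, C, qs):
    """One-step EFE per action: risk (KL from preferred outcomes) + ambiguity (expected outcome entropy)."""
    log_C = np.log(softmax(C) + 1e-16)                # preferences as a log-probability over outcomes
    H_A = -np.sum(A * np.log(A + 1e-16), axis=0)      # per-state entropy of the likelihood, H[P(o|s)]
    G = np.zeros(B.shape[0])
    for u in range(B.shape[0]):
        qs_next = B[u] @ qs                           # predicted state distribution under action u
        qo_next = A @ qs_next                         # predicted outcome distribution
        risk = np.sum(qo_next * (np.log(qo_next + 1e-16) - log_C))
        ambiguity = H_A @ qs_next
        G[u] = risk + ambiguity
    return G

# q_pi = softmax(-expected_free_energy(A, B, C, qs)) gives the posterior over one-step policies.
```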
4. Parameter Learning: Dirichlet–Categorical Bayesian Updates
Learning of the generative model parameters (A, B, D) is accomplished via Dirichlet–Categorical conjugate updates. For each observed trial:
- Observations increment the likelihood concentration parameters $\mathbf{a}$ in proportion to the marginal posterior $Q(s_\tau)$.
- Transitions increment the transition concentration parameters $\mathbf{b}_u$ according to the smoothed posteriors over successive states and the action taken.
The update rule for the likelihood parameters, e.g.,
$$\mathbf{a} \leftarrow \mathbf{a} + \sum_{\tau} o_\tau \otimes Q(s_\tau),$$
with $o_\tau$ the one-hot observation vector and $\otimes$ the outer product, and analogously for $\mathbf{b}_u$, ensures online, continual learning and accommodates "forgetting" via exponential decay of the concentration parameters when tracking nonstationary environments (Prakki, 30 Sep 2024, Costa et al., 2020).
The posterior predictive parameters for action and observation selection are the expected value of the Dirichlet, i.e., normalized concentration parameters.
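A minimal sketch of the Dirichlet–Categorical update for the likelihood array, including an optional exponential-decay term for nonstationary environments, is given below (the learning-rate and decay arguments are illustrative conventions, not prescribed by the cited papers):

```python
import numpy as np

def update_likelihood_dirichlet(a, obs_idx, qs, lr=1.0, decay=1.0):
    """Accumulate Dirichlet counts for A from one (observation, state-posterior) pair.

    a      : concentration parameters, shape (num_obs, num_states)
    obs_idx: index of the observed outcome
    qs     : posterior state marginal Q(s_t), shape (num_states,)
    decay  : values < 1.0 implement exponential forgetting for nonstationary environments
    """
    one_hot_obs = np.eye(a.shape[0])[obs_idx]
    a = decay * a + lr * np.outer(one_hot_obs, qs)   # a <- a + o_t (outer) Q(s_t)
    A_hat = a / a.sum(axis=0, keepdims=True)         # posterior predictive: normalized columns
    return a, A_hat
```

The analogous transition update accumulates $Q(s_{\tau+1}) \otimes Q(s_\tau)$ into $\mathbf{b}_u$ for the action $u$ actually taken.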
5. Algorithmic Realizations and Software Implementations
A canonical algorithmic loop in discrete active inference involves:
- Perception: Update $Q(s_t)$ from the current observation $o_t$ and the history of past observations and actions.
- Policy Evaluation: For each candidate policy $\pi$, roll forward beliefs $Q(s_\tau \mid \pi)$ and compute $G(\pi)$.
- Policy Inference: Softmax $-G(\pi)$ (together with any policy prior) to form $Q(\pi)$; sample or select $\pi^*$.
- Action: Execute the first action $u_t$ of $\pi^*$.
- Learning: After a full episode, increment the Dirichlet counts for $\mathbf{a}$ and $\mathbf{b}$ using the smoothed state and transition posteriors.
This pattern is reproducible in open-source libraries such as pymdp (Heins et al., 2022). Minimal code demonstrates instantiation of the generative model, state and policy inference, and action selection in vectorized form, and is directly compatible with environments that provide discrete state, action, and observation sets.
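The loop above maps directly onto a pymdp-style agent. The sketch below follows the library's documented Agent workflow, though exact constructor and method signatures may vary across pymdp versions, and the final line stands in for a real environment step:

```python
import numpy as np
from pymdp import utils
from pymdp.agent import Agent

# Randomly initialized single-modality, single-factor model (dimensions are illustrative;
# practical models use structured A, B, C, D as in the sketches above).
num_obs, num_states, num_controls = [3], [3], [2]
A = utils.random_A_matrix(num_obs, num_states)
B = utils.random_B_matrix(num_states, num_controls)
agent = Agent(A=A, B=B)

obs = [0]                                   # observation index for the single modality
for t in range(5):
    qs = agent.infer_states(obs)            # perception: posterior over hidden states
    q_pi, G = agent.infer_policies()        # policy evaluation: policy posterior and (negative) EFE
    action = agent.sample_action()          # action selection from the policy posterior
    obs = [np.random.randint(num_obs[0])]   # stand-in for a real environment transition
```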
6. Hybrid, Hierarchical, and Scalable Discrete Active Inference Architectures
Recent research extends discrete-state active inference in several key directions:
- Hierarchical integration: Discrete planners can be chained atop continuous controllers, using abstractions (e.g., learned modes via rSLDS or options) as "macro-actions." Skills or subgoals are composed by discrete POMDP planning, relayed as preferences to lower controllers (Collis et al., 2 Sep 2024, Pezzato et al., 23 Jul 2025). This supports planning over long horizons, reusable skill libraries, and efficient adaptation in robotics and sensorimotor settings.
- Learned model structure: The structure and parameters of generative models can be inferred online. Tensor network approaches enable compact, learned discrete representations that scale with the bond dimension of the tensor, facilitating efficient inference and accommodating large, complex state spaces (Wauthier et al., 2022).
- Policy search scalability: For large policy spaces, embedding and clustering approaches (e.g., k-means over policy representations) allow pruning the policy search, thus accelerating EFE computation while retaining near-optimality in challenging discrete control problems (Kiefer et al., 2022); a schematic sketch of this idea is given below.
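The following is a schematic numpy/scikit-learn sketch of the general idea of pruning policy evaluation by clustering, not the specific embedding or algorithm of Kiefer et al. (2022); the policy encoding, the choice of k-means, and the `efe_fn` callable are all illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def prune_policies_by_clustering(policies, efe_fn, n_clusters=8):
    """Cluster candidate policies (action sequences) and evaluate EFE only once per cluster,
    reusing the representative's value for every member.

    policies: array of shape (num_policies, horizon) of integer action indices
    efe_fn  : callable returning the expected free energy of a single policy
              (hypothetical; e.g., built from the EFE sketch above)
    """
    # Embed each policy as a flattened one-hot encoding of its action sequence.
    num_actions = policies.max() + 1
    X = np.eye(num_actions)[policies].reshape(len(policies), -1)

    km = KMeans(n_clusters=min(n_clusters, len(policies)), n_init=10).fit(X)

    # Evaluate EFE at the member closest to each centroid and broadcast it to the cluster.
    G = np.empty(len(policies))
    for c in range(km.n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        representative = members[np.argmin(dists)]
        G[members] = efe_fn(policies[representative])
    return G
```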
7. Distinctions, Limitations, and Research Challenges
A salient feature of discrete-state active inference is the explicit representation of epistemic (intrinsic) and pragmatic (extrinsic) motives within a single variational objective, eliminating the need for the hand-crafted exploration bonuses or engineered reward signals typical of RL (Sajid et al., 2019, Prakki, 30 Sep 2024). Unlike in reinforcement learning, reward is treated as an ordinary observation channel, and preferences are encoded via priors over expected observations.
Nevertheless, open challenges remain:
- Scalability: Enumeration of policies becomes intractable for large horizons or high-dimensional state–action spaces, motivating amortization, pruning, or sampling-based policy evaluation.
- Structure learning and abstraction: While discrete models are amenable to principled Dirichlet learning and modular composition, automatic discovery of optimal abstractions, hierarchical structures, or state-factorization remains active research (Costa et al., 2020, Oostrum et al., 11 Jun 2024).
- Integration with deep/continuous models: Direct discrete parameterization is tractable but may incur expressivity limits; hybrid models and neural parameterizations are being developed (Millidge, 2019, Çatal et al., 2020, Wauthier et al., 2022).
Discrete-state active inference constitutes a mathematically mature, algorithmically concrete paradigm accommodating learning, planning, and adaptation in challenging stochastic environments (Kenny, 25 Nov 2025, Champion et al., 2021, Costa et al., 2020). Its flexibility, grounding in Bayesian optimality, and increasing computational scalability position it as a central framework in modern computational neuroscience and artificial intelligence.