Active Inference

Updated 3 September 2025
  • Active Inference is a Bayesian framework that unifies perception, learning, planning, and action selection by minimizing expected free energy.
  • It integrates risk minimization to align behaviors with prior preferences and ambiguity reduction to guide epistemic exploration.
  • Its computational implementations use variational inference and neural architectures, making it applicable to neuroscience, reinforcement learning, and robotics.

Active inference is a formal, normative Bayesian framework originating in neuroscience that unifies perception, learning, planning, and action selection by minimizing expected free energy. In contrast to canonical models of agency that optimize scalar reward signals or expected utility, active inference operationalizes adaptive behavior as the minimization of a single functional, expected free energy, which quantifies both the divergence of predicted outcomes from prior preferences (risk, or exploitation) and the expected ambiguity or uncertainty in future observations (epistemic exploration). The framework has achieved prominence for its ability to explain exploratory and exploitative behaviors in biological agents within a single objective, its application to model-based reinforcement learning, and its capacity to reinterpret a broad class of RL algorithms as special cases. This entry provides a synthesis of active inference, summarizing foundational principles, mathematical structure, computational strategies, and representative applications, as unified by recent literature.

1. Principles and Mathematical Foundations

Active inference posits that an agent maintains a generative model $P(s, o \mid h_t, a_{t+1:T})$ over external states $s$, sensory observations $o$, and action sequences $a_{t+1:T}$, often formulated as a POMDP. Agency is defined not in terms of reward maximization, but as the minimization of expected free energy (EFE), which integrates both goal-directed and information-seeking drives under a single probabilistic imperative. Formally, for a given action sequence $a_{t+1:T}$ and history $h_t$:

$$-\log P(a_{t+1:T} \mid h_t) = \mathbb{E}_{P(s,o \mid a_{t+1:T}, h_t)}\left[\log P(s \mid a_{t+1:T}, h_t) - \log P(s,o \mid h_t)\right]$$

The EFE can be decomposed into risk (extrinsic value) and ambiguity (intrinsic/epistemic value):

  • Risk (exploitation):

$$D_{\mathrm{KL}}\left[P(s \mid a_{t+1:T}, h_t) \,\Vert\, P(s \mid h_t)\right]$$

quantifies the divergence between the state distribution predicted under a policy and the prior (preferred) state distribution.

  • Ambiguity (exploration):

$$\mathbb{E}_{P(s \mid a_{t+1:T}, h_t)}\left[ H\left[ P(o \mid s, h_t)\right] \right]$$

measures expected entropy over outcomes given states, favoring actions that reduce uncertainty about latent causes.

Policy selection emerges via a softmax (Boltzmann) rule, typically over negative expected free energies: $p(\pi) = \sigma(-\gamma G^\pi)$, where $\gamma$ is a precision parameter.
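
To make the risk–ambiguity decomposition and the softmax rule concrete, below is a minimal numerical sketch for a discrete POMDP, assuming a known emission matrix, policy-conditioned predicted state distributions, and a preferred state distribution; all names, shapes, and example numbers are illustrative rather than taken from any cited implementation.

```python
import numpy as np

def expected_free_energy(q_s_pi, p_s_pref, A):
    """Single-timestep EFE for one policy in a discrete POMDP.

    q_s_pi   : (n_states,) predicted state distribution under the policy
    p_s_pref : (n_states,) prior/preferred state distribution
    A        : (n_obs, n_states) emission likelihood P(o|s)
    """
    eps = 1e-12
    # Risk: KL divergence between predicted and preferred state distributions
    risk = np.sum(q_s_pi * (np.log(q_s_pi + eps) - np.log(p_s_pref + eps)))
    # Ambiguity: expected entropy of the outcome likelihood under predicted states
    H_o_given_s = -np.sum(A * np.log(A + eps), axis=0)   # (n_states,)
    ambiguity = q_s_pi @ H_o_given_s
    return risk + ambiguity

def policy_posterior(G, gamma=4.0):
    """Softmax (Boltzmann) distribution over policies from their EFEs."""
    logits = -gamma * np.asarray(G)
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Illustrative two-policy example
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])                 # P(o|s), nearly unambiguous
p_pref = np.array([0.8, 0.2])              # preferred states
G = [expected_free_energy(np.array([0.7, 0.3]), p_pref, A),
     expected_free_energy(np.array([0.3, 0.7]), p_pref, A)]
print(policy_posterior(G))                 # more mass on the lower-EFE policy
```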

2. Variational Free Energy and Learning

Perceptual and model learning are formulated as variational inference. The variational free energy (VFE) provides a tractable upper bound on the negative log evidence of new observations:

$$F[Q(s)] = D_{\mathrm{KL}}\left[Q(s) \,\Vert\, P(s)\right] - \mathbb{E}_{Q(s)}\left[\ln P(o \mid s)\right]$$

Agents maintain a variational posterior $Q(s)$ over latent states and minimize $F[Q(s)]$ with respect to both beliefs (Bayesian state estimation) and generative model parameters. This learning is generalized in multi-level or hierarchical models (e.g., for context variables, see (Prakki, 30 Sep 2024)), and may be implemented in both discrete and continuous time (e.g., through Laplace or mean-field approximations, or amortized using neural networks (Millidge, 2019, Tschantz et al., 2019)).
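
A minimal sketch of the discrete-state case, assuming a single latent factor and a known emission likelihood; in this setting the VFE-minimizing posterior is available in closed form, which the snippet uses as a check (all names are illustrative).

```python
import numpy as np

def variational_free_energy(q_s, p_s, A, o_idx):
    """F[Q(s)] = KL[Q(s) || P(s)] - E_Q[ln P(o|s)] for a discrete observation o_idx."""
    eps = 1e-12
    kl = np.sum(q_s * (np.log(q_s + eps) - np.log(p_s + eps)))
    expected_loglik = q_s @ np.log(A[o_idx] + eps)
    return kl - expected_loglik

def exact_posterior(p_s, A, o_idx):
    """For a single discrete factor, the VFE minimizer is the exact posterior."""
    unnorm = p_s * A[o_idx]
    return unnorm / unnorm.sum()

A = np.array([[0.9, 0.2],
              [0.1, 0.8]])        # P(o|s)
p_s = np.array([0.5, 0.5])        # prior over states
o = 0                             # observed outcome index

q_star = exact_posterior(p_s, A, o)
print(variational_free_energy(q_star, p_s, A, o))                # = -ln P(o), the tight bound
print(variational_free_energy(np.array([0.5, 0.5]), p_s, A, o))  # larger, i.e. a looser bound
```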

Model parameters governing emission and transition mappings (typically denoted as matrices or neural nets) are updated via conjugate prior rules or gradient-based methods, supporting continual learning and adaptation to dynamic environments (Prakki, 30 Sep 2024).
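
In the discrete setting, the conjugate update for the emission mapping amounts to accumulating Dirichlet pseudo-counts weighted by the state posterior; a hedged sketch, where the function names and the learning-rate convention are illustrative assumptions:

```python
import numpy as np

def update_dirichlet_A(a_counts, q_s, o_idx, lr=1.0):
    """Accumulate Dirichlet pseudo-counts for the emission mapping P(o|s).

    a_counts : (n_obs, n_states) Dirichlet concentration parameters
    q_s      : (n_states,) posterior over states at the time of the observation
    o_idx    : index of the observed outcome
    """
    a_counts = a_counts.copy()
    a_counts[o_idx] += lr * q_s        # credit states in proportion to posterior belief
    return a_counts

def expected_A(a_counts):
    """Posterior-mean emission matrix implied by the Dirichlet counts."""
    return a_counts / a_counts.sum(axis=0, keepdims=True)
```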

3. Policy Evaluation, Planning, and Action Selection

Policy selection in active inference is framed as Bayesian model selection over policies, favoring those that minimize expected free energy into the future. For policy $\pi$ at future step $t$:

$$G_t(\pi) = \mathbb{E}_{Q(s_t \mid \pi)}\left[\text{Ambiguity} - A\text{-Novelty} + \text{Risk} - B\text{-Novelty}\right]$$

where ambiguity quantifies outcome uncertainty, and $A$-novelty/$B$-novelty are information gain terms about the generative model's emission and transition parameters, respectively (Torresan et al., 16 Aug 2025, Friston et al., 2020). When the agent's preferences are sharply specified (zero-temperature limit), policy selection via EFE minimization recovers Bellman-optimal reinforcement learning solutions (Costa et al., 2020).
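
As one concrete reading of the $A$-novelty term, the sketch below evaluates the expected information gain about Dirichlet beliefs over the emission mapping, i.e. the predicted KL divergence between updated and current parameter beliefs. Implementations in the literature often use cheaper approximations of this quantity, so treat the following as an illustrative assumption rather than a cited algorithm.

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(alpha_post, alpha_prior):
    """KL[Dir(alpha_post) || Dir(alpha_prior)] for one column of concentration parameters."""
    a0, b0 = alpha_post.sum(), alpha_prior.sum()
    return (gammaln(a0) - gammaln(alpha_post).sum()
            - gammaln(b0) + gammaln(alpha_prior).sum()
            + np.sum((alpha_post - alpha_prior) * (digamma(alpha_post) - digamma(a0))))

def a_novelty(a_counts, q_s):
    """Expected information gain about the emission ('A') parameters under predicted
    states q_s: KL between updated and current Dirichlet beliefs, averaged over the
    predictive distribution of the next (o, s) pair."""
    A_mean = a_counts / a_counts.sum(axis=0, keepdims=True)   # predictive P(o|s)
    gain = 0.0
    for s, q in enumerate(q_s):
        for o in range(a_counts.shape[0]):
            updated = a_counts[:, s].copy()
            updated[o] += 1.0                                  # hypothetical observation
            gain += q * A_mean[o, s] * dirichlet_kl(updated, a_counts[:, s])
    return gain
```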

In practice, exact computation over all policies is intractable, especially in high dimensions. Deep active inference replaces tabular policy/state enumerations with neural density estimation, using perception, transition, policy, and value networks, and evaluates candidate actions via Monte Carlo rollouts, the Cross-Entropy Method (CEM), or variational recursions (Tschantz et al., 2019, Millidge, 2019). Sophisticated inference computes the EFE recursively, back-propagating 'beliefs about beliefs' from the planning horizon to support temporally extended planning (Friston et al., 2020).
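
A minimal sketch of CEM-style planning over action sequences scored by an EFE estimate; `efe_of_sequence` is a hypothetical callable standing in for Monte Carlo rollouts of a learned transition model, and all hyperparameters are illustrative.

```python
import numpy as np

def cem_plan(efe_of_sequence, horizon, action_dim, iters=5,
             population=64, elites=8, rng=None):
    """Cross-Entropy Method over continuous action sequences, scored by approximate EFE.

    efe_of_sequence : callable mapping an (horizon, action_dim) action sequence to a
                      scalar expected-free-energy estimate (e.g. via model rollouts)
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    mu = np.zeros((horizon, action_dim))
    sigma = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample candidate action sequences from the current proposal distribution
        candidates = mu + sigma * rng.standard_normal((population, horizon, action_dim))
        scores = np.array([efe_of_sequence(a) for a in candidates])
        # Refit the proposal to the lowest-EFE (elite) sequences
        elite = candidates[np.argsort(scores)[:elites]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu[0]   # execute the first action, then re-plan (receding horizon)
```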

4. Integration of Exploration and Exploitation

Active inference provides a canonical, principled solution to the exploration–exploitation dilemma via the joint minimization of risk (divergence from preferred outcomes) and ambiguity (whose minimization drives information-seeking). Unlike traditional RL, which must introduce extrinsic (reward) and intrinsic (e.g., curiosity bonus) signals separately, active inference natively decomposes the EFE so that both drives are traded off automatically:

  • Exploration emerges from minimizing expected future ambiguity and maximizing expected information gain (the epistemic value terms in the EFE and in model parameter updates).
  • Exploitation is implemented by matching predicted outcomes to prior preferences (encoded as a preferred distribution over outcomes or states).

Empirical demonstrations show that including both components enhances exploration in sparse-reward or ambiguous domains (Tschantz et al., 2019, Tschantz et al., 2020), while removing either epistemic or entropy regularization results in degraded performance and unstable policies (Millidge, 2019).

5. Computational Architecture and Algorithmic Realizations

Active inference is implemented computationally via parameterized generative models (often neural), variational inference for belief propagation, and policy search. Key architectural components include:

| Function | Typical Implementation | Purpose |
| --- | --- | --- |
| Perception | Encoder–decoder neural nets | Maps raw observations to latent states and vice versa |
| Transition | Recurrent or feedforward nets | Models state transitions under actions |
| Policy net | Stochastic/softmax action net | Defines posterior over actions given states |
| EFE/value | Bootstrapped value network | Approximates expected free energy recursively |
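
The table above can be made concrete as a module skeleton. The sketch below is illustrative only, not the architecture of any cited paper; it assumes a discrete action space and hypothetical layer sizes.

```python
import torch
import torch.nn as nn

class DeepActiveInferenceAgent(nn.Module):
    """Minimal skeleton of the four components in the table above (illustrative sizes)."""

    def __init__(self, obs_dim, latent_dim, action_dim, hidden=128):
        super().__init__()
        # Perception: amortized recognition model q(s|o), outputting mean and log-variance
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 2 * latent_dim))
        # Generative mapping back to observations p(o|s)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, obs_dim))
        # Transition model p(s'|s, a)
        self.transition = nn.Sequential(nn.Linear(latent_dim + action_dim, hidden), nn.ReLU(),
                                        nn.Linear(hidden, latent_dim))
        # Policy network: posterior over actions given latent state
        self.policy = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, action_dim))
        # Bootstrapped EFE/value network: recursive estimate of expected free energy
        self.efe_value = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, 1))

    def act(self, obs):
        mu, _ = self.encoder(obs).chunk(2, dim=-1)   # use the posterior mean of q(s|o)
        logits = self.policy(mu)
        return torch.distributions.Categorical(logits=logits).sample()
```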

Monte Carlo rollouts, value bootstrapping, amortized recognition (inference) networks, and cross-entropy method are often utilized to efficiently estimate policy values and action proposals in large state spaces (Tschantz et al., 2019, Millidge, 2019).

Algorithmic robustness is substantially improved by incorporating bootstrapped EFE-value updates, entropy regularization, and targeted handling of epistemic/extrinsic terms, especially in high-dimensional and non-stationary contexts.

6. Relations to Reinforcement Learning and Control as Inference

Active inference generalizes and subsumes a variety of decision and control frameworks. Bellman-optimal policies in RL are recovered as a special case of EFE minimization under sharp preferences (Costa et al., 2020). Active inference (AIF) can be viewed as a specific instance of control as inference (CaI), with key distinctions in how value is encoded: AIF absorbs goal structure into the generative model prior (reward as biased observations), while CaI uses auxiliary optimality variables (Millidge et al., 2020, Watson et al., 2020).
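
As an illustrative schematic of this distinction (notation not tied to any single paper): CaI introduces a binary optimality variable whose likelihood is exponential in reward, whereas AIF encodes the same goal information as a biased prior over observations,

$$P(\mathcal{O} = 1 \mid s, a) \propto \exp\big(r(s, a)\big), \qquad \tilde{P}(o) \propto \exp\big(C(o)\big),$$

where $C(o)$ denotes log-preferences over outcomes.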

This tight relationship permits reinterpretation or reformulation of existing RL methods as active inference under appropriate generative model assumptions (Costa et al., 23 Jan 2024). The framework also naturally implements optimal Bayesian experimental design and information gain maximization by modulation of preference structure (Sajid et al., 2021).

Extensions to hybrid or contrastive objectives (e.g., contrastive AIF, (Mazzaglia et al., 2021)) and modular message-passing for chance-constrained design (Laar et al., 2021) further broaden the theoretical and applied reach of the paradigm.

7. Applications, Limitations, and Frontiers

Applications of active inference include:

  • Model-based RL benchmarks: Achieving or exceeding performance of standard RL methods on tasks such as CartPole, Acrobot, LunarLander, and continuous control domains (Millidge, 2019, Tschantz et al., 2019).
  • Robotics and adaptive control: End-to-end robot navigation with high-dimensional sensory streams, continuous learning in non-stationary environments, and resilient health monitoring in active digital twins (Çatal et al., 2020, Lanillos et al., 2021, Torzoni et al., 17 Jun 2025).
  • Biological cognition: Modeling of animal and human decision-making, self–other distinction, and hierarchical planning under uncertainty.
  • Statistical inference and active data collection: Application of active sampling and informative labeling for efficient inferential tasks using ML predictors (Zrnic et al., 5 Mar 2024).

Key limitations include tractability in very high-dimensional discrete spaces, the dependence on well-calibrated generative models, and unresolved issues in tuning precision parameters or efficiently integrating actions as variables in the generative model (Prakki, 30 Sep 2024, Lanillos et al., 2021). Open research areas include biologically plausible neural dynamics, scalable planning with non-stationary or partially observable data, and the ongoing refinement of model parameter learning in lifelong settings.


In summary, active inference provides a comprehensive, mathematically unified framework for agency in uncertain environments, integrating perception, learning, exploration, and goal-directed planning under the prescription of variational free energy minimization. Its relevance spans computational neuroscience, machine learning, robotics, and scientific inference, and it is positioned as a universal, explainable, and adaptive alternative to modular, reward-based control paradigms.