Active Inference & Free-Energy Principle

Updated 8 February 2026
  • Active inference is a Bayesian framework that unifies perception and action by minimizing variational free energy, balancing exploration and exploitation.
  • It decomposes expected free energy into risk and ambiguity, providing a principled method for goal-directed planning and optimal control.
  • The approach bridges reinforcement learning and optimal control by transforming value functions into softmax policy selections through variational updates.

Active inference is a normative Bayesian framework for modeling agency, perception, and action, grounded in the Free-Energy Principle (FEP). The FEP asserts that any adaptive system minimizes a variational bound on sensory surprise, formalized as (expected) variational free energy, under an internal generative model of its environment. Active inference refines the FEP by explicitly encoding agent preferences in the form of prior distributions and extends the interpretation from passive perception to goal-directed action selection. This unification yields a theoretically principled account of the exploration–exploitation trade-off and provides a general recipe for control and reinforcement learning as variational inference.

1. Variational Free Energy: Foundations and Formulation

At the core of both the FEP and active inference is the variational free-energy functional, which serves as a tractable upper bound on surprise (negative log-evidence) for observations $o$ under the agent's generative model $p(o, s, \pi)$, where $s$ are latent states and $\pi$ denotes policies (sequences of actions). Given a variational (recognition) density $q(s, \pi)$, variational free energy is defined as

$$F[q(s,\pi)] = \mathbb{E}_{q(s,\pi)}\left[\ln q(s,\pi) - \ln p(o, s, \pi)\right] = \mathrm{KL}\left[q(s,\pi) \Vert p(s,\pi \mid o)\right] - \ln p(o)$$

Minimizing $F$ with respect to $q$ is formally equivalent to minimizing surprise $-\ln p(o)$, as $F \geq -\ln p(o)$ by construction (Costa et al., 2024).

The variational free energy decomposes into:

  • Complexity: $\mathrm{KL}[q(s,\pi) \Vert p(s,\pi)]$, penalizing divergence from the agent's prior,
  • Accuracy: $\mathbb{E}_{q(s,\pi)}[\ln p(o \mid s)]$, rewarding beliefs consistent with the observed data.

This variational Bayesian framework subsumes classical inference and supplies an operational principle for both perception (belief updating) and action (policy selection) (Shin et al., 2021, McGregor et al., 2015).
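The bound in Section 1 can be checked numerically. The following minimal sketch uses a hypothetical two-state, two-outcome model (all probabilities are illustrative, not taken from any cited paper) and verifies that the exact posterior attains $F = -\ln p(o)$ while any other recognition density gives a strictly looser bound:

```python
import math

# Hypothetical discrete model: 2 latent states, 2 possible outcomes.
p_s = [0.5, 0.5]                      # prior p(s)
p_o_given_s = [[0.9, 0.1],            # likelihood p(o|s): rows index s, columns index o
               [0.2, 0.8]]
o = 0                                 # the observed outcome

def free_energy(q_s):
    """F[q] = E_q[ln q(s) - ln p(o, s)] for the fixed observation o."""
    return sum(q * (math.log(q) - math.log(p_s[s] * p_o_given_s[s][o]))
               for s, q in enumerate(q_s))

evidence = sum(p_s[s] * p_o_given_s[s][o] for s in range(2))   # p(o)
surprise = -math.log(evidence)                                  # -ln p(o)

# The exact posterior q(s) = p(s|o) minimizes F and attains the bound;
# the gap for any other q is exactly KL[q || p(s|o)] >= 0.
posterior = [p_s[s] * p_o_given_s[s][o] / evidence for s in range(2)]

print(free_energy(posterior))         # equals surprise up to float error
print(free_energy([0.5, 0.5]))        # strictly larger than surprise
```

The design mirrors the complexity/accuracy decomposition above: moving `q_s` toward the prior lowers complexity, while moving it toward states that explain `o` raises accuracy.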

2. Expected Free Energy and Its Decomposition

Active inference introduces the expected free energy (EFE) as the central prospective objective for action selection under future uncertainty:

$$G(\pi) := \mathbb{E}_{p(s,o \mid \pi)}\left[\ln p(s \mid \pi) - \ln p(s, o)\right]$$

This policy-dependent functional admits a decomposition into two key terms:

  • Risk: $\mathrm{KL}[p(s \mid \pi) \Vert p(s)]$, quantifying the divergence of predicted state trajectories under $\pi$ from prior preferences $p(s)$ (exploitation),
  • Ambiguity: $\mathbb{E}_{p(s \mid \pi)}[H(p(o \mid s))]$, the expected entropy of observations conditioned on states (exploration).

Alternatively, the EFE can be written as

$$G(\pi) = -\mathbb{E}_{p(o \mid \pi)}[\ln p(o)] - \mathbb{E}_{p(o \mid \pi)}\left[\mathrm{KL}\left(p(s \mid o, \pi) \Vert p(s \mid \pi)\right)\right]$$

Here, the first term is extrinsic value (the log-likelihood of achieving preferred outcomes), and the second is intrinsic value (the expected information gain about states) (Costa et al., 2024, Shin et al., 2021, Sajid et al., 2021). This decomposition integrates goal-directed and information-seeking behavior without ad hoc exploration bonuses.
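The risk-plus-ambiguity decomposition is an algebraic identity when the joint factorizes as $p(s, o) = p(s)\,p(o \mid s)$ with $p(s)$ the preference prior. A small numerical check, with all distributions chosen as illustrative assumptions:

```python
import math

# Hypothetical two-state, two-outcome model under a single policy pi.
p_s_pi = [0.7, 0.3]                 # predicted states p(s|pi)
p_s_pref = [0.9, 0.1]               # preferred states p(s)
p_o_given_s = [[0.8, 0.2],          # likelihood p(o|s)
               [0.5, 0.5]]

# Risk: KL[p(s|pi) || p(s)]
risk = sum(p * math.log(p / q) for p, q in zip(p_s_pi, p_s_pref))

# Ambiguity: E_{p(s|pi)}[ H(p(o|s)) ]
def entropy(dist):
    return -sum(p * math.log(p) for p in dist)

ambiguity = sum(p_s_pi[s] * entropy(p_o_given_s[s]) for s in range(2))

# Direct form: G = E_{p(s,o|pi)}[ln p(s|pi) - ln p(s, o)], p(s, o) = p(s) p(o|s).
G = sum(p_s_pi[s] * p_o_given_s[s][o]
        * (math.log(p_s_pi[s]) - math.log(p_s_pref[s] * p_o_given_s[s][o]))
        for s in range(2) for o in range(2))

print(abs(G - (risk + ambiguity)) < 1e-12)   # the decomposition is exact
```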

3. Generative Models, Posterior Factorization, and Variational Updates

The generative model is structured as

$$p(o, s, \pi) = p(o \mid s)\, p(s \mid \pi)\, p(\pi)$$

and the variational posterior adopts a mean-field factorization:

$$q(s, \pi) = q(\pi)\, q(s \mid \pi)$$

Variational updates proceed by coordinate descent:

  • Perceptual update: for each $\pi$, infer $q(s \mid \pi) \propto p(o \mid s)\, p(s \mid \pi)$,
  • Planning update: $q(\pi) \propto p(\pi) \exp\{-G(\pi)\}$.

Thus, posterior beliefs over policies are "soft-maxed" in the (negative) expected free-energy landscape. This structure is consistent with sum-product and message-passing algorithms in factor graphs and admits both discrete-state (Laar et al., 2021, Sajid et al., 2019) and deep neural implementations (Ueltzhöffer, 2017, Mazzaglia et al., 2022).
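The two coordinate updates can be sketched in a few lines. In this toy example the policy labels, numbers, and the stubbed EFE values are all hypothetical; a full implementation would compute $G(\pi)$ from the generative model rather than hard-coding it:

```python
import math

policies = ["stay", "switch"]
p_pi = [0.5, 0.5]                                  # policy prior p(pi)
p_s_given_pi = {"stay": [0.8, 0.2],                # p(s|pi)
                "switch": [0.3, 0.7]}
p_o_given_s = [[0.9, 0.1], [0.2, 0.8]]             # likelihood p(o|s)
o = 1                                              # current observation

def normalise(w):
    z = sum(w)
    return [x / z for x in w]

# Perceptual update: q(s|pi) ∝ p(o|s) p(s|pi), computed per policy.
q_s_given_pi = {pi: normalise([p_o_given_s[s][o] * p_s_given_pi[pi][s]
                               for s in range(2)])
                for pi in policies}

# Planning update: q(pi) ∝ p(pi) exp(-G(pi)). G is stubbed here with
# placeholder values for illustration only.
G = {"stay": 1.2, "switch": 0.4}
q_pi = normalise([p_pi[i] * math.exp(-G[pi]) for i, pi in enumerate(policies)])

print(q_s_given_pi["stay"])   # posterior over states under "stay"
print(q_pi)                   # soft-max over negative expected free energy
```

The planning update is exactly the "soft-maxing" described above: policies with lower expected free energy receive exponentially more posterior mass.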

4. Policy Selection via Minimization of Expected Free Energy

Active inference selects actions by scoring policy candidates according to their expected free energy:

  • For each candidate sequence $\pi$, compute $G(\pi)$,
  • Set $q(\pi) \propto p(\pi) \exp[-G(\pi)]$,
  • Execute the first action of $\pi^* = \arg\min_\pi G(\pi)$.

Belief updates after each observation implement a receding-horizon planning scheme, where the policy is continually revised as new sensory data is assimilated (Costa et al., 2024, Shin et al., 2021).
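The receding-horizon scheme can be sketched as a loop that alternates perceptual and planning updates over a fixed observation stream. The environment, transition model, and preferences below are illustrative assumptions (a two-state world where action 1 tends to drive the agent toward its preferred state):

```python
import math

p_s_pref = [0.1, 0.9]                         # prior preference over states
p_o_given_s = [[0.9, 0.1], [0.1, 0.9]]        # likelihood p(o|s)
T = [[[0.9, 0.1], [0.6, 0.4]],                # T[a][s] = p(s'|s, a)
     [[0.3, 0.7], [0.1, 0.9]]]                # action 1 drives toward state 1

def efe(p_s):
    """Risk + ambiguity for a predicted state distribution."""
    risk = sum(p * math.log(p / q) for p, q in zip(p_s, p_s_pref))
    amb = sum(p_s[s] * -sum(po * math.log(po) for po in p_o_given_s[s])
              for s in range(2))
    return risk + amb

belief, actions = [0.5, 0.5], []
for o in [0, 1, 1]:                           # a fixed observation stream
    # Perceptual update: belief ∝ p(o|s) * belief.
    w = [p_o_given_s[s][o] * belief[s] for s in range(2)]
    belief = [x / sum(w) for x in w]
    # Planning update: score each one-step policy, execute the best action.
    G = [efe([sum(belief[s] * T[a][s][s2] for s in range(2))
              for s2 in range(2)])
         for a in range(2)]
    actions.append(min(range(2), key=G.__getitem__))

print(actions)   # the agent repeatedly selects the preference-seeking action
```

After every observation the belief changes, so the one-step policies are rescored from scratch, which is the receding-horizon revision described above.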

5. Connections to Reinforcement Learning and Optimal Control

Active inference generalizes reinforcement learning (RL) by replacing value functions with functionals of Bayesian beliefs:

  • For deterministic-reward tasks with reward $r(s, a)$ encoded as $p(o = r \mid s, a) \propto \exp[r(s, a)]$, and negligible ambiguity, $G(\pi) \simeq -Q(\pi)$, i.e., the EFE becomes a negative value function (Costa et al., 2024, Shin et al., 2021).
  • Policy selection then coincides with soft-max action selection in entropy-regularized RL.

Additionally, the Bellman recursion for the EFE mirrors the dynamic-programming recursion of RL:

$$G^*(s_t) = \min_a \mathbb{E}_{p(s_{t+1} \mid s_t, a)\, p(o_{t+1} \mid s_{t+1})}\left[-\ln \frac{p(s_{t+1} \mid s_t, a)}{\tilde{p}(o_{t+1})\, q(s_{t+1} \mid o_{t+1})} + G^*(s_{t+1})\right]$$

Within this formulation, epistemic (information-seeking) and instrumental (goal-reaching) components are additive in the policy objective (Kenny, 25 Nov 2025, Vries et al., 21 Apr 2025, Sajid et al., 2021, Sennesh et al., 2022).
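The RL correspondence can be made concrete in the ambiguity-free case. With preferences $p(s) \propto \exp[r(s)]$, the negative risk term expands to $\mathbb{E}[r] + H(p(s \mid \pi)) - \ln Z$, i.e., an entropy-regularized expected reward up to a constant. The rewards and policy predictions below are hypothetical:

```python
import math

rewards = [0.0, 1.0]                               # hypothetical r(s)
Z = sum(math.exp(r) for r in rewards)
p_pref = [math.exp(r) / Z for r in rewards]        # preferences p(s) ∝ exp(r(s))

p_s_pi = {"A": [0.8, 0.2], "B": [0.1, 0.9]}        # p(s|pi) for two policies

def neg_G(p_s):
    """-G(pi) with ambiguity assumed zero, i.e. -KL[p(s|pi) || p(s)]."""
    return -sum(p * math.log(p / q) for p, q in zip(p_s, p_pref))

def expected_reward(p_s):
    return sum(p * r for p, r in zip(p_s, rewards))

# Identity: -G(pi) = E[r] + H(p(s|pi)) - ln Z, the entropy-regularized value.
scores = {pi: neg_G(p) for pi, p in p_s_pi.items()}
values = {pi: expected_reward(p) for pi, p in p_s_pi.items()}
print(scores["B"] > scores["A"], values["B"] > values["A"])  # same ranking
```

Soft-maxing $q(\pi) \propto \exp[-G(\pi)]$ over these scores therefore reproduces soft-max action selection over an entropy-regularized value function, as stated in the bullet above.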

6. Theoretical Implications: Agency, Exploration–Exploitation, and Universality

  • Agency and Preferences: By embedding preferences $p(s)$ or $p(o)$ explicitly in the generative model, active inference turns the FEP from a descriptive theory of self-organization into a prescriptive theory of agency—providing first-principles explanations of purposeful behavior (Costa et al., 2024, Shin et al., 2021).
  • Principled Exploration–Exploitation: The EFE decomposition inherently balances reward-seeking with epistemic value, resolving the exploration–exploitation dilemma without the need for externally specified bonuses or schedules. This yields goal-directed curiosity, as seen in T-maze and navigation simulations (Sajid et al., 2021, Laar et al., 2021).
  • Optimal Feedback Control: The infinite-horizon average-surprise variant of active inference recovers KL-control and path-integral formulations of optimal control. The EFE is structurally analogous to a control Lagrangian with preference costs, and the Bellman equation is a free energy variational optimization (Sennesh et al., 2022, Laar et al., 2019).
  • Neuroscience and Biophysical Plausibility: Active inference provides a process theory linking predictive coding, variational Bayes, and natural gradient descent in information space. Neural populations encode prediction errors as membrane potentials and expected states as firing rates, following free-energy gradients (Costa et al., 2020, Millidge, 2021, Kim, 2022).
  • Universality: Any RL algorithm satisfying the descriptive assumptions of active inference (finite horizon, reward map, model-based) can be recast in the active inference framework. This equivalence holds for both model-free and model-based regimes (Costa et al., 2024, Kenny, 25 Nov 2025).

7. Computational and Practical Aspects

Computational schemes for active inference leverage both classical variational mean-field updates and deep learning (VAE, amortized inference), supporting both discrete and continuous state spaces (Ueltzhöffer, 2017, Mazzaglia et al., 2022, Nazemi et al., 23 Mar 2025). Graphical model-based message-passing (Bethe, constrained Bethe, and Forney-style factor graphs) unify inference and control, with scalable algorithms for high-dimensional and partially observable environments (Koudahl et al., 2023, Laar et al., 2021).

Pragmatic implementations of active inference have been demonstrated across a range of applied settings.

8. Summary Table: Core Quantities in Active Inference

| Quantity | Formula | Interpretation |
| --- | --- | --- |
| Variational free energy $F$ | $\mathbb{E}_{q(s,\pi)}[\ln q(s,\pi) - \ln p(o, s, \pi)]$ | Bound on surprise; minimized for inference |
| Expected free energy $G$ | $\mathbb{E}_{p(s,o \mid \pi)}[\ln p(s \mid \pi) - \ln p(s, o)]$ | Policy-dependent value; balances risk/ambiguity |
| Risk (exploitation) | $\mathrm{KL}[p(s \mid \pi) \Vert p(s)]$ | Expected divergence from preferred states |
| Ambiguity (exploration) | $\mathbb{E}_{p(s \mid \pi)}[H(p(o \mid s))]$ | Uncertainty in observations under a policy |
| Intrinsic value | $-\mathbb{E}_{p(o \mid \pi)}[\mathrm{KL}(p(s \mid o, \pi) \Vert p(s \mid \pi))]$ | Expected information gain about states |
| Policy posterior | $q(\pi) \propto p(\pi) \exp[-G(\pi)]$ | Soft-max over negative expected free energy |

References

