Active Inference & Free-Energy Principle

Updated 8 February 2026
  • Active inference is a Bayesian framework that unifies perception and action by minimizing variational free energy, balancing exploration and exploitation.
  • It decomposes expected free energy into risk and ambiguity, providing a principled method for goal-directed planning and optimal control.
  • The approach bridges reinforcement learning and optimal control by transforming value functions into softmax policy selections through variational updates.

Active inference is a normative Bayesian framework for modeling agency, perception, and action, grounded in the Free-Energy Principle (FEP). The FEP asserts that any adaptive system minimizes a variational bound on sensory surprise, formalized as (expected) variational free energy, under an internal generative model of its environment. Active inference refines the FEP by explicitly encoding agent preferences in the form of prior distributions and extends the interpretation from passive perception to goal-directed action selection. This unification yields a theoretically principled account of the exploration–exploitation trade-off and provides a general recipe for control and reinforcement learning as variational inference.

1. Variational Free Energy: Foundations and Formulation

At the core of both the FEP and active inference is the variational free-energy functional, which serves as a tractable upper bound on surprise (negative log-evidence) for observations $o$ under the agent's generative model $p(o, s, \pi)$, where $s$ are latent states and $\pi$ denotes policies (sequences of actions). Given a variational (recognition) density $q(s, \pi)$, variational free energy is defined as

$$F[q(s,\pi)] = \mathbb{E}_{q(s,\pi)}\left[\ln q(s,\pi) - \ln p(o, s, \pi)\right] = \mathrm{KL}\left[q(s,\pi) \Vert p(s,\pi \mid o)\right] - \ln p(o)$$

Minimizing $F$ with respect to $q$ is formally equivalent to minimizing surprise $-\ln p(o)$, as $F \geq -\ln p(o)$ by construction (Costa et al., 2024).

The variational free energy decomposes into:

  • Complexity: $\mathrm{KL}[q(s,\pi) \Vert p(s,\pi)]$, penalizing divergence from the agent's prior,
  • Accuracy: $\mathbb{E}_{q(s,\pi)}[\ln p(o \mid s)]$, rewarding beliefs consistent with the observed data.

This variational Bayesian framework subsumes classical inference and supplies an operational principle for both perception (belief updating) and action (policy selection) (Shin et al., 2021, McGregor et al., 2015).
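The bound in Section 1 can be checked numerically. The following minimal sketch uses a hypothetical two-state, two-outcome model (all probabilities are illustrative, not taken from any cited paper) and verifies that the exact posterior attains $F = -\ln p(o)$ while any other recognition density gives a strictly looser bound:

```python
import math

# Hypothetical discrete model: 2 latent states, 2 possible outcomes.
p_s = [0.5, 0.5]                      # prior p(s)
p_o_given_s = [[0.9, 0.1],            # likelihood p(o|s): rows index s, columns index o
               [0.2, 0.8]]
o = 0                                 # the observed outcome

def free_energy(q_s):
    """F[q] = E_q[ln q(s) - ln p(o, s)] for the fixed observation o."""
    return sum(q * (math.log(q) - math.log(p_s[s] * p_o_given_s[s][o]))
               for s, q in enumerate(q_s))

evidence = sum(p_s[s] * p_o_given_s[s][o] for s in range(2))   # p(o)
surprise = -math.log(evidence)                                  # -ln p(o)

# The exact posterior q(s) = p(s|o) minimizes F and attains the bound;
# the gap for any other q is exactly KL[q || p(s|o)] >= 0.
posterior = [p_s[s] * p_o_given_s[s][o] / evidence for s in range(2)]

print(free_energy(posterior))         # equals surprise up to float error
print(free_energy([0.5, 0.5]))        # strictly larger than surprise
```

The design mirrors the complexity/accuracy decomposition above: moving `q_s` toward the prior lowers complexity, while moving it toward states that explain `o` raises accuracy.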

2. Expected Free Energy and Its Decomposition

Active inference introduces the expected free energy (EFE) as the central prospective objective for action selection under future uncertainty:

$$G(\pi) := \mathbb{E}_{p(s,o \mid \pi)}\left[\ln p(s \mid \pi) - \ln p(s, o)\right]$$

This policy-dependent functional admits a decomposition into two key terms:

  • Risk: $\mathrm{KL}[p(s \mid \pi) \Vert p(s)]$, quantifying the divergence of predicted state trajectories under $\pi$ from prior preferences $p(s)$ (exploitation),
  • Ambiguity: $\mathbb{E}_{p(s \mid \pi)}[H(p(o \mid s))]$, the expected entropy of observations conditioned on states (exploration).

Alternatively, the EFE can be written as

$$G(\pi) = -\mathbb{E}_{p(o \mid \pi)}[\ln p(o)] - \mathbb{E}_{p(o \mid \pi)}\left[\mathrm{KL}\left(p(s \mid o, \pi) \Vert p(s \mid \pi)\right)\right]$$

Here, the first term is extrinsic value (the log-likelihood of achieving preferred outcomes), and the second is intrinsic value (the expected information gain about states) (Costa et al., 2024, Shin et al., 2021, Sajid et al., 2021). This decomposition integrates goal-directed and information-seeking behavior without ad hoc exploration bonuses.
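The risk-plus-ambiguity decomposition is an algebraic identity when the joint factorizes as $p(s, o) = p(s)\,p(o \mid s)$ with $p(s)$ the preference prior. A small numerical check, with all distributions chosen as illustrative assumptions:

```python
import math

# Hypothetical two-state, two-outcome model under a single policy pi.
p_s_pi = [0.7, 0.3]                 # predicted states p(s|pi)
p_s_pref = [0.9, 0.1]               # preferred states p(s)
p_o_given_s = [[0.8, 0.2],          # likelihood p(o|s)
               [0.5, 0.5]]

# Risk: KL[p(s|pi) || p(s)]
risk = sum(p * math.log(p / q) for p, q in zip(p_s_pi, p_s_pref))

# Ambiguity: E_{p(s|pi)}[ H(p(o|s)) ]
def entropy(dist):
    return -sum(p * math.log(p) for p in dist)

ambiguity = sum(p_s_pi[s] * entropy(p_o_given_s[s]) for s in range(2))

# Direct form: G = E_{p(s,o|pi)}[ln p(s|pi) - ln p(s, o)], p(s, o) = p(s) p(o|s).
G = sum(p_s_pi[s] * p_o_given_s[s][o]
        * (math.log(p_s_pi[s]) - math.log(p_s_pref[s] * p_o_given_s[s][o]))
        for s in range(2) for o in range(2))

print(abs(G - (risk + ambiguity)) < 1e-12)   # the decomposition is exact
```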

3. Generative Models, Posterior Factorization, and Variational Updates

The generative model is structured as

$$p(o, s, \pi) = p(o \mid s)\, p(s \mid \pi)\, p(\pi)$$

and the variational posterior adopts a mean-field factorization:

$$q(s, \pi) = q(\pi)\, q(s \mid \pi)$$

Variational updates proceed by coordinate descent:

  • Perceptual update: for each $\pi$, infer $q(s \mid \pi) \propto p(o \mid s)\, p(s \mid \pi)$,
  • Planning update: $q(\pi) \propto p(\pi) \exp\{-G(\pi)\}$.

Thus, posterior beliefs over policies are "soft-maxed" in the (negative) expected free-energy landscape. This structure is consistent with sum-product and message-passing algorithms in factor graphs and admits both discrete-state (Laar et al., 2021, Sajid et al., 2019) and deep neural implementations (Ueltzhöffer, 2017, Mazzaglia et al., 2022).
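The two coordinate updates can be sketched in a few lines. In this toy example the policy labels, numbers, and the stubbed EFE values are all hypothetical; a full implementation would compute $G(\pi)$ from the generative model rather than hard-coding it:

```python
import math

policies = ["stay", "switch"]
p_pi = [0.5, 0.5]                                  # policy prior p(pi)
p_s_given_pi = {"stay": [0.8, 0.2],                # p(s|pi)
                "switch": [0.3, 0.7]}
p_o_given_s = [[0.9, 0.1], [0.2, 0.8]]             # likelihood p(o|s)
o = 1                                              # current observation

def normalise(w):
    z = sum(w)
    return [x / z for x in w]

# Perceptual update: q(s|pi) ∝ p(o|s) p(s|pi), computed per policy.
q_s_given_pi = {pi: normalise([p_o_given_s[s][o] * p_s_given_pi[pi][s]
                               for s in range(2)])
                for pi in policies}

# Planning update: q(pi) ∝ p(pi) exp(-G(pi)). G is stubbed here with
# placeholder values for illustration only.
G = {"stay": 1.2, "switch": 0.4}
q_pi = normalise([p_pi[i] * math.exp(-G[pi]) for i, pi in enumerate(policies)])

print(q_s_given_pi["stay"])   # posterior over states under "stay"
print(q_pi)                   # soft-max over negative expected free energy
```

The planning update is exactly the "soft-maxing" described above: policies with lower expected free energy receive exponentially more posterior mass.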

4. Policy Selection via Minimization of Expected Free Energy

Active inference selects actions by scoring policy candidates according to their expected free energy:

  • For each candidate sequence $\pi$, compute $G(\pi)$,
  • Set $q(\pi) \propto p(\pi) \exp[-G(\pi)]$,
  • Execute the first action of $\pi^* = \arg\min_\pi G(\pi)$.

Belief updates after each observation implement a receding-horizon planning scheme, where the policy is continually revised as new sensory data is assimilated (Costa et al., 2024, Shin et al., 2021).
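The receding-horizon scheme can be sketched as a loop that alternates perceptual and planning updates over a fixed observation stream. The environment, transition model, and preferences below are illustrative assumptions (a two-state world where action 1 tends to drive the agent toward its preferred state):

```python
import math

p_s_pref = [0.1, 0.9]                         # prior preference over states
p_o_given_s = [[0.9, 0.1], [0.1, 0.9]]        # likelihood p(o|s)
T = [[[0.9, 0.1], [0.6, 0.4]],                # T[a][s] = p(s'|s, a)
     [[0.3, 0.7], [0.1, 0.9]]]                # action 1 drives toward state 1

def efe(p_s):
    """Risk + ambiguity for a predicted state distribution."""
    risk = sum(p * math.log(p / q) for p, q in zip(p_s, p_s_pref))
    amb = sum(p_s[s] * -sum(po * math.log(po) for po in p_o_given_s[s])
              for s in range(2))
    return risk + amb

belief, actions = [0.5, 0.5], []
for o in [0, 1, 1]:                           # a fixed observation stream
    # Perceptual update: belief ∝ p(o|s) * belief.
    w = [p_o_given_s[s][o] * belief[s] for s in range(2)]
    belief = [x / sum(w) for x in w]
    # Planning update: score each one-step policy, execute the best action.
    G = [efe([sum(belief[s] * T[a][s][s2] for s in range(2))
              for s2 in range(2)])
         for a in range(2)]
    actions.append(min(range(2), key=G.__getitem__))

print(actions)   # the agent repeatedly selects the preference-seeking action
```

After every observation the belief changes, so the one-step policies are rescored from scratch, which is the receding-horizon revision described above.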

5. Connections to Reinforcement Learning and Optimal Control

Active inference generalizes reinforcement learning (RL) by replacing value functions with functionals of Bayesian beliefs:

  • For deterministic-reward tasks with reward $r(s, a)$ encoded as $p(o = r \mid s, a) \propto \exp[r(s, a)]$, and negligible ambiguity, $G(\pi) \simeq -Q(\pi)$, i.e., the EFE becomes a negative value function (Costa et al., 2024, Shin et al., 2021).
  • Policy selection then coincides with soft-max action selection in entropy-regularized RL.

Additionally, the Bellman recursion for the EFE mirrors the dynamic-programming recursion of RL:

$$G^*(s_t) = \min_a \mathbb{E}_{p(s_{t+1} \mid s_t, a)\, p(o_{t+1} \mid s_{t+1})}\left[-\ln \frac{p(s_{t+1} \mid s_t, a)}{\tilde{p}(o_{t+1})\, q(s_{t+1} \mid o_{t+1})} + G^*(s_{t+1})\right]$$

Within this formulation, epistemic (information-seeking) and instrumental (goal-reaching) components are additive in the policy objective (Kenny, 25 Nov 2025, Vries et al., 21 Apr 2025, Sajid et al., 2021, Sennesh et al., 2022).
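The RL correspondence can be made concrete in the ambiguity-free case. With preferences $p(s) \propto \exp[r(s)]$, the negative risk term expands to $\mathbb{E}[r] + H(p(s \mid \pi)) - \ln Z$, i.e., an entropy-regularized expected reward up to a constant. The rewards and policy predictions below are hypothetical:

```python
import math

rewards = [0.0, 1.0]                               # hypothetical r(s)
Z = sum(math.exp(r) for r in rewards)
p_pref = [math.exp(r) / Z for r in rewards]        # preferences p(s) ∝ exp(r(s))

p_s_pi = {"A": [0.8, 0.2], "B": [0.1, 0.9]}        # p(s|pi) for two policies

def neg_G(p_s):
    """-G(pi) with ambiguity assumed zero, i.e. -KL[p(s|pi) || p(s)]."""
    return -sum(p * math.log(p / q) for p, q in zip(p_s, p_pref))

def expected_reward(p_s):
    return sum(p * r for p, r in zip(p_s, rewards))

# Identity: -G(pi) = E[r] + H(p(s|pi)) - ln Z, the entropy-regularized value.
scores = {pi: neg_G(p) for pi, p in p_s_pi.items()}
values = {pi: expected_reward(p) for pi, p in p_s_pi.items()}
print(scores["B"] > scores["A"], values["B"] > values["A"])  # same ranking
```

Soft-maxing $q(\pi) \propto \exp[-G(\pi)]$ over these scores therefore reproduces soft-max action selection over an entropy-regularized value function, as stated in the bullet above.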

6. Theoretical Implications: Agency, Exploration–Exploitation, and Universality

  • Agency and Preferences: By embedding preferences $p(s)$ or $p(o)$ explicitly in the generative model, active inference turns the FEP from a descriptive theory of self-organization into a prescriptive theory of agency—providing first-principles explanations of purposeful behavior (Costa et al., 2024, Shin et al., 2021).
  • Principled Exploration–Exploitation: The EFE decomposition inherently balances reward-seeking with epistemic value, resolving the exploration–exploitation dilemma without the need for externally specified bonuses or schedules. This yields goal-directed curiosity, as seen in T-maze and navigation simulations (Sajid et al., 2021, Laar et al., 2021).
  • Optimal Feedback Control: The infinite-horizon average-surprise variant of active inference recovers KL-control and path-integral formulations of optimal control. The EFE is structurally analogous to a control Lagrangian with preference costs, and the Bellman equation is a free energy variational optimization (Sennesh et al., 2022, Laar et al., 2019).
  • Neuroscience and Biophysical Plausibility: Active inference provides a process theory linking predictive coding, variational Bayes, and natural gradient descent in information space. Neural populations encode prediction errors as membrane potentials and expected states as firing rates, following free-energy gradients (Costa et al., 2020, Millidge, 2021, Kim, 2022).
  • Universality: Any RL algorithm satisfying the descriptive assumptions of active inference (finite horizon, reward map, model-based) can be recast in the active inference framework. This equivalence holds for both model-free and model-based regimes (Costa et al., 2024, Kenny, 25 Nov 2025).

7. Computational and Practical Aspects

Computational schemes for active inference leverage both classical variational mean-field updates and deep learning (VAE, amortized inference), supporting both discrete and continuous state spaces (Ueltzhöffer, 2017, Mazzaglia et al., 2022, Nazemi et al., 23 Mar 2025). Graphical model-based message-passing (Bethe, constrained Bethe, and Forney-style factor graphs) unify inference and control, with scalable algorithms for high-dimensional and partially observable environments (Koudahl et al., 2023, Laar et al., 2021).

Pragmatic implementations of active inference have been demonstrated across a range of applied settings.

8. Summary Table: Core Quantities in Active Inference

| Quantity | Formula | Interpretation |
| --- | --- | --- |
| Variational free energy $F$ | $\mathbb{E}_{q(s,\pi)}[\ln q(s,\pi) - \ln p(o, s, \pi)]$ | Bound on surprise; minimized for inference |
| Expected free energy $G$ | $\mathbb{E}_{p(s,o \mid \pi)}[\ln p(s \mid \pi) - \ln p(s, o)]$ | Policy-dependent value; balances risk/ambiguity |
| Risk (exploitation) | $\mathrm{KL}[p(s \mid \pi) \Vert p(s)]$ | Expected divergence from preferred states |
| Ambiguity (exploration) | $\mathbb{E}_{p(s \mid \pi)}[H(p(o \mid s))]$ | Uncertainty in observations under a policy |
| Intrinsic value | $-\mathbb{E}_{p(o \mid \pi)}[\mathrm{KL}(p(s \mid o, \pi) \Vert p(s \mid \pi))]$ | Expected information gain about states |
| Policy posterior | $q(\pi) \propto p(\pi) \exp[-G(\pi)]$ | Soft-max over negative expected free energy |

References

