Brain-Inspired Bayesian Inference

Updated 3 July 2026

Brain-inspired Bayesian inference is a family of computational frameworks that model perception and action as approximate Bayesian inference using neurally plausible mechanisms.
Many of these frameworks use variational free-energy minimization and mean-field approximations to sidestep the intractability of exact posterior inference.
Implementations range from predictive coding circuits and spiking neural networks to synaptic sampling schemes that represent uncertainty in neural computations.

Brain-inspired Bayesian inference denotes a family of computational frameworks in which perception, learning, and sometimes action are formulated as approximate Bayesian inference performed by neurally plausible mechanisms. Across these frameworks, the brain is modeled as maintaining an internal probabilistic model of sensory data or latent causes, updating beliefs by minimizing variational free energy or equivalent discrepancies, and implementing these updates through local message passing, recurrent dynamics, synaptic plasticity, or stochastic sampling (Bazargani et al., 2023, Vafaii et al., 2024). The term covers several partially overlapping lines of work: generative-model inversion under the Free-Energy Principle, deterministic neural implementations of Bayes’ rule, synaptic-dropout and synaptic-sampling schemes for uncertainty representation, spiking-network realizations of message passing and posterior sampling, and observation-driven alternatives that relax explicit hidden-state inference (Kharratzadeh et al., 2015, McKee et al., 2021, Spaak, 2021).

1. Conceptual scope and defining assumptions

A central assumption in one major strand of the framework is that the brain entertains a generative model of the world and inverts it to infer hidden causes behind sensory stimuli. In the simplest static case, the joint density factorizes as

$p(o,s)=p(s)\,p(o\mid s),$

with hidden causes $s$ and observations $o$. In a discrete-time state-space model over $T$ time steps,

$p(s_{1:T},o_{1:T}\mid \theta) = p(s_1)\prod_{t=2}^T p(s_t\mid s_{t-1};\theta)\cdot\prod_{t=1}^T p(o_t\mid s_t;\theta),$

where $p(s_t\mid s_{t-1})$ is a transition kernel and $p(o_t\mid s_t)$ is an emission model (Bazargani et al., 2023). This formulation places perception in the class of latent-variable inference problems.

Within this formulation, exact posterior inference $p(s\mid o)\propto p(o,s)$ is typically intractable, motivating approximate inference through a variational posterior $q(s)$ and a scalar objective $F[q]$, the variational free energy:

$F[q]=\mathbb{E}_{q(s)}\big[\ln q(s)-\ln p(o,s)\big].$

Equivalently, $F[q]=D_{\mathrm{KL}}\big[q(s)\,\|\,p(s\mid o)\big]-\ln p(o)$, so $F$ is an upper bound on the surprise $-\ln p(o)$ and $-F$ is the evidence lower bound (ELBO) (Bazargani et al., 2023). In this sense, free-energy minimization and ELBO maximization are formally equivalent descriptions of the same optimization principle, a point made explicit in spiking variational formulations such as the iterative Poisson VAE (Vafaii et al., 2024).
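The identity $F[q]=D_{\mathrm{KL}}[q\,\|\,p(s\mid o)]-\ln p(o)$ can be checked numerically on a toy discrete model. The sketch below assumes three hidden causes with arbitrary illustrative prior and likelihood values; the function name `free_energy` is hypothetical. It shows that $F$ equals the surprise exactly at the true posterior and exceeds it for any other $q$.

```python
import math

def free_energy(q, prior, lik):
    """Variational free energy F[q] = E_q[ln q(s) - ln p(o, s)] for a discrete
    latent s and one fixed observation o, where p(o, s) = p(s) * p(o | s)."""
    return sum(qs * (math.log(qs) - math.log(ps * ls))
               for qs, ps, ls in zip(q, prior, lik) if qs > 0)

# Illustrative toy model with three hidden causes (numbers are arbitrary).
prior = [0.7, 0.2, 0.1]          # p(s)
lik = [0.1, 0.5, 0.9]            # p(o | s) for the observed o
evidence = sum(p * l for p, l in zip(prior, lik))            # p(o)
posterior = [p * l / evidence for p, l in zip(prior, lik)]   # p(s | o)
surprise = -math.log(evidence)

F_exact = free_energy(posterior, prior, lik)      # KL term vanishes, so F = -ln p(o)
F_uniform = free_energy([1 / 3] * 3, prior, lik)  # any other q gives F > -ln p(o)
print(F_exact, F_uniform, surprise)
```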

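A distinct but related line of work does not begin from latent causes $s$. Instead, it treats the brain’s primary task as tracking the probabilistic structure of observations themselves by fitting a model $p_\theta(o)$ to the empirical distribution $\hat p(o)$, for instance by minimizing

$\mathcal{L}(\theta)=D_{\mathrm{KL}}\big[\hat p(o)\,\|\,p_\theta(o)\big]+R(\theta),$

where $R(\theta)$ is a regularizer on the model parameters.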

This “less-Bayes” formulation preserves probabilistic learning and regularization while avoiding explicit hidden-state postulation (Spaak, 2021). This suggests that “brain-inspired Bayesian inference” is not a single canonical formalism but a class of related probabilistic programs with different commitments about latent variables, representation, and mechanism.
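A minimal sketch of this observation-driven fitting, assuming a categorical model $p_\theta(o)=\mathrm{softmax}(\theta)$ over a few discrete symbols and an L2 regularizer $R(\theta)=\tfrac{\lambda}{2}\lVert\theta\rVert^2$ (both choices are illustrative assumptions, as is the helper name `fit_observation_model`). No hidden state appears anywhere: the model is fitted directly to symbol frequencies, and the regularizer acts like a prior that shrinks the fit toward uniform.

```python
import math

def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    z = sum(e)
    return [x / z for x in e]

def fit_observation_model(observations, n_symbols, lam=0.0, lr=0.5, steps=2000):
    """Fit p_theta(o) = softmax(theta) directly to observed symbols, with no hidden
    states, by gradient descent on KL[p_hat || p_theta] + (lam / 2) * ||theta||^2."""
    p_hat = [observations.count(k) / len(observations) for k in range(n_symbols)]
    theta = [0.0] * n_symbols
    for _ in range(steps):
        p = softmax(theta)
        # Gradient of the KL (equivalently the cross-entropy) w.r.t. theta is p_theta - p_hat;
        # the L2 regularizer adds lam * theta and pulls the fit toward uniform.
        theta = [t - lr * (pk - hk + lam * t) for t, pk, hk in zip(theta, p, p_hat)]
    return p_hat, softmax(theta)

obs = [0, 0, 0, 1, 1, 2, 0, 1, 0, 0]                      # toy observation stream
p_hat, p_fit = fit_observation_model(obs, 3)            # approaches the empirical frequencies
_, p_reg = fit_observation_model(obs, 3, lam=0.5)       # shrunk toward uniform
```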

2. Variational free energy, prediction error, and mean-field structure

In the generative-model tradition, variational free energy serves as the loss function the brain “minimises” (Bazargani et al., 2023). Under Gaussian-Laplace assumptions, it decomposes into a prediction-error term and a complexity term:

$F\approx\tfrac{1}{2}\,\varepsilon^{\top}\Pi\,\varepsilon+D_{\mathrm{KL}}\big[q(s)\,\|\,p(s)\big]+\text{const},$

where $\varepsilon=o-g(\mu)$ is the sensory prediction error, $g$ is the generative mapping, $\mu$ is the posterior mode, and $\Pi$ is the sensory precision (Bazargani et al., 2023). The prediction-error component links these frameworks to predictive coding, while the complexity term enforces regularization relative to prior beliefs.
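To make the two terms concrete, the following sketch assumes a scalar linear-Gaussian model with prediction $g(\mu)=w\mu$, prior mean $\eta$, and precisions $\pi_o,\pi_s$ (all names and values are illustrative). Gradient descent on $F(\mu)=\tfrac{\pi_o}{2}(o-w\mu)^2+\tfrac{\pi_s}{2}(\mu-\eta)^2$ is driven by a precision-weighted sensory prediction error and a prior error; in this linear case the fixed point coincides with the exact posterior mean.

```python
def infer_mean(o, w, eta, pi_o, pi_s, lr=0.05, steps=500):
    """Gradient descent on F(mu) = pi_o/2 * (o - w*mu)**2 + pi_s/2 * (mu - eta)**2,
    a scalar Gaussian model with linear prediction g(mu) = w * mu and prior mean eta."""
    mu = eta                          # start at the prior mean
    for _ in range(steps):
        eps_o = o - w * mu            # sensory prediction error
        eps_s = mu - eta              # deviation from the prior (complexity term)
        mu += lr * (pi_o * w * eps_o - pi_s * eps_s)   # step along -dF/dmu
    return mu

o, w, eta, pi_o, pi_s = 2.0, 1.5, 0.0, 1.0, 0.5
mu_hat = infer_mean(o, w, eta, pi_o, pi_s)
# For this linear-Gaussian case the fixed point is the exact posterior mean.
mu_exact = (pi_o * w * o + pi_s * eta) / (pi_o * w ** 2 + pi_s)
```

With nonlinear $g$ the same error-driven update only approximates the posterior mode, which is where the Laplace assumption enters.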

To make free-energy minimization tractable, one introduces a mean-field approximation (MFA), that is, a factorization of the variational posterior. Two canonical choices are

$q(s_{1:T})=\prod_{t=1}^{T}q(s_t)$

for naïve mean field, and

$q(s_{1:T})=q(s_1)\prod_{t=2}^{T}q(s_t\mid s_{t-1})$

for structured Markov MFA (Bazargani et al., 2023). The former yields cheap updates at the expense of ignoring temporal dependencies; the latter preserves temporal structure at greater computational cost. In streaming settings, even reverse factorizations of the form

$q(s_{1:T})=q(s_T)\prod_{t=1}^{T-1}q(s_t\mid s_{t+1})$

have been proposed to reuse past sufficient statistics (Bazargani et al., 2023).
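The cost of ignoring dependencies can be seen with two correlated binary latents. In this sketch (illustrative joint values; helper names are hypothetical), coordinate-ascent naïve mean field converges to a product $q_1(s_1)q_2(s_2)$ whose free energy stays strictly above $-\ln p(o)$. A structured factorization $q(s_1)\,q(s_2\mid s_1)$ can represent any joint over two variables, so it would reach $-\ln p(o)$ exactly in this toy case.

```python
import math
from itertools import product

# Unnormalized p(o, s1, s2) for one fixed observation o; s1 and s2 are binary and
# strongly correlated a posteriori (illustrative numbers).
joint = {(0, 0): 0.30, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.20}
neg_log_evidence = -math.log(sum(joint.values()))   # -ln p(o)

def normalize(v):
    z = sum(v)
    return [x / z for x in v]

def naive_mean_field(joint, iters=100):
    """Coordinate ascent for q(s1, s2) = q1(s1) q2(s2):
    q1(a) is proportional to exp(E_{q2}[ln p(o, a, s2)]), and symmetrically for q2."""
    q1, q2 = [0.5, 0.5], [0.6, 0.4]
    for _ in range(iters):
        q1 = normalize([math.exp(sum(q2[b] * math.log(joint[(a, b)]) for b in (0, 1))) for a in (0, 1)])
        q2 = normalize([math.exp(sum(q1[a] * math.log(joint[(a, b)]) for a in (0, 1))) for b in (0, 1)])
    return q1, q2

def free_energy(q1, q2):
    return sum(q1[a] * q2[b] * (math.log(q1[a] * q2[b]) - math.log(joint[(a, b)]))
               for a, b in product((0, 1), repeat=2))

q1, q2 = naive_mean_field(joint)
F_mf = free_energy(q1, q2)   # strictly above -ln p(o): a product cannot encode the correlation
```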

The same inferential structure appears in Active Inference formulations. In BRAIN, a discrete-time generative model

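$p(o_{1:T},s_{1:T}\mid a_{1:T-1})=p(s_1)\prod_{t=2}^{T}p(s_t\mid s_{t-1},a_{t-1})\prod_{t=1}^{T}p(o_t\mid s_t)$

is paired with a recognition density $q(s_t)$, and time-local free energy

$F_t=\mathbb{E}_{q(s_t)}\big[\ln q(s_t)-\ln p(o_t\mid s_t)-\ln\bar p(s_t)\big],\qquad\bar p(s_t)=\mathbb{E}_{q(s_{t-1})}\big[p(s_t\mid s_{t-1},a_{t-1})\big],$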

is decomposed into complexity minus accuracy (Basaran et al., 15 Feb 2026). Minimizing this quantity balances temporal consistency against explanatory adequacy.
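The complexity-minus-accuracy split can be illustrated with a single discrete filtering step. The sketch assumes two hidden states, a transition matrix for the action already taken, and a likelihood vector for the current observation (all values and helper names are illustrative, not taken from BRAIN). The complexity term is the divergence from the predicted prior $\bar p(s_t)$, the accuracy term is the expected log-likelihood, and the exact Bayesian update attains the minimum $F_t=-\ln p(o_t)$.

```python
import math

def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def predictive_prior(q_prev, A):
    """p_bar(s_t = k) = sum_j q(s_{t-1} = j) * A[j][k], where A[j][k] = p(s_t = k | s_{t-1} = j)
    under the action taken at t-1."""
    return [sum(q_prev[j] * A[j][k] for j in range(len(q_prev))) for k in range(len(A[0]))]

def time_local_free_energy(q, p_bar, lik):
    complexity = kl(q, p_bar)                                    # divergence from the predicted prior
    accuracy = sum(qi * math.log(li) for qi, li in zip(q, lik))  # expected log-likelihood of o_t
    return complexity - accuracy

def belief_update(p_bar, lik):
    """Exact minimizer of F_t: q(s_t) proportional to p_bar(s_t) * p(o_t | s_t)."""
    unnorm = [p * l for p, l in zip(p_bar, lik)]
    z = sum(unnorm)
    return [u / z for u in unnorm], z

q_prev = [0.8, 0.2]                    # belief at t-1
A = [[0.9, 0.1], [0.3, 0.7]]           # transition under the chosen action (illustrative)
lik = [0.2, 0.6]                       # p(o_t | s_t) for the observed o_t
p_bar = predictive_prior(q_prev, A)
q_t, evidence = belief_update(p_bar, lik)
```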

A common misconception is that these formulations require exact Bayes-optimal posterior recovery. The literature instead emphasizes tractable approximation: structured or naïve MFA, natural-gradient descent, local recursions, or stochastic sampling all explicitly trade exactness for implementability (Bazargani et al., 2023, Vafaii et al., 2024, McKee et al., 2021). Another misconception is that predictive coding alone exhausts the field. Predictive coding is one neuro-mimetic circuit motif among several, not the only mechanism proposed (Bazargani et al., 2023).

3. Neural and circuit-level implementations

One biologically plausible realization is predictive coding. In hierarchical predictive-coding circuits, each level $l$ maintains state units $\mu_l$ and error units $\varepsilon_l=\mu_l-W_l f(\mu_{l+1})$, with $\mu_0=o$ at the sensory level; bottom-up signals carry prediction errors and top-down signals carry predictions $W_l f(\mu_{l+1})$. Local synaptic updates of the form

$\Delta W_l\propto\varepsilon_l\,f(\mu_{l+1})^{\top}$

approximate $-\partial F/\partial W_l$ (Bazargani et al., 2023). This implements free-energy minimization through local message passing and Hebbian-style plasticity.

A more explicitly spiking route derives inference dynamics from first principles under Poisson assumptions. In the iterative Poisson VAE, minimizing variational free energy by online natural-gradient descent yields recurrent spiking dynamics in which membrane potentials encode log-rates $u=\ln r$, spike counts $z\sim\mathrm{Pois}(r)$ act as latent variables, and updates for an input $x$ are driven by decoder residuals propagated through the Jacobian $J_{\mathrm{dec}}=\partial\,\mathrm{dec}(z)/\partial z$:

$\Delta u\propto J_{\mathrm{dec}}^{\top}\big(x-\mathrm{dec}(z)\big).$

For linear decoders $\mathrm{dec}(z)=\Phi z$, the update reduces to a Locally Competitive Algorithm form,

$\Delta u\propto\Phi^{\top}x-\Phi^{\top}\Phi\,z,$

with emergent normalization via lateral competition (Vafaii et al., 2024). This suggests a direct correspondence between variational inference, recurrent inhibition, and spike-based representations.
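For comparison, the classic Locally Competitive Algorithm (Rozell et al., 2008) has the same structure of feedforward drive $\Phi^{\top}x$ minus lateral inhibition through the Gram matrix $\Phi^{\top}\Phi$. The sketch below is that classic, soft-threshold LCA, not the Poisson iP-VAE dynamics; the helper names and parameter values are illustrative. With an orthonormal dictionary the inhibition vanishes and the code reduces to soft-thresholding of $\Phi^{\top}x$, which makes the result easy to check.

```python
def soft_threshold(u, lam):
    """Thresholding nonlinearity a = T_lam(u): shrink toward zero, zero inside [-lam, lam]."""
    return [(abs(ui) - lam) * (1.0 if ui > 0 else -1.0) if abs(ui) > lam else 0.0 for ui in u]

def lca(x, Phi, lam=0.1, tau=10.0, dt=1.0, steps=300):
    """Classic LCA. Phi is a list of dictionary atoms (columns), each a list with
    len(x) entries. Returns the sparse code a."""
    n = len(Phi)
    drive = [sum(p * xi for p, xi in zip(atom, x)) for atom in Phi]                  # Phi^T x
    gram = [[sum(p * q for p, q in zip(Phi[i], Phi[j])) for j in range(n)] for i in range(n)]
    u = [0.0] * n                                                                     # membrane potentials
    for _ in range(steps):
        a = soft_threshold(u, lam)
        # Lateral inhibition (Phi^T Phi - I) a: active units suppress overlapping atoms.
        inhib = [sum((gram[i][j] - (1.0 if i == j else 0.0)) * a[j] for j in range(n)) for i in range(n)]
        u = [ui + (dt / tau) * (bi - ui - hi) for ui, bi, hi in zip(u, drive, inhib)]
    return soft_threshold(u, lam)

code = lca([0.5, -0.05], [[1.0, 0.0], [0.0, 1.0]])   # orthonormal dictionary
```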

Another spiking implementation targets posterior sampling rather than deterministic variational optimization. On BrainScaleS, conductance-based leaky integrate-and-fire neurons approximate sampling from a Boltzmann distribution

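$p(\mathbf{z})=\frac{1}{Z}\exp\!\big(\tfrac{1}{2}\mathbf{z}^{\top}W\mathbf{z}+\mathbf{b}^{\top}\mathbf{z}\big),\qquad\mathbf{z}\in\{0,1\}^{N},$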

with LIF dynamics implementing a Markov chain whose stationary distribution approximates the target posterior (Kungl et al., 2018). This line connects Bayesian inference to neuromorphic hardware and physical-model emulation.
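The target of these spiking samplers can be illustrated in software with ordinary Gibbs sampling, in which each unit switches on with probability $\sigma(b_k+\sum_{j\ne k}W_{kj}z_j)$. The sketch below is that software analogue, not a model of LIF or BrainScaleS dynamics; the three-unit weights and biases are arbitrary illustrative values. It compares the empirical state frequencies with the exact Boltzmann distribution.

```python
import math
import random
from itertools import product

# Small Boltzmann machine (illustrative parameters): symmetric W with zero diagonal, biases b.
W = [[0.0, 1.2, -0.8],
     [1.2, 0.0, 0.5],
     [-0.8, 0.5, 0.0]]
b = [-0.3, 0.2, 0.1]
N = len(b)

def log_unnormalized(z):
    """ln of exp(0.5 * z^T W z + b^T z) for a binary state z."""
    quad = 0.5 * sum(W[i][j] * z[i] * z[j] for i in range(N) for j in range(N))
    return quad + sum(b[i] * z[i] for i in range(N))

def exact_distribution():
    states = list(product((0, 1), repeat=N))
    weights = [math.exp(log_unnormalized(z)) for z in states]
    Z = sum(weights)
    return {z: w / Z for z, w in zip(states, weights)}

def gibbs_sampler(n_samples, burn_in=1000, seed=0):
    """Each unit turns on with probability sigmoid(local field), the software analogue of
    the stochastic neuron update that spiking samplers approximate."""
    rng = random.Random(seed)
    z = [0] * N
    counts = {}
    for t in range(burn_in + n_samples):
        for k in range(N):
            field = b[k] + sum(W[k][j] * z[j] for j in range(N) if j != k)
            z[k] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-field)) else 0
        if t >= burn_in:
            counts[tuple(z)] = counts.get(tuple(z), 0) + 1
    return {s: c / n_samples for s, c in counts.items()}

exact = exact_distribution()
empirical = gibbs_sampler(30000)
tv_distance = 0.5 * sum(abs(exact[s] - empirical.get(s, 0.0)) for s in exact)
```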

A related earlier proposal shows that Linear-Nonlinear-Poisson networks can represent arbitrary binary Boltzmann machines and perform a “semi-stochastic” inference algorithm interpolating between Gibbs sampling and variational inference by smoothing recent spikes over a time window $\tau$ (Shao, 2012). This suggests that the distinction between sampling and variational updates can be softened by time-scale and filtering choices.

The following table summarizes several implementation families present in the literature.

Framework family	Core mechanism	Representative source
Predictive coding / free-energy minimization	State units, error units, local prediction-error message passing	(Bazargani et al., 2023)
Poisson variational inference	Natural-gradient descent on the variational free energy $F$ via membrane-potential dynamics	(Vafaii et al., 2024)
Synaptic or network sampling	Stochastic synapses or weights sample posterior uncertainty	(McKee et al., 2021, Kappel et al., 2015)
Spiking message passing	Bernoulli factor-graph messages realized by LIF neurons and STDP	(Adamiat et al., 19 Dec 2025)
Boltzmann-machine spiking samplers	LIF/LNP dynamics approximate posterior sampling	(Kungl et al., 2018, Shao, 2012)

These mechanisms differ in whether uncertainty is represented parametrically, by samples, or by activity frequencies, but all aim to realize probabilistic inference using operations compatible with neural circuits.

4. Learning rules, plasticity, and uncertainty representation

In Hidden Markov settings, variational updates can be written explicitly. For discrete latent states with variational parameters $\mathbf{q}_t$ such that $q(s_t=k)=q_{t,k}$ and $\sum_k q_{t,k}=1$, free-energy gradient recursions include a pairwise Gibbs-energy term

$-\,\mathbb{E}_{q(s_{t-1})\,q(s_t)}\big[\ln p(s_t\mid s_{t-1})\big]=-\,\mathbf{q}_{t-1}^{\top}(\ln A)\,\mathbf{q}_t,\qquad A_{jk}=p(s_t=k\mid s_{t-1}=j),$

a recursion for backward potentials $\boldsymbol{\beta}_t=(\ln A)\,\mathbf{q}_{t+1}$, and filtering updates

$\mathbf{q}_t=\sigma\big(\ln\mathbf{b}_{o_t}+(\ln A)^{\top}\mathbf{q}_{t-1}+\boldsymbol{\beta}_t\big),$

where $\sigma$ is the softmax and $(\mathbf{b}_{o})_k=p(o\mid s_t=k)$ (Bazargani et al., 2023). Model parameters $\theta=(A,\mathbf{b})$ are then learned by differentiating the ELBO, yielding Hebbian-like updates from co-activation of pre- and postsynaptic neurons (Bazargani et al., 2023).
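A sketch of coordinate-ascent naïve mean field for a small discrete HMM, assuming the standard fixed-point form in which each $\mathbf{q}_t$ is a softmax of an emission term, a forward term $(\ln A)^{\top}\mathbf{q}_{t-1}$ (or the log initial prior at $t=1$), and a backward potential $(\ln A)\,\mathbf{q}_{t+1}$. The model parameters and the function name `naive_mf_hmm` are illustrative assumptions. With informative emissions, the resulting beliefs track the observed sequence.

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]

def naive_mf_hmm(obs, pi0, A, B, sweeps=50):
    """Coordinate-ascent naive mean field q(s_{1:T}) = prod_t q(s_t) for a discrete HMM.
    A[j][k] = p(s_t = k | s_{t-1} = j), B[k][o] = p(o | s = k), pi0[k] = p(s_1 = k)."""
    K, T = len(pi0), len(obs)
    lnA = [[math.log(a) for a in row] for row in A]
    q = [[1.0 / K] * K for _ in range(T)]
    for _ in range(sweeps):
        for t in range(T):
            field = [math.log(B[k][obs[t]]) for k in range(K)]          # emission term
            for k in range(K):
                if t == 0:
                    field[k] += math.log(pi0[k])                        # initial prior
                else:   # forward term: ((ln A)^T q_{t-1})_k
                    field[k] += sum(q[t - 1][j] * lnA[j][k] for j in range(K))
                if t < T - 1:   # backward potential: ((ln A) q_{t+1})_k
                    field[k] += sum(lnA[k][m] * q[t + 1][m] for m in range(K))
            q[t] = softmax(field)
    return q

q = naive_mf_hmm(obs=[0, 0, 1, 1, 0], pi0=[0.5, 0.5],
                 A=[[0.9, 0.1], [0.2, 0.8]], B=[[0.95, 0.05], [0.1, 0.9]])
```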

Deterministic neural-network implementations of probabilistic cognition pursue a different route. In the SDCC-based framework, a constructive feed-forward network learns probability distributions directly from binary event frequencies by minimizing

$E=\sum_{i=1}^{N}\big(f(x_i)-y_i\big)^2,\qquad y_i\in\{0,1\},$

where $f(x_i)$ is the network output for input $x_i$ and $y_i$ records whether the event occurred. A theoretical result states that if the network output converges to the minimum of $E$, then $f(x)\to p(y=1\mid x)$ as $N\to\infty$. Separate SDCC modules can learn priors $p(h)$ and likelihoods $p(d\mid h)$, after which a parallel max-product circuit computes MAP hypotheses, or an additional module learns normalized Bayes’ rule exactly (Kharratzadeh et al., 2015). This is a brain-inspired Bayesian framework without explicit variational free energy.
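The frequency-matching result can be illustrated without a constructive network. In the sketch below, a single trainable output per context stands in for an SDCC module (a deliberate simplification), the generating probabilities are illustrative assumptions, and the helper name `learn_probability` is hypothetical. Squared-error training on binary events converges to the observed frequency; separate "modules" for $p(h)$ and $p(d\mid h)$ are then combined by a max-product step to pick the MAP hypothesis.

```python
import random

def learn_probability(events, lr=0.1, epochs=100):
    """Fit one output f to binary events y_i in {0, 1} by gradient descent on the mean of
    (f - y_i)^2 (same minimizer as the sum); the minimizer is the frequency of y = 1."""
    f = 0.5
    for _ in range(epochs):
        grad = sum(2.0 * (f - y) for y in events) / len(events)
        f -= lr * grad
    return f

random.seed(1)
true_prior = {"h1": 0.8, "h2": 0.2}    # hypothetical generating process
true_lik = {"h1": 0.3, "h2": 0.9}      # p(d = 1 | h)
episodes = []
for _ in range(2000):
    h = "h1" if random.random() < true_prior["h1"] else "h2"
    episodes.append((h, 1 if random.random() < true_lik[h] else 0))

# Separate "modules": one learns p(h) from event frequencies, one learns p(d = 1 | h).
prior = {h: learn_probability([1 if e == h else 0 for e, _ in episodes]) for h in true_prior}
lik = {h: learn_probability([d for e, d in episodes if e == h]) for h in true_prior}
# Max-product step: MAP hypothesis after observing d = 1.
map_h = max(prior, key=lambda h: prior[h] * lik[h])
```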

Uncertainty representation itself has been treated in multiple ways. In “Locally Learned Synaptic Dropout for Complete Bayesian Inference,” presynaptic release failures encode both epistemic and aleatoric uncertainty. With Bernoulli masks $m_{ij}\sim\mathrm{Bernoulli}(p_{ij})$, where $p_{ij}$ is the release probability of synapse $ij$, sample-wise effective weights are $\tilde w_{ij}=m_{ij}\,w_{ij}$, and a closed-form mapping between release probabilities and the resulting output probabilities is derived so that dropout plus winner-take-all sampling matches target conditional probabilities (McKee et al., 2021). A local delta rule then updates release probabilities using only local signals and competition outcomes (McKee et al., 2021).

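An even broader plasticity account treats network parameters themselves as random variables sampled from a posterior. In synaptic sampling, one posits a prior $p_S(\boldsymbol\theta)$, a network likelihood $p_N(\mathbf{x}\mid\boldsymbol\theta)$, and posterior

$p^*(\boldsymbol\theta\mid\mathbf{x})\propto p_S(\boldsymbol\theta)\,p_N(\mathbf{x}\mid\boldsymbol\theta).$

The resulting stochastic differential equation

$d\theta_i=b\,\frac{\partial}{\partial\theta_i}\ln p^*(\boldsymbol\theta\mid\mathbf{x})\,dt+\sqrt{2\mathcal{T}b}\,d\mathcal{W}_i,$

with learning rate $b>0$, temperature $\mathcal{T}>0$, and independent Wiener processes $\mathcal{W}_i$, has stationary distribution $\propto p^*(\boldsymbol\theta\mid\mathbf{x})^{1/\mathcal{T}}$, which equals the posterior for $\mathcal{T}=1$ (Kappel et al., 2015). This framework interprets synaptic stochasticity and spine motility as functionally necessary for posterior exploration, rather than as implementation noise.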
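A one-parameter sketch of this Langevin dynamics, integrated with the Euler-Maruyama method, assuming a Gaussian prior $\mathcal{N}(0,1)$ and a Gaussian likelihood $x_i\sim\mathcal{N}(\theta,1)$ so that the exact posterior is known (the data values and helper names are illustrative). At temperature 1 the samples match the posterior mean and variance; at temperature 2 the stationary variance roughly doubles, as expected for $p^{*1/\mathcal{T}}$. The small step size introduces a slight discretization bias.

```python
import math
import random

def grad_log_posterior(theta, data, prior_var=1.0, noise_var=1.0):
    """d/dtheta [ln p_S(theta) + sum_i ln p_N(x_i | theta)] for a Gaussian prior N(0, prior_var)
    and Gaussian likelihood x_i ~ N(theta, noise_var)."""
    return -theta / prior_var + sum(x - theta for x in data) / noise_var

def synaptic_sampling(data, b=1.0, temperature=1.0, dt=0.01, steps=200_000, burn_in=2000, seed=0):
    """Euler-Maruyama integration of d theta = b * d/dtheta ln p*(theta) dt + sqrt(2 T b) dW."""
    rng = random.Random(seed)
    theta, samples = 0.0, []
    noise_scale = math.sqrt(2.0 * temperature * b * dt)
    for t in range(steps):
        theta += b * grad_log_posterior(theta, data) * dt + noise_scale * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            samples.append(theta)
    return samples

def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

data = [0.5, 1.5, 0.8, 1.2]      # toy observations; exact posterior is N(0.8, 0.2)
m1, v1 = mean_var(synaptic_sampling(data))
m2, v2 = mean_var(synaptic_sampling(data, temperature=2.0))   # variance roughly doubles
```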

A plausible implication is that “brain-inspired Bayesian inference” includes both deterministic approximation schemes and intrinsically stochastic ones. The literature does not converge on a single representational ontology for uncertainty; instead it offers multiple mechanisms—parametric posteriors, event-frequency estimation, release-failure sampling, and stochastic plasticity—under a shared Bayesian interpretation (Kharratzadeh et al., 2015, McKee et al., 2021, Kappel et al., 2015).

5. Message passing, action, and embodied inference

Some frameworks extend inference beyond perception to action. In Active Inference, the agent selects actions by minimizing expected free energy (EFE). In BRAIN, for candidate actions $a$,

$G(a)=-\,\mathbb{E}_{q(o\mid a)}\Big[D_{\mathrm{KL}}\big[q(s\mid o,a)\,\|\,q(s\mid a)\big]\Big]+D_{\mathrm{KL}}\big[q(o\mid a)\,\|\,\tilde p(o)\big],$

where the first term is the negative expected information gain and the second is the risk, that is, the divergence of predicted outcomes from the preferred outcomes $\tilde p(o)$ (Basaran et al., 15 Feb 2026). Action is therefore epistemically and pragmatically motivated within the same variational formalism as perception.

This perception-action unification also appears in the observation-driven “less-Bayes” framework. There, experience is encoded as a regularizer $R(\theta)$ on the learned observation model $p_\theta(o)$, for example $R(\theta)=\lambda\lVert\theta\rVert^2$, and an energy over observations is defined as

$E(o)=-\ln p_\theta(o).$

Action then solves

$a^*=\arg\min_a E\big(o(a)\big)$

or follows a gradient law

$\dot a=-\frac{\partial E}{\partial o}\,\frac{\partial o}{\partial a}.$

In this view, action is a form of constraint satisfaction that drives observations toward preferred regions of the learned observation landscape (Spaak, 2021).
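A toy sketch of the gradient law, assuming the learned observation model is a Gaussian $p_\theta(o)$ with mean $m$ and variance $v$, so that $E(o)=-\ln p_\theta(o)=(o-m)^2/(2v)+\text{const}$, and assuming a hypothetical world in which $o(a)=a+d$ for a fixed disturbance $d$. Integrating the gradient flow drives the observation to the preferred region $o=m$ by compensating for the disturbance.

```python
def act(m, v, disturbance, a0=0.0, rate=0.1, steps=200):
    """Gradient flow a_dot = -dE/do * do/da on E(o) = (o - m)**2 / (2 * v), i.e. -ln of a
    learned Gaussian observation model up to a constant, in a toy world o(a) = a + disturbance."""
    a = a0
    for _ in range(steps):
        o = a + disturbance
        dE_do = (o - m) / v
        do_da = 1.0
        a -= rate * dE_do * do_da
    return a, a + disturbance

a_final, o_final = act(m=2.0, v=0.5, disturbance=-1.0)
```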

The same general program has been applied to skill learning in brain-computer interfaces. In the motor-imagery BCI model, latent state $s_t=(s_t^{\mathrm{str}},s_t^{\mathrm{ori}})$ factors into ERD strength and orientation, observations are discretized asymmetry feedback and left-ERD channels, and policy priors satisfy

$p(\pi)=\sigma\big(-\gamma\,G(\pi)\big),$

with $\sigma$ the softmax over candidate policies $\pi$ and $\gamma>0$ a precision parameter.

Per-trial state and policy beliefs are updated by forward-backward message passing and expected-free-energy evaluation over candidate policies (Annicchiarico et al., 2024). By varying prior concentrations over transitions, the model reproduces different subject learning phenotypes, including “Expert” subjects, “Novice-lateralizers,” and “Mixed-experience” subjects (Annicchiarico et al., 2024).

At a finer algorithmic scale, message passing has also been implemented directly in spiking networks for Bernoulli random variables. In Forney-style factor graphs, outgoing Bernoulli messages obey the sum-product rule

$\mu_{f\to x}(x)\propto\sum_{y_1,\dots,y_n}f(x,y_1,\dots,y_n)\prod_{i=1}^{n}\mu_{y_i\to f}(y_i),$

and logical factors such as AND, OR, XOR, and equality are realized by small feed-forward SNNs trained by spike-timing-dependent plasticity (Adamiat et al., 19 Dec 2025). This is a distinct instantiation of Bayesian inference: rather than minimizing a global free-energy functional explicitly, it constructs local message-passing operators whose collective behavior approximates exact inference.
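The exact messages that the trained SNNs are meant to approximate can be computed directly. The sketch below (helper names illustrative) applies the sum-product rule to deterministic Boolean factors $z=f(x,y)$ with incoming Bernoulli messages, and gives the closed form for the equality factor, whose outgoing message is the normalized product of the incoming ones.

```python
import operator
from itertools import product

def factor_message(factor, p_x, p_y):
    """Sum-product message to the output z of a deterministic factor z = factor(x, y),
    given incoming Bernoulli messages P(x = 1) = p_x and P(y = 1) = p_y. Returns P(z = 1)."""
    mass = [0.0, 0.0]
    for x, y in product((0, 1), repeat=2):
        mass[factor(x, y)] += (p_x if x else 1 - p_x) * (p_y if y else 1 - p_y)
    return mass[1] / (mass[0] + mass[1])

def equality_message(p_x, p_y):
    """Equality factor (x = y = z): the outgoing message is the normalized product."""
    num = p_x * p_y
    return num / (num + (1 - p_x) * (1 - p_y))

factors = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}
p_x, p_y = 0.7, 0.4
messages = {name: factor_message(f, p_x, p_y) for name, f in factors.items()}
messages["EQ"] = equality_message(p_x, p_y)
```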

6. Empirical demonstrations, robustness, and limitations

Empirical studies span synthetic probability learning, neuromorphic sampling, wireless-network control, BCI learning, and factor-graph inference. In the SDCC framework, one-dimensional discrete and continuous probability-estimation tasks achieve high correlations between learned outputs and ground-truth distributions using only 5–10 hidden units; in Gaussian tasks of dimension 1–4, network size grows from 6 to 16 units while correlations remain high; and the max-product circuit returns the correct MAP hypothesis in 98% of trials on synthetic classification tasks (Kharratzadeh et al., 2015). This establishes that deterministic neural circuits can learn and use probabilities without explicit probability labels.

In the iP-VAE, empirical claims concern sparsity, reconstruction, and generalization. The model is reported to outperform both standard VAEs and Gaussian-based predictive coding models in sparsity, reconstruction, and biological plausibility, and to generalize strongly to out-of-distribution inputs, including rotated digits, EMNIST letters, Omniglot characters, and ImageNet32, exceeding hybrid iterative-amortized VAEs (Vafaii et al., 2024). Because the reported comparisons are largely qualitative apart from parameter counts, the significance of these results lies primarily in demonstrating a viable bridge from free-energy theory to spiking variational inference.

On BrainScaleS, Bayesian inference by spiking sampling achieves performance on real-world visual data close to that of software RBMs despite hardware constraints. Reported results include rMNIST and rFMNIST classification errors close to those of reference RBMs run in software, per-image wall-clock times on the millisecond scale corresponding to a substantial acceleration relative to real time, and favorable median results on small-distribution benchmarks (Kungl et al., 2018). These results show that probabilistic spiking computation can remain functional under analog variability, 4-bit synaptic resolution, and device mismatch.

In STDP-based Bernoulli message passing, basic factors exhibit small typical absolute errors, XOR outputs remain close to analytic values on test pairs, and a coding-theory example yields posterior estimates close to the exact sum-product solution (Adamiat et al., 19 Dec 2025). This demonstrates that biologically plausible synaptic plasticity can train small SNNs to approximate exact Bayesian factor updates.

In BRAIN, deployment on a private 5G testbed against a static heuristic and five DRL baselines yielded 30% faster convergence to higher cumulative reward, 28.3% greater robustness under a mid-experiment traffic shift, superior tail performance in per-slice CDFs, and belief updates on a millisecond timescale while evaluating a set of candidate actions on telemetry streamed every 20 ms (Basaran et al., 15 Feb 2026). The framework also exposes posterior beliefs and EFE decomposition to operators through human-interpretable diagnostics (Basaran et al., 15 Feb 2026).

The literature also documents human-like deviations from normative Bayes. In the SDCC model, base-rate neglect emerges when prior-module weights are disrupted by an attention factor $\alpha$, flattening priors toward uniformity and, in the limit of complete disruption, reducing MAP choice to likelihood maximization (Kharratzadeh et al., 2015). This challenges the misconception that brain-inspired Bayesian models necessarily predict flawless Bayesian behavior. Several frameworks explicitly aim to explain approximate, biased, or resource-constrained inference rather than idealized optimality (Kharratzadeh et al., 2015, Bazargani et al., 2023).
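The qualitative effect can be reproduced with a deliberately simplified stand-in: instead of disrupting SDCC module weights, the sketch scales the log-prior by a hypothetical attention factor `alpha`, so $\alpha=1$ keeps the learned base rates and $\alpha=0$ yields a uniform prior. The base rates and likelihoods are illustrative. With intact priors the MAP choice follows the base rate; with fully flattened priors it reduces to picking the hypothesis with the highest likelihood.

```python
import math

def flattened_prior(prior, alpha):
    """Hypothetical stand-in for disrupted prior-module weights: scale the log-prior by alpha.
    alpha = 1 keeps the learned prior; alpha = 0 yields a uniform prior."""
    logits = [alpha * math.log(p) for p in prior]
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    z = sum(e)
    return [x / z for x in e]

def map_choice(prior, lik, alpha):
    """Max-product choice argmax_h p_alpha(h) * p(d | h)."""
    p = flattened_prior(prior, alpha)
    scores = [ph * lh for ph, lh in zip(p, lik)]
    return max(range(len(scores)), key=scores.__getitem__)

prior = [0.85, 0.15]   # base rates of two hypotheses (illustrative)
lik = [0.2, 0.8]       # p(d | h) for the observed evidence d
```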

A recurring controversy concerns whether the brain literally infers hidden causes or whether explicit latent-state models are an unnecessary commitment. The classical generative-model program treats hidden states as indispensable for perception and action (Bazargani et al., 2023, Basaran et al., 15 Feb 2026), whereas the “Bayesian brain, with a bit less Bayes” argues that tracking the distribution of observations $p(o)$ without recourse to hidden states has substantial explanatory power, especially when priors are reinterpreted as regularization and action as constraint satisfaction (Spaak, 2021). This is not merely terminological; it concerns the correct level of abstraction for neurally plausible probabilistic computation.

Another unresolved issue is the representational status of neural variability. In synaptic-dropout and synaptic-sampling models, stochasticity is functional, encoding epistemic and aleatoric uncertainty or enabling posterior sampling over network configurations (McKee et al., 2021, Kappel et al., 2015). In deterministic frameworks such as predictive coding or SDCC-based Bayes modules, uncertainty is instead represented in parametric activity patterns or module outputs (Bazargani et al., 2023, Kharratzadeh et al., 2015). This suggests a deeper divide between optimization-based and sampling-based Bayesian brain models.

There is also a question of mechanistic granularity. Some frameworks are highly abstract, specifying generative models and free-energy objectives while leaving neural implementation partially schematic (Bazargani et al., 2023, Basaran et al., 15 Feb 2026). Others derive concrete update rules at the level of membrane potentials, release probabilities, or STDP windows (Vafaii et al., 2024, McKee et al., 2021, Adamiat et al., 19 Dec 2025). A plausible implication is that the field is progressively shifting from interpretive metaphors toward executable circuit-level proposals.

Finally, the framework has extended beyond perception to thermodynamic and embodied interpretations. In the thermodynamics of the Bayesian brain, posterior dynamics in recurrent spiking populations are analyzed through entropy, free energy, and an information-theoretic “neural engine,” with delayed feedback modulation linked to awareness and attention (Shimazaki, 2020). This suggests that free-energy formulations can be read not only as inferential objectives but also as dynamical and thermodynamic principles for neural systems.

Taken together, brain-inspired Bayesian inference is best understood as a research program rather than a single model. Its common denominator is the attempt to explain neural computation as probabilistic inference under biological constraints; its internal diversity lies in how it defines the target distribution, factorizes the posterior, represents uncertainty, couples perception to action, and realizes inference in circuits, spikes, or synapses (Bazargani et al., 2023, Vafaii et al., 2024, Spaak, 2021).