Bayesian Neural Networks with Partial Stochasticity
- Bayesian Neural Networks with partial stochasticity are models that treat only select parameters as random variables to efficiently capture uncertainty.
- They offer a practical balance between full stochasticity and deterministic approaches, enhancing calibration and predictive performance.
- This selective approach reduces computational costs and memory requirements while retaining universal approximation capabilities.
Bayesian Neural Networks with Partial Stochasticity are a class of models that integrate Bayesian principles—quantifying uncertainty via probability distributions over model components—while applying stochasticity selectively to only parts of the neural network architecture. This approach embodies a practical and theoretically justified response to core challenges in Bayesian deep learning, namely scalability, robustness, computational cost, and the efficient quantification of predictive uncertainty.
1. Fundamental Principles and Conceptual Motivation
Partial stochasticity in the context of Bayesian Neural Networks (BNNs) refers to the practice of treating only a subset of network parameters as random variables subject to Bayesian inference, while the remainder are held fixed, i.e., deterministic. This contrasts with fully Bayesian neural networks, where all weights (and possibly biases or other parameters) are endowed with priors and inferred posteriors, resulting in high-dimensional, costly inference over the entire parameter space. In partial BNNs, stochasticity is targeted at specific architectural locations—such as specific layers, blocks, or parameter groups—based on empirical, theoretical, or inductive considerations. This approach is motivated by the observation that full stochasticity is often redundant and costly, and that selective injection of randomness can suffice for function expressivity, uncertainty quantification, and Bayesian regularization.
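To make the distinction concrete, the sketch below (PyTorch; the names `BayesianLinear` and `PartialBNN` are illustrative, not taken from the cited works) keeps the hidden layers deterministic and places a mean-field Gaussian only over the output layer's weights, sampled with the reparameterization trick. It is one minimal realization of a partially stochastic network, not a prescribed implementation.

```python
# Minimal sketch of a partially stochastic MLP: deterministic hidden layers,
# stochastic (mean-field Gaussian) output layer. Illustrative, not canonical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesianLinear(nn.Module):
    """Linear layer with a factorized Gaussian over its weights (reparameterized)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w_sigma = F.softplus(self.w_rho)                        # positive std. dev.
        w = self.w_mu + w_sigma * torch.randn_like(self.w_mu)   # one posterior sample
        return F.linear(x, w, self.bias)


class PartialBNN(nn.Module):
    """Deterministic feature extractor followed by a stochastic output layer."""

    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.features = nn.Sequential(                     # deterministic subset
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.head = BayesianLinear(hidden_dim, out_dim)    # stochastic subset

    def forward(self, x):
        return self.head(self.features(x))
```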
Recent research has demonstrated that networks with as few as $n$ stochastic biases (with $n$ the output dimension) are universal conditional distribution approximators, and that selective partial stochasticity matches or surpasses full stochasticity in both predictive quality and efficiency (2211.06291).
2. Theoretical Foundations and Expressivity
Key theoretical results underpinning partial stochasticity include the Universal Conditional Distribution Approximation property. Specifically, any conditional distribution $p(y \mid x)$ (assuming a suitable continuous generator) can be represented as
$$ y = f(x, \varepsilon), \qquad \varepsilon \sim p(\varepsilon), $$
for some deterministic function $f$ and noise variable $\varepsilon$ independent of $x$. A neural network that includes only a few stochastic parameters (e.g., noise-injecting biases in a single layer) can approximate $f$, and hence $p(y \mid x)$, arbitrarily well, provided sufficient width and a proper deterministic mapping (2211.06291, 2402.03495).
Accordingly, fully stochastic networks are not strictly necessary for universal function space coverage. Furthermore, excessive stochasticity can hinder practical expressivity and inferential efficiency, as unnecessary injected randomness may obscure meaningful posterior uncertainty or impede inference convergence. These findings are robust across a variety of common architectures (multilayer perceptrons, convolutional nets, and even deep residual networks) and extend to both regression and classification problems.
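A minimal sketch of this idea, assuming Gaussian noise injected at a single hidden layer (the class name `StochasticBiasNet` and the architecture are illustrative assumptions): all weights are deterministic, and repeated forward passes with fresh noise yield samples from the network's implicit conditional distribution over outputs.

```python
# Sketch of a noise-outsourcing-style sampler: deterministic weights everywhere,
# randomness enters only as a stochastic bias added to one hidden pre-activation.
import torch
import torch.nn as nn


class StochasticBiasNet(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x, n_samples=1):
        # Repeat each input, then add a fresh Gaussian "bias" sample to the
        # hidden pre-activation; the spread of the resulting outputs realizes
        # the network's implicit conditional distribution p(y | x).
        x_rep = x.repeat_interleave(n_samples, dim=0)
        noise = torch.randn(x_rep.shape[0], self.fc1.out_features)
        h = torch.relu(self.fc1(x_rep) + noise)
        y = self.fc2(h)
        return y.view(x.shape[0], n_samples, -1)
```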
3. Methodological Realizations and Inference Schemes
Several mechanisms implement partial stochasticity in practice:
- Stochastic Inputs or Early Layers: Introducing stochasticity via explicit noise vectors or by modeling only a subset of initial weights/projections as random variables (e.g., one or a few layers) (1706.09751, 1806.03563, 2310.19608, 2505.03797).
- Partial Block/Bias Randomization: Using blockwise or neuron-wise Gaussian distributions for incoming weights (as in Restricted Bayesian Neural Networks), with only a small parameter set sampled per neuron or block (2403.04810).
- Computation Skeletons: Decomposing network computations into deterministic and stochastic blocks, placing randomness only where interpretability or function space coverage requires it (1806.03563).
- Infinite-Depth Architectures: Partitioning stochasticity along the depth (vertical separation in time) or parameter (horizontal cut) axis, allowing only a segment of an ODE/SDE flow, or only a subset of weights, to be stochastic (2402.03495).
Inference over the stochastic subset can proceed via sampling-based or variational strategies. Sequential Monte Carlo (SMC) samplers lend themselves to accurate, scalable posterior approximation over the partial stochastic subset, especially when combined with gradient-guided proposals and open-horizon schemes (2310.19608, 2505.03797). Variational inference becomes easier when the number of stochastic parameters is small, leading to a lighter computational footprint and better-calibrated posterior approximations (2106.13594, 2403.04810). Structured partial stochasticity can also greatly simplify the posterior landscape by eliminating neuron permutation symmetries, which would otherwise manifest as factorially many redundant modes in the full Bayesian posterior (2405.17666).
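As one illustration of the variational route, the training step below reuses the `BayesianLinear`/`PartialBNN` sketch from Section 1 and assumes a standard normal prior with a unit-variance Gaussian likelihood: the KL penalty covers only the stochastic subset, while the deterministic weights are fit as ordinary point estimates.

```python
# Sketch of one variational training step for a partially stochastic network.
# Only the stochastic head contributes a KL term; deterministic weights are
# trained as point estimates. Prior and likelihood choices are assumptions.
import torch
import torch.nn.functional as F


def kl_to_standard_normal(layer):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over the layer's weight entries."""
    sigma = F.softplus(layer.w_rho)
    return 0.5 * torch.sum(sigma**2 + layer.w_mu**2 - 1.0 - 2.0 * torch.log(sigma))


def elbo_step(model, optimizer, x, y, n_data):
    optimizer.zero_grad()
    pred = model(x)                                # one posterior sample per step
    nll = F.mse_loss(pred, y, reduction="sum")     # Gaussian likelihood, unit variance
    kl = kl_to_standard_normal(model.head)         # KL only over the stochastic subset
    loss = nll + kl * x.shape[0] / n_data          # minibatch-scaled negative ELBO
    loss.backward()
    optimizer.step()
    return loss.item()
```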
4. Performance, Efficiency, and Comparative Analysis
Empirical evaluations across regression and classification (including UCI tasks, MNIST, CIFAR-10/100, and more elaborate OOD scenarios) indicate that partial-stochastic BNNs:
- Match or exceed the predictive accuracy and calibration of fully Bayesian networks and deterministic baselines.
- Often yield superior uncertainty quantification, particularly when stochasticity is introduced in early layers or in critical parameter groups.
- Achieve substantial reductions in memory and computational requirements—sometimes by orders of magnitude—relative to fully stochastic BNNs, due to the lower number of variational parameters, reduced sampling/fitting costs, and increased amenability to parallelization (2211.06291, 2402.03495, 2505.03797).
A further efficiency gain is realized in hardware, where stochastic inference may exploit nano-device-level noise (e.g., Phase Change Memory, PCM) to instantiate stochastic weight sampling without expensive randomness sources or area-inefficient storage; the separation of weight and noise planes in device arrays further accentuates the advantages (2302.01302, 2411.07902).
5. Applications, Uncertainty Quantification, and Practical Implications
Applications naturally favoring partial stochasticity include:
- Semi-supervised and active learning: Where predictive uncertainty steers label acquisition and model update priorities (1706.09751).
- Bayesian structure learning: Bayesian inference over neural network architecture, while treating weights deterministically or via selective post-hoc regularization, yields potent uncertainty-aware models with improved computational scaling (1911.09804).
- Differential equations, dynamical systems, and scientific modeling: Hybrid treatment—coupling partial Bayesian NNs with partially known physical models, possibly regulated by PAC-Bayes bounds—offers enhanced generalization, stability, and interpretability, especially in scientific or engineering domains (2006.09914, 1912.00796, 2402.03495).
Robust uncertainty quantification (aleatoric and epistemic) is achievable without full stochasticity, provided stochasticity is correctly localized. This also holds for out-of-distribution detection, active exploration in RL, and calibration-critical classification tasks (1810.05546, 2211.06291, 2505.03797).
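One common recipe for extracting both kinds of uncertainty from a partially stochastic classifier is sketched below, under the assumption that `model(x)` returns logits and redraws its stochastic parameters on every forward pass (as in the earlier sketches): the Monte Carlo predictive entropy is decomposed into an aleatoric term (expected entropy) and an epistemic term (mutual information).

```python
# Sketch of Monte Carlo uncertainty decomposition for a partially stochastic
# classifier; only the stochastic subset is resampled between forward passes.
import torch


@torch.no_grad()
def uncertainty_decomposition(model, x, n_samples=32, eps=1e-12):
    probs = torch.stack(
        [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
    )                                                     # (n_samples, batch, classes)
    mean_probs = probs.mean(dim=0)
    total = -(mean_probs * (mean_probs + eps).log()).sum(-1)     # predictive entropy
    aleatoric = -(probs * (probs + eps).log()).sum(-1).mean(0)   # expected entropy
    epistemic = total - aleatoric                                # mutual information
    return mean_probs, total, aleatoric, epistemic
```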
A further, recent implication is that partial stochasticity can be intentionally structured to destroy network parameter symmetries (notably neuron permutation), yielding a drastically simplified and more tractable posterior for approximate inference, with direct gains in RMSE, log likelihood, and calibration (2405.17666).
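The schematic below (a generic illustration, not the specific construction of 2405.17666) shows why neuron permutation symmetry inflates a fully stochastic posterior with factorially many modes, and why fixing a few distinct deterministic components removes the redundancy.

```latex
% One-hidden-layer network with H hidden units and elementwise nonlinearity \sigma:
\[
  f(x; W_1, W_2) \;=\; W_2\,\sigma(W_1 x)
  \;=\; (W_2 P^{\top})\,\sigma\!\big((P W_1)\,x\big)
  \qquad \text{for every } H \times H \text{ permutation matrix } P,
\]
% so the posterior over (W_1, W_2) contains H! functionally equivalent modes.
% If part of W_1 is fixed to distinct, neuron-specific deterministic values,
% then P W_1 \neq W_1 for every P \neq I, so the permuted configurations fall
% outside the model family and only one representative mode remains.
```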
6. Methodological Limitations and Ongoing Research Directions
Challenges and open issues include:
- Selection of the stochastic subset: Identifying the optimal position and cardinality of the stochastic parameters remains data- and task-dependent, with no universally optimal prescription.
- Architecture generalization: Current universality and empirical results chiefly concern MLPs and CNNs; extension to attention-based architectures, graph neural networks, and other highly structured models remains ongoing (2211.06291).
- Inference tightness: Even with partial stochasticity and advanced samplers, large networks and complex data continue to pose inference challenges; in particular, achieving well-mixed posterior samples with HMC or SMC can prove arduous in high dimensions.
Directions for future research include automated or learnable stochastic subset selection, hybridization with functional Bayesian methods to bypass parameter-space prior pathologies (2409.16632), and more in-depth exploration of partial stochasticity for large-scale, hardware-accelerated, or resource-constrained deployments.
7. Summary Table: Comparative Properties
| Method/Aspect | Full Bayesian NN | Partially Stochastic BNN | Deterministic NN |
|---|---|---|---|
| Stochastic parameters | All | Subset (layer/group/structure) | None |
| Posterior dimensionality | Maximum | Reduced | N/A |
| Uncertainty quantification | Explicit (costly) | Explicit (efficient) | Limited |
| Memory/compute | High | Low/moderate | Low |
| Expressivity | Universal (theoretical) | Universal (conditional distributions) | Universal (mean function only) |
| Practical calibration | Varies | Often strong | Often poor |
| Posterior symmetries | Factorially many modes | Reduced (symmetries can be removed) | N/A |
References
- Sharma et al., "Do Bayesian Neural Networks Need To Be Fully Stochastic?" (2211.06291)
- Forrow, "Structured Partial Stochasticity in Bayesian Neural Networks" (2405.17666)
- Calvo-Ordoñez et al., "Partially Stochastic Infinitely Deep Bayesian Neural Networks" (2402.03495)
- Nardi et al., "On Feynman–Kac training of partial Bayesian neural networks" (2310.19608)
- Somepalli et al., "Bayes2IMC: In-Memory Computing for Bayesian Binary Neural Networks" (2411.07902)
- Additional works are cited inline by arXiv identifier throughout the preceding sections.
Bayesian neural networks with partial stochasticity thus represent a theoretically justified and empirically validated paradigm for scalable, robust, and efficient uncertainty quantification in neural modeling, with ongoing developments in inference, architecture, and application domains.