
BONG: Bayesian Online Natural Gradient

Updated 2 February 2026
  • BONG is a principled method for sequential Bayesian inference using online natural gradient steps equivalent to extended Kalman filtering.
  • It employs a variational Bayesian framework with single-step updates, enhancing uncertainty quantification and computational efficiency.
  • Practical approximations like diagonal and low-rank updates make BONG scalable for high-dimensional problems in Bayesian neural networks and vision-language models.

The Bayesian Online Natural Gradient (BONG) algorithm is a principled approach to sequential Bayesian inference that leverages the natural gradient—i.e., preconditioning by the Fisher Information Matrix (FIM)—to produce efficient, curvature-aware online updates for probabilistic models. BONG’s core insight is the theoretical equivalence between online natural gradient descent and extended Kalman filtering, enabling the integration of second-order information, rigorous uncertainty quantification, and scalable, deterministic approximations for high-dimensional problems such as Bayesian neural networks and fine-tuned vision-language models.

1. Theoretical Foundations and Equivalence to Kalman Filtering

BONG is grounded in the observation that natural gradient descent—a learning algorithm that scales parameter updates by the inverse Fisher information—admits an algebraic equivalence to the extended Kalman filter (EKF) in both the i.i.d. and recurrent settings. Specifically, estimating a parameter $\theta \in \mathbb{R}^n$ from a stream of observations via online natural gradient can be recast as applying an EKF to track a hidden state (the parameter vector) in light of new “pseudo-observations.” In this framework, the Fisher information accumulated from the data directly corresponds to the updated precision (inverse covariance) in the Kalman formalism (Ollivier, 2017, Abdi et al., 3 Nov 2025).

In the exponential family, the EKF’s information-filter step is

$$P_t^{-1} = P_{t-1}^{-1} + H_t^\top R_t^{-1} H_t = P_{t-1}^{-1} + F_t,$$

with $H_t$ the Jacobian of the output w.r.t. $\theta$ and $R_t$ the observation noise covariance. The parameter update takes the form

$$\theta_t = \theta_{t-1} - P_t \nabla_\theta \ell_t,$$

which matches the natural gradient step $\theta \leftarrow \theta - \eta F^{-1} \nabla \ell$ when $\eta$ and the other hyperparameters are aligned.
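This equivalence can be checked numerically for a single linear-Gaussian pseudo-observation: the natural-gradient step preconditioned by the updated covariance $P_t$ coincides with the standard Kalman-gain update. The following is a minimal sketch with illustrative dimensions and values, not code from any of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 3, 2                      # parameter dim, observation dim

# Linear-Gaussian pseudo-observation: y = H theta + noise, noise cov R
H = rng.normal(size=(d, n))
R = 0.5 * np.eye(d)
theta_prev = rng.normal(size=n)
P_prev = np.eye(n)               # prior covariance
y = rng.normal(size=d)

# Information-filter precision update: P_t^{-1} = P_{t-1}^{-1} + H^T R^{-1} H
F_t = H.T @ np.linalg.inv(R) @ H          # per-step Fisher information
P_t = np.linalg.inv(np.linalg.inv(P_prev) + F_t)

# Gradient of the negative log-likelihood at theta_prev
grad = -H.T @ np.linalg.inv(R) @ (y - H @ theta_prev)

# Natural-gradient / EKF mean update: theta_t = theta_{t-1} - P_t * grad
theta_ng = theta_prev - P_t @ grad

# Standard Kalman-gain form of the same update
S = H @ P_prev @ H.T + R
K = P_prev @ H.T @ np.linalg.inv(S)
theta_kf = theta_prev + K @ (y - H @ theta_prev)

print(np.allclose(theta_ng, theta_kf))   # prints True: the two forms coincide
```

The agreement rests on the standard identity $K = P_t H^\top R^{-1}$ relating the Kalman gain to the posterior covariance.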

2. Variational Bayesian Formulation and Single-Step Update

In online variational Bayes, a mean-field or structured Gaussian variational posterior $q_\lambda(\theta)$ is recursively updated as new data arrive. Standard VB minimizes

$$-\mathbb{E}_{q_\lambda}[\log p(y_t|x_t,\theta)] + \mathrm{KL}(q_\lambda(\theta)\,\|\,\pi_{t|t-1}(\theta)),$$

where $\pi_{t|t-1}$ is the prior predictive. BONG simplifies this by performing a single natural gradient ascent step on the expected log-likelihood term, initialized at the prior parameter, and dropping the explicit KL regularizer. For an exponential family,

$$\lambda_t = \lambda_{t|t-1} + F(\lambda_{t|t-1})^{-1} \nabla_\lambda \mathbb{E}_{q_{t|t-1}}[\log p(y_t|x_t,\theta)],$$

and in the mean parameterization,

$$\mu_t = \mu_{t|t-1} + \nabla_\mu \mathbb{E}_{q_{t|t-1}}[\log p(y_t|x_t,\theta)].$$

BONG is thus a single-pass, one-step natural gradient mirror descent applied sequentially (Jones et al., 2024).
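The identity linking the two parameterizations is $F(\lambda)^{-1}\nabla_\lambda = \nabla_\mu$: because the Fisher matrix in natural coordinates is the Jacobian $\partial\mu/\partial\lambda$, the natural gradient in natural parameters equals the plain gradient in mean parameters. A numerical check for a one-dimensional Gaussian with a unit-variance Gaussian likelihood, where the mean-parameter gradient is available in closed form as $(y, -\tfrac12)$; finite differences stand in for the analytic Fisher, and all values are illustrative:

```python
import numpy as np

y = 1.3                            # one observation, likelihood y ~ N(theta, 1)

def to_mean(lam):
    """Natural params (lam1, lam2) -> mean params (E[theta], E[theta^2])."""
    m = -lam[0] / (2.0 * lam[1])
    v = -1.0 / (2.0 * lam[1])
    return np.array([m, m**2 + v])

def expected_loglik(lam):
    """E_q[log N(y | theta, 1)] for Gaussian q in natural parameters."""
    mu = to_mean(lam)
    return -0.5 * np.log(2 * np.pi) - 0.5 * (y**2 - 2 * y * mu[0] + mu[1])

def num_grad(f, x, eps=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

lam = np.array([0.5, -0.25])       # prior N(1, 2) in natural coordinates

# Fisher in natural params is the Jacobian d mu / d lam
F = np.column_stack([num_grad(lambda l: to_mean(l)[k], lam) for k in range(2)]).T

nat_grad = np.linalg.solve(F, num_grad(expected_loglik, lam))
print(nat_grad)                    # ≈ [ 1.3, -0.5 ], the mean-param gradient
```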

3. Exactness for Conjugate Exponential-Family Models

In the special case where the variational family and the likelihood form a conjugate exponential family, the BONG update precisely recovers the exact Bayesian posterior in one step. If

$$p(y_t|\theta) \propto \exp\big(s(y_t)^\top T(\theta) - A(T(\theta))\big),$$

with $q_{t|t-1}(\theta)$ also in the exponential family, then the update is

$$\eta_t = \eta_{t|t-1} + \nabla_\eta \mathbb{E}_{q_{t|t-1}}\big[s(y_t)^\top T(\theta) - A(T(\theta))\big] = \eta_{t|t-1} + [s(y_t); 1],$$

matching the canonical Bayesian filter.
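For instance, with a Bernoulli likelihood and a conjugate Beta($a$, $b$) prior, writing the prior in the coordinates $\eta = (a-1,\, a+b-2)$ induced by the $s^\top T - A$ parameterization, the additive step $\eta \leftarrow \eta + [s(y_t); 1]$ with $s(y) = y$ reproduces the exact conjugate posterior after a pass over the stream. A sketch with illustrative prior values:

```python
import numpy as np

rng = np.random.default_rng(1)
ys = rng.integers(0, 2, size=20)          # Bernoulli data stream

a0, b0 = 2.0, 2.0                         # Beta(a0, b0) prior
eta = np.array([a0 - 1.0, a0 + b0 - 2.0]) # natural coordinates (a-1, a+b-2)

for y in ys:
    eta += np.array([float(y), 1.0])      # BONG step: eta <- eta + [s(y); 1]

a_post = eta[0] + 1.0                     # back to Beta shape parameters
b_post = eta[1] - eta[0] + 1.0

a_exact = a0 + ys.sum()                   # exact conjugate Beta posterior
b_exact = b0 + len(ys) - ys.sum()
print(a_post == a_exact, b_post == b_exact)   # prints True True
```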

4. Practical Approximations: Gaussian, Diagonal, and Low-Rank BONG

For non-conjugate problems, notably neural network parameterizations, full-covariance computations are infeasible. BONG employs approximations:

  • Diagonal approximation: Store and update only the diagonal elements of the covariance or Fisher, leading to $\mathcal{O}(n)$ cost (Abdi et al., 3 Nov 2025, Mohan et al., 17 Nov 2025).
  • Diagonal plus low-rank (DLR): Represent the precision as $\Lambda + W W^\top$, where $\Lambda$ is diagonal and $W$ is $n \times r$ with $r \ll n$; the Fisher/precision updates and associated SVD projections allow memory and computation savings at a small cost in fidelity (Jones et al., 2024).
  • Monte Carlo and linearized EKF: Estimate the required moments $\mathbb{E}[\nabla \log p]$ and $\mathbb{E}[\nabla^2 \log p]$ via sampling or first-order Taylor approximations, respectively.

The BONG step for a Gaussian family is

$$m_t = m_{t|t-1} + \Sigma_{t|t-1}\, \mathbb{E}_{q_{t|t-1}}[\nabla_\theta \log p(y_t|x_t,\theta)],$$

$$\Sigma_t^{-1} = \Sigma_{t|t-1}^{-1} - \mathbb{E}_{q_{t|t-1}}[\nabla_\theta^2 \log p(y_t|x_t,\theta)].$$

For minibatch or Bayesian neural network training, these moments are computed efficiently via Monte Carlo or diagonal/DLR approximations (Abdi et al., 3 Nov 2025, Mohan et al., 17 Nov 2025, Jones et al., 2024).
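A minimal diagonal-Gaussian BONG step for Bernoulli-logistic observations, with both expectations estimated by Monte Carlo, might look as follows. This is a sketch under the two update equations above; `bong_diag_step`, the sample count, and the synthetic stream are illustrative, not from a reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bong_diag_step(m, s2, x, y, n_samples=64, rng=None):
    """One diagonal-Gaussian BONG step for a Bernoulli-logistic observation.

    m, s2 are the predictive mean and diagonal variances of q_{t|t-1};
    both expectations are Monte Carlo estimates under that Gaussian.
    """
    rng = rng if rng is not None else np.random.default_rng()
    thetas = m + np.sqrt(s2) * rng.standard_normal((n_samples, m.size))
    p = sigmoid(thetas @ x)                  # per-sample success probability
    g = np.mean(y - p) * x                   # E_q[grad_theta log p] (logistic score)
    h = -np.mean(p * (1.0 - p)) * x**2       # E_q[diagonal of hess log p]
    s2_new = 1.0 / (1.0 / s2 - h)            # precision update (note h <= 0)
    m_new = m + s2 * g                       # mean step preconditioned by Sigma_{t|t-1}
    return m_new, s2_new

# Online pass over a synthetic logistic-regression stream
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
m, s2 = np.zeros(3), np.ones(3)
for _ in range(100):
    x = rng.normal(size=3)
    y = float(rng.random() < sigmoid(x @ theta_true))
    m, s2 = bong_diag_step(m, s2, x, y, rng=rng)
```

Because the logistic Hessian is non-positive, the per-coordinate precisions only grow, so the variances shrink monotonically as evidence accumulates.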

5. Iterative Natural Gradient Filtering in Nonlinear Systems

For highly nonlinear systems, the BONG approach can be extended to iterative, locally optimal natural gradient flows on the manifold of Gaussians. At each step, the update seeks stationary points of a variational objective combining expected loss and KL divergence to the prediction, leveraging the Fisher metrics of both mean and precision blocks. The resulting algorithm—termed NANO—proceeds by repeated natural-gradient steps on

$$J(\hat x, P) = \mathbb{E}_{\mathcal{N}(\hat x, P)}[-\log p(y|x)] + D_{\mathrm{KL}}\big(\mathcal{N}(\hat x, P) \,\|\, \mathcal{N}(\hat x_{t|t-1}, P_{t|t-1})\big),$$

with Fisher-based preconditioning on both mean and covariance (Cao et al., 2024). Specializing to the linear-Gaussian case, a single BONG step recovers the EKF.
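For a scalar nonlinear measurement this iterative scheme reduces to repeated Gauss-Newton-style refinements of the stationarity condition, which balances the data-fit gradient against the pull toward the prediction. The sketch below uses linearization at the current mean to estimate the required moments; the measurement model, values, and iteration budget are illustrative, not taken from the cited paper:

```python
import numpy as np

# Scalar nonlinear measurement y = sin(x) + noise with variance r;
# Gaussian prediction N(x0, p0).
h, dh = np.sin, np.cos
x0, p0, r = 0.6, 1.0, 0.05
y = 0.9

m, P = x0, p0
for _ in range(20):                      # iterate to a stationary point of J
    J = dh(m)                            # linearize the measurement at the mean
    grad = J * (y - h(m)) / r            # linearized E_q[grad log p(y|x)]
    hess = -J * J / r                    # Gauss-Newton estimate of E_q[hess log p]
    P = 1.0 / (1.0 / p0 - hess)          # precision: prediction plus information
    m = m - P * (-grad + (m - x0) / p0)  # balance data fit against KL to prediction
print(m, P)
```

At the fixed point the bracketed term vanishes, which is exactly the MAP stationarity condition of the quadratic-plus-measurement objective; a single pass through the loop with a linear $h$ reproduces the EKF update.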

6. Computational Complexity and Scalability

BONG is explicitly designed for computational tractability in high-dimensional inference:

  • Full-covariance Kalman/natural gradient is $\mathcal{O}(n^2)$ per step, impractical for $n \sim 10^6$.
  • Diagonalization reduces cost and storage to $\mathcal{O}(n)$; DLR maintains scalability for modest-rank updates.
  • Per-step complexity can match or marginally exceed first-order optimizers (SGD/Adam), making BONG suitable for online training of neural adapters or large language-vision models (Abdi et al., 3 Nov 2025, Mohan et al., 17 Nov 2025).

7. Uncertainty Quantification, Trust-Region Mechanisms, and Empirical Results

BONG’s maintenance of an updated approximate posterior $q(\theta) = \mathcal{N}(m, P)$ endows each parameter with a calibrated uncertainty. Applications exploit this for Bayesian trust-region regularization, notably scaling updates by a Mahalanobis-distance factor $\lambda = e^{-\alpha d_M}$ when the incoming data are out-of-distribution, resulting in enhanced OOD robustness. Experimental results on few-shot vision-language adaptation demonstrate consistent outperformance of, or parity with, first-order baselines in both in-distribution and OOD tasks, with marked improvements under severe domain shift (e.g., +9 pp OOD robustness under ImageNet-C corruptions) (Abdi et al., 3 Nov 2025). Analogous benefits for BNNs include improved calibration and accelerated convergence compared to variational inference with first-order optimizers (Mohan et al., 17 Nov 2025).
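The trust-region factor itself is cheap to compute from a diagonal posterior. The helper below is a hypothetical sketch of the $\lambda = e^{-\alpha d_M}$ rule; the function name, the choice of reference point, and $\alpha$ are illustrative assumptions, not taken from the cited papers:

```python
import numpy as np

def trust_region_scale(feat, m, p_diag, alpha=0.1):
    """Scale factor exp(-alpha * d_M), where d_M is the Mahalanobis distance
    of an incoming point to the current posterior N(m, diag(p_diag))."""
    d_m = np.sqrt(np.sum((feat - m) ** 2 / p_diag))
    return np.exp(-alpha * d_m)

m, p_diag = np.zeros(4), np.ones(4)
lam_in = trust_region_scale(0.1 * np.ones(4), m, p_diag)   # near the mean: ~1
lam_out = trust_region_scale(5.0 * np.ones(4), m, p_diag)  # far OOD: shrunk
print(lam_in, lam_out)
```

In-distribution inputs leave the update essentially untouched, while far-out-of-distribution inputs shrink it exponentially in their Mahalanobis distance.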

