Online Sequential Bayesian Updating
- Online sequential Bayesian updating is a recursive inference method that incrementally refines posterior distributions as new data arrives.
- It employs exact conjugate updates, variational approximations, and particle filtering to handle diverse models under streaming and dynamic conditions.
- The approach is vital for real-time forecasting, anomaly detection, and continual learning, ensuring computational efficiency in high-frequency data regimes.
Online Sequential Bayesian Updating is a family of methodologies for recursive statistical inference, wherein the posterior distribution is updated incrementally as new observations or data batches arrive, leveraging the previous posterior as the new prior. This paradigm underpins a substantial portion of the modern Bayesian literature on inference under streaming, high-frequency, or distributed data regimes, and finds theoretical justification as the natural operationalization of Bayes’ theorem in the presence of sequential or temporally indexed data streams. The approach is applicable across parametric, nonparametric, latent-variable, and dynamic state-space models; can be realized exactly or with approximations; and aligns with loss-based generalizations (e.g., Gibbs posteriors in PAC-Bayes) for broader online learning and decision-theoretic settings.
1. Theorem of Recursive Bayesian Updating
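At its core, online Bayesian updating is described by the recursive formula
$$
p(\theta \mid y_{1:t}) \;\propto\; p(y_t \mid \theta)\, p(\theta \mid y_{1:t-1}),
$$
where $p(\theta \mid y_{1:t-1})$ is the posterior after the first $t-1$ blocks or points, $p(y_t \mid \theta)$ is the likelihood for the new block or observation $y_t$ (assumed conditionally independent of earlier data given $\theta$), and $p(\theta \mid y_{1:t})$ is the updated posterior. In canonical streaming applications, observations may arrive singly or in mini-batches $y_t = (y_{t,1}, \dots, y_{t,n_t})$, and the update proceeds using only the previous posterior (or its sufficient statistics) and the new data, never the full history (Hooten et al., 2018, Lee et al., 8 Apr 2025, Kamariotis et al., 2021). In dynamic models (e.g., state-space models and filtering), latent variables $z_t$ (e.g., the hidden states of HMMs) are also handled recursively using model-specific marginalization.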
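To make the recursion concrete, here is a minimal Python sketch, assuming a scalar parameter $\theta$ discretized on a grid and a Bernoulli likelihood (both chosen purely for illustration; the helper names are hypothetical). It processes the stream one observation at a time, keeping only the current grid posterior, and then compares the result with the posterior computed from all data at once.

```python
import math

def normalize(weights):
    """Rescale nonnegative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def bernoulli_lik(y, theta):
    """Likelihood p(y | theta) of a single binary observation."""
    return theta if y == 1 else 1.0 - theta

def recursive_update(posterior, grid, y):
    """One step of p(theta | y_{1:t}) ∝ p(y_t | theta) * p(theta | y_{1:t-1}) on the grid."""
    return normalize([bernoulli_lik(y, th) * p for th, p in zip(grid, posterior)])

def batch_posterior(prior, grid, ys):
    """Posterior from the whole data set in one step, used only for comparison."""
    return normalize([p * math.prod(bernoulli_lik(y, th) for y in ys)
                      for th, p in zip(grid, prior)])

# Midpoint grid on (0, 1) with a uniform prior.
grid = [(i + 0.5) / 200 for i in range(200)]
prior = [1.0 / len(grid)] * len(grid)
data = [1, 0, 1, 1, 1, 0, 1, 1]

post = prior
for y in data:                      # the stream is seen one point at a time
    post = recursive_update(post, grid, y)

batch = batch_posterior(prior, grid, data)
posterior_mean = sum(th * p for th, p in zip(grid, post))
```

With a uniform prior and 6 successes in 8 trials the exact posterior is Beta(7, 3), so the grid posterior mean should be close to 0.7, and the sequential and batch grids agree up to rounding because the likelihood factors multiply in any order.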
Several formulations exist:
- Sequential/recursive Bayes for static parameters: the posterior at step $t$ depends only on the previous posterior and the likelihood of the newest data (Hooten et al., 2018, Lee et al., 8 Apr 2025).
- State-space/hidden Markov models: Recursive filtering equations (e.g., Kalman and particle filters) propagate posteriors over latent states in addition to parameter posteriors (Duran-Martin, 12 May 2025, Dinh et al., 2016, Aktekin et al., 2016).
Thus, sequential online updating is not confined to regression or i.i.d. settings but unifies Bayesian filtering, latent-variable inference, and recursive structure learning under a common framework.
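The latent-state recursion can be illustrated with the forward filter of a discrete hidden Markov model. This is a minimal sketch, assuming a two-state model whose transition and emission probabilities are hypothetical values chosen for illustration: each step predicts the state distribution through the transition matrix, then reweights it by the likelihood of the new observation. The normalizing constant is the one-step predictive likelihood, whose logarithms accumulate to the log evidence.

```python
import math

def hmm_filter_step(belief, transition, emission, y):
    """One forward-filter step for a discrete HMM.

    belief[i]        = p(z_{t-1} = i | y_{1:t-1})
    transition[i][j] = p(z_t = j | z_{t-1} = i)
    emission[j][y]   = p(y_t = y | z_t = j)
    Returns p(z_t = j | y_{1:t}) for each j and the predictive likelihood p(y_t | y_{1:t-1}).
    """
    k = len(belief)
    # Predict: push the previous filtered belief through the state dynamics.
    predicted = [sum(belief[i] * transition[i][j] for i in range(k)) for j in range(k)]
    # Update: weight by the likelihood of the new observation, then normalize.
    unnorm = [predicted[j] * emission[j][y] for j in range(k)]
    evidence = sum(unnorm)
    return [u / evidence for u in unnorm], evidence

# Hypothetical model: state 0 mostly emits 0, state 1 mostly emits 1.
transition = [[0.95, 0.05], [0.10, 0.90]]
emission = [[0.9, 0.1], [0.2, 0.8]]

belief = [0.5, 0.5]          # p(z_0)
log_evidence = 0.0           # accumulates log p(y_{1:t})
for y in [0, 0, 1, 1, 1]:
    belief, ev = hmm_filter_step(belief, transition, emission, y)
    log_evidence += math.log(ev)
```

Memory and per-step cost depend only on the number of states, not on how many observations have been filtered.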
2. Exact Bayesian Filtering and Conjugate Cases
In models where prior and likelihood are conjugate (e.g., normal–normal, exponential-family–conjugate pairs), each update is analytic, and the sufficient statistics (moments, counts, etc.) can be incrementally maintained. This enables online inference with constant cost per observation (for fixed parameter dimension), or cost linear in the batch size for mini-batches, regardless of how much data has already been seen (Lee et al., 8 Apr 2025, Dinh et al., 2016, Romeres et al., 2016, Aktekin et al., 2016). For example:
- Kalman filter: Exact Gaussian update of the posterior mean and covariance per observation, applicable to Bayesian neural network weights under linear–Gaussian likelihoods (Wagner et al., 2021, Duran-Martin, 12 May 2025, Romeres et al., 2016).
- Bayesian model selection: Conjugate priors enable variable inclusion and marginal-likelihood updating with Laplace, BIC, or renewable-summary approximations (Ghosal et al., 19 Jan 2025).
- Dynamic models: Analytic updates of filtering distributions for state and static parameters (e.g., Gamma–Poisson for count models) using sufficient statistics (Aktekin et al., 2016).
This leads to algorithms that maintain only low-dimensional summaries and do not re-access full data, suitable for high-velocity or memory-limited streaming applications.
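A minimal sketch of this sufficient-statistic bookkeeping for the Gamma–Poisson pair, here for a static Poisson rate (the prior hyperparameters, class name, and data are illustrative assumptions, and the dynamic-model variants cited above add a state-evolution step not shown): with a Gamma prior in shape–rate form, a mini-batch of counts adds its total to the shape and its size to the rate, so the posterior depends on the stream only through the running count and sum.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GammaPoissonPosterior:
    """Gamma(shape, rate) posterior over a Poisson rate lambda (hypothetical helper)."""
    shape: float
    rate: float

    def update(self, batch):
        """Conjugate update: add the batch total to the shape and the batch size to the rate."""
        return GammaPoissonPosterior(self.shape + sum(batch), self.rate + len(batch))

    def mean(self):
        """Posterior mean of lambda, shape / rate."""
        return self.shape / self.rate

prior = GammaPoissonPosterior(shape=2.0, rate=1.0)
stream = [[3, 1, 4], [1, 5], [9, 2, 6, 5]]     # mini-batches of counts arriving over time

post = prior
for batch in stream:
    post = post.update(batch)                   # only (shape, rate) is carried forward

# Reference: a single update with the whole data set.
batch_post = prior.update([y for batch in stream for y in batch])
```

The sequential and one-shot results coincide exactly, which is the property that lets conjugate streaming methods discard raw data after each update.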
3. Approximate and Variational Methods
When conjugacy or analytic tractability is absent, approximate inference methods enable online sequential Bayesian updating.
Variational Bayes (VB) and Extensions
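- Online Variational Bayes: Given an approximating family $\mathcal{Q}$, each update targets the pseudo-posterior
$$
\tilde{p}_t(\theta) \;\propto\; p(y_t \mid \theta)\, q_{t-1}(\theta),
$$
using the new data's likelihood and the previous VB approximation $q_{t-1}$ as the prior. One then finds $q_t = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}(q \,\|\, \tilde{p}_t)$, typically by stochastic gradient ascent on the equivalent evidence lower bound, with each update touching only the new data and thereby reducing the per-step computational burden to $O(n_t)$ for a batch of size $n_t$ (Tomasetti et al., 2019, Lee et al., 8 Apr 2025, Kochurov et al., 2018).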
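- Streaming Variational Inference (ELBO): at step $t$ one maximizes
$$
\mathcal{L}_t(q) \;=\; \mathbb{E}_{q}\!\left[\log p(y_t \mid \theta)\right] \;-\; \mathrm{KL}\!\left(q \,\|\, q_{t-1}\right),
$$
where the prior for the ELBO at step $t$ is the previous approximate posterior $q_{t-1}$ (Kochurov et al., 2018, Tomasetti et al., 2019).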
- Online Bernstein–von Mises: Under mild smoothness conditions and a requirement that the mini-batch size be sufficiently large relative to the parameter dimension, the composition of Gaussian approximations at each update retains frequentist validity and is asymptotically equivalent (in total variation) to the batch posterior (Lee et al., 8 Apr 2025).
- Importance-sampling-based updates (UVB-IS): Various strategies reuse samples drawn from the prior $q$ (the previous approximate posterior) and reweight them by the new likelihood contributions, further accelerating updates at minimal loss of accuracy (Tomasetti et al., 2019).
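The idea of composing Gaussian approximations across mini-batches can be sketched without stochastic optimization. The code below uses an online Laplace approximation, a swapped-in technique rather than the VB or UVB-IS schemes cited above, for a one-dimensional logistic-regression weight: each mini-batch likelihood is combined with the previous Gaussian approximation as prior, Newton's method finds the mode, and the curvature at the mode gives the new variance. The model, synthetic data, prior N(0, 10), and function names are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def grad_and_curvature(theta, mu, var, xs, ys):
    """Gradient and negative second derivative of
    log N(theta; mu, var) + sum_i log Bernoulli(y_i | sigmoid(theta * x_i))."""
    grad = -(theta - mu) / var
    curv = 1.0 / var
    for x, y in zip(xs, ys):
        p = sigmoid(theta * x)
        grad += (y - p) * x
        curv += p * (1.0 - p) * x * x
    return grad, curv

def laplace_step(mu, var, xs, ys, iters=25):
    """Approximate (Gaussian prior) x (batch likelihood) by a Gaussian at its mode."""
    theta = mu
    for _ in range(iters):
        grad, curv = grad_and_curvature(theta, mu, var, xs, ys)
        theta += grad / curv                    # Newton step on a concave objective
    _, curv = grad_and_curvature(theta, mu, var, xs, ys)
    return theta, 1.0 / curv                    # new mean and variance

# Synthetic stream (illustrative): true weight 1.5, 20 mini-batches of 50 points.
rng = random.Random(0)
true_theta = 1.5
batches = []
for _ in range(20):
    xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
    ys = [1 if rng.random() < sigmoid(true_theta * x) else 0 for x in xs]
    batches.append((xs, ys))

mu, var = 0.0, 10.0                             # initial Gaussian prior
for xs, ys in batches:
    mu, var = laplace_step(mu, var, xs, ys)     # previous approximation becomes the prior

# Reference: one Laplace approximation on all data with the original prior.
all_x = [x for xs, _ in batches for x in xs]
all_y = [y for _, ys in batches for y in ys]
mu_batch, var_batch = laplace_step(0.0, 10.0, all_x, all_y)
```

Comparing the sequential result with a single Laplace fit on all data gives an informal check of the batch-equivalence behaviour described above; the agreement is approximate rather than exact, because each step discards whatever non-Gaussian structure the pseudo-posterior had.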
Sequential Monte Carlo (SMC) and Particle Methods
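- Particle Filter / SMC: Weighted particles $\{(\theta^{(i)}, w_{t-1}^{(i)})\}_{i=1}^{N}$ represent the current posterior $p(\theta \mid y_{1:t-1})$ and are updated via
$$
w_t^{(i)} \;\propto\; w_{t-1}^{(i)}\, p(y_t \mid \theta^{(i)}), \qquad i = 1, \dots, N,
$$
with periodic resampling (when the effective sample size degrades), often followed by rejuvenation moves (e.g., MCMC, kernel smoothing) to avoid particle impoverishment (Menictas et al., 2023, Dinh et al., 2016, Xie et al., 25 Nov 2025).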
- Online SMC for latent structure: Latent models (e.g., state-space, changepoint detection) admit SMC-based filtering, with weights given by predictive likelihoods and per-step cost controlled by particle count and sufficient-statistics management (Dinh et al., 2016, 0710.3742, Menictas et al., 2023).
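A minimal SMC sketch for a static parameter, assuming a Bernoulli likelihood with a uniform prior so the exact Beta posterior is available for comparison. The design choices are assumptions for illustration: multinomial resampling whenever the effective sample size $1/\sum_i (w^{(i)})^2$ falls below $N/2$, followed by one random-walk Metropolis rejuvenation move (step size 0.05) per particle. The Metropolis target is evaluated from running success and failure counts, so this particular move does not revisit the raw history; in models without such sufficient statistics it would have to.

```python
import math
import random

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights."""
    return 1.0 / sum(w * w for w in weights)

def log_target(theta, successes, failures):
    """Unnormalized log posterior under a Uniform(0, 1) prior, computed from running counts."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return successes * math.log(theta) + failures * math.log(1.0 - theta)

def smc_bernoulli(data, n_particles=2000, step=0.05, seed=1):
    """Sequential Monte Carlo for a static Bernoulli success probability."""
    rng = random.Random(seed)
    particles = [rng.random() for _ in range(n_particles)]   # draws from the prior
    weights = [1.0 / n_particles] * n_particles
    successes = failures = 0
    for y in data:
        # Reweight: w_t ∝ w_{t-1} * p(y_t | theta).
        weights = [w * (th if y == 1 else 1.0 - th) for w, th in zip(weights, particles)]
        total = sum(weights)
        weights = [w / total for w in weights]
        successes += y
        failures += 1 - y
        if effective_sample_size(weights) < n_particles / 2:
            # Multinomial resampling resets the weights to uniform ...
            particles = rng.choices(particles, weights=weights, k=n_particles)
            weights = [1.0 / n_particles] * n_particles
            # ... and one random-walk Metropolis move per particle restores diversity.
            moved = []
            for th in particles:
                prop = th + rng.gauss(0.0, step)
                log_ratio = (log_target(prop, successes, failures)
                             - log_target(th, successes, failures))
                moved.append(prop if rng.random() < math.exp(min(0.0, log_ratio)) else th)
            particles = moved
    return particles, weights

data_rng = random.Random(0)
data = [1 if data_rng.random() < 0.3 else 0 for _ in range(200)]
particles, weights = smc_bernoulli(data)
smc_mean = sum(w * th for w, th in zip(weights, particles))
exact_mean = (1 + sum(data)) / (2 + len(data))   # mean of the exact Beta(1 + s, 1 + f) posterior
```

Per-step cost is linear in the particle count, plus an extra linear pass whenever resampling is triggered.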
Generalisations and Robustified Updates
- Gibbs/Generalized Posteriors: Online updating via pseudo-likelihoods or general loss functions (e.g., exponentiated regret, adversarial tasks) produces Gibbs posteriors and is key for regret minimization and PAC-Bayes-motivated online learning; see, e.g.,
$$
\pi_t(\theta) \;\propto\; \exp\!\left(-\eta\, \ell_t(\theta)\right) \pi_{t-1}(\theta),
$$
where $\ell_t$ is the loss incurred at round $t$ and $\eta > 0$ is a learning rate, with SMC sampling and theoretical O(√T) regret bounds for bounded, mixable losses (Xie et al., 25 Nov 2025, Wu et al., 2024, Duran-Martin, 12 May 2025).
- Robust Bayesian filters: Loss-adapted or weighted updates (using, e.g., Mahalanobis or robust loss weighting) maintain sequential updating under outlier or model-misspecification regimes, sometimes preserving Kalman filter form (Duran-Martin, 12 May 2025).
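The Gibbs-posterior recursion is easiest to see over a finite hypothesis set, where it reduces to the exponentially weighted average forecaster. This minimal sketch assumes five hypothetical constant forecasters, squared loss on $[0, 1]$, and a fixed learning rate $\eta = 0.5$; for this exp-concave loss the standard guarantee is a regret of at most $\ln N / \eta$ against the best of the $N$ forecasters. SMC-based schemes such as those cited above play the same role when $\theta$ is continuous and the weights must be represented by particles.

```python
import math

def gibbs_update(weights, losses, eta):
    """pi_t(k) ∝ exp(-eta * loss_t(k)) * pi_{t-1}(k) over a finite hypothesis set."""
    unnorm = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical experts: constant forecasts of a quantity in [0, 1].
experts = [0.1, 0.3, 0.5, 0.7, 0.9]
outcomes = [0.72, 0.68, 0.75, 0.70, 0.66, 0.71, 0.69, 0.74] * 25   # 200 rounds
eta = 0.5

weights = [1.0 / len(experts)] * len(experts)
learner_loss = 0.0
expert_loss = [0.0] * len(experts)
for y in outcomes:
    forecast = sum(w * e for w, e in zip(weights, experts))    # posterior-mean prediction
    learner_loss += (forecast - y) ** 2
    losses = [(e - y) ** 2 for e in experts]                   # squared loss, bounded in [0, 1]
    expert_loss = [a + l for a, l in zip(expert_loss, losses)]
    weights = gibbs_update(weights, losses, eta)               # previous posterior is the prior

regret = learner_loss - min(expert_loss)
```

The mass concentrates on the forecaster closest to the average outcome, and the cumulative regret stays bounded as the number of rounds grows.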
4. Non-Stationarity, Adaptivity, and Memory Design
Classical recursive Bayes presumes static parameters. Extensions to non-stationary, drift, or changepoint regimes require memory or model-adaptive mechanisms.
- Forgetting/Adaptive Memory: Mechanisms downweight or selectively recall past data to facilitate adaptation to regime switches, recurring environments, or non-stationarity. BAM (Bayes with Adaptive Memory) introduces a greedy (approximate) optimization of which past datapoints to remember, generalizing fixed forgetting, sliding windows, power priors, and unlearning as special cases (Nassar et al., 2022).
- Runlength- or changepoint-aware priors: Models such as Bayesian online changepoint detection (0710.3742) or adaptive filtering (Duran-Martin, 12 May 2025) parameterize priors/updates by current runlength or environmental state, enabling immediate (and uncertainty-aware) learning upon regime switches.
- Drift and covariance inflation: Online filters inject artificial dynamics or inflate the prior covariance so that the posterior does not accumulate overconfidence and remains able to adapt to shifts (Duran-Martin, 12 May 2025, Duran-Martin et al., 13 Jun 2025).
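Fixed exponential forgetting, one of the special cases mentioned under adaptive memory, can be sketched in a Beta–Bernoulli model. Assuming a hypothetical forgetting factor $\gamma \in (0, 1]$, the previous posterior is tempered toward the prior as $\pi_{t-1}^{\gamma}\,\pi_0^{1-\gamma}$ before each update; for Beta distributions this simply shrinks the accumulated pseudo-counts by $\gamma$. The synthetic regime switch and the value $\gamma = 0.98$ (an effective memory of roughly $1/(1-\gamma) = 50$ observations) are illustrative choices.

```python
import random

def discounted_update(a, b, y, gamma, a0=1.0, b0=1.0):
    """Beta-Bernoulli update after tempering the previous posterior:
    pi_{t-1}^gamma * prior^(1 - gamma) is Beta(a0 + gamma*(a - a0), b0 + gamma*(b - b0))."""
    a = a0 + gamma * (a - a0)
    b = b0 + gamma * (b - b0)
    return a + y, b + (1 - y)

# Regime switch (illustrative): success probability 0.2 for 300 steps, then 0.8.
rng = random.Random(0)
stream = [1 if rng.random() < (0.2 if t < 300 else 0.8) else 0 for t in range(600)]

static = (1.0, 1.0)       # gamma = 1: ordinary recursive Bayes, never forgets
forgetful = (1.0, 1.0)    # gamma = 0.98: effective memory of roughly 50 points
for y in stream:
    static = discounted_update(*static, y, gamma=1.0)
    forgetful = discounted_update(*forgetful, y, gamma=0.98)

static_mean = static[0] / sum(static)
forgetful_mean = forgetful[0] / sum(forgetful)
```

With $\gamma = 1$ the update reduces to ordinary recursive Bayes, whose posterior mean stays near the pooled frequency of about 0.5 after the switch, while the forgetful posterior tracks the new regime at the price of higher variance.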
5. Algorithmic and Computational Aspects
Efficient online Bayesian inference requires control of storage, compute, and approximation complexity.
- Sufficient statistics storage: For exponential-family likelihoods and Gaussian models, summary statistics (e.g., sums, empirical covariances) are maintained and updated in $O(d^2)$ per step, where $d$ is the parameter dimension (Ghosal et al., 19 Jan 2025, Duran-Martin, 12 May 2025, Menictas et al., 2023).
- Particle filters and SMC: Per-step cost is $O(N)$ in the number of particles $N$; scalability is controlled by bounding the variance of the incremental weights, and stability rests on resampling together with theoretical lower bounds on the effective sample size that hold even as model dimension increases (Dinh et al., 2016, Menictas et al., 2023).
- Kalman updates and block structure: For high-dimensional models such as deep Bayesian neural networks, blockwise or low-rank updates for groups of weights allow propagation of posterior uncertainty and reduction of computational cost, while retaining well-defined predictive distributions (Duran-Martin et al., 13 Jun 2025, Wagner et al., 2021).
- Batch-to-online translation: Many MCMC and VI pipelines can be emulated in online, recursive form by updating with each batch's log-likelihood and using the previous approximate posterior as the new prior, without refitting the model on all data (Hooten et al., 2018, Lee et al., 8 Apr 2025, Tomasetti et al., 2019).
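As a sketch of the $O(d^2)$-per-step Gaussian update, assuming a linear model $y = x^\top w + \varepsilon$ with known noise variance, a Gaussian prior $N(0, 10 I)$, and synthetic data (all illustrative), the Kalman filter with identity dynamics reduces to the rank-one recursion below. Only the current mean and covariance are stored, and the result can be compared with the batch posterior computed in information form from $X^\top X$ and $X^\top y$.

```python
import numpy as np

def kalman_regression_step(m, S, x, y, noise_var):
    """Exact Gaussian update of N(m, S) with one observation y = x @ w + noise.
    Costs O(d^2): one matrix-vector product and a rank-one covariance downdate."""
    Sx = S @ x
    innovation_var = x @ Sx + noise_var     # predictive variance of y
    gain = Sx / innovation_var              # Kalman gain
    m = m + gain * (y - x @ m)              # move the mean toward the new observation
    S = S - np.outer(gain, Sx)              # shrink the covariance along x
    return m, S

rng = np.random.default_rng(0)
d, n, noise_var = 3, 500, 0.25
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
Y = X @ w_true + rng.normal(scale=np.sqrt(noise_var), size=n)

m, S = np.zeros(d), 10.0 * np.eye(d)        # prior N(0, 10 I)
for x, y in zip(X, Y):
    m, S = kalman_regression_step(m, S, x, y, noise_var)

# Batch posterior in information form, for comparison.
S_batch = np.linalg.inv(np.eye(d) / 10.0 + X.T @ X / noise_var)
m_batch = S_batch @ (X.T @ Y / noise_var)
```

The covariance form avoids any matrix inversion inside the loop; the information form used for the check needs a $d \times d$ inverse, which is fine for a one-off comparison but is the step the recursive form is designed to avoid.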
6. Guarantees, Theory, and Empirical Findings
- Bernstein–von Mises for Online VB: Provided mini-batch size exceeds a critical threshold depending on parameter dimension and number of steps, sequential variational updates deliver asymptotically normal posteriors that are indistinguishable from full-batch posteriors in total variation distance (Lee et al., 8 Apr 2025).
- O(√T) Regret for Bayesian Online Optimization: Gibbs-posterior-based online Bayesian updating yields O(√T) regret bounds for contextual optimization and learning with bounded and mixable losses (Xie et al., 25 Nov 2025, Wu et al., 2024).
- Consistency and Efficiency of Particle Methods: For online SMC/particle learning in growing-dimensional or phylogenetic models, stability and consistency are guaranteed as particle count increases, with effective sample size growing linearly and no exponential degeneracy (Dinh et al., 2016, Aktekin et al., 2016).
- Avoidance of Catastrophic Forgetting: In deep learning, retaining the previous approximate posterior as the new prior suppresses catastrophic forgetting compared to naïve fine-tuning, as empirically confirmed for neural networks on sequential tasks (Kochurov et al., 2018).
- Limitations and Trade-offs: Fully online (per-data-point) VB or SMC may accumulate approximation error unless batch sizes scale appropriately, and SMC for non-Gaussian likelihoods imposes higher per-update costs because updates may need to revisit the full data (Tomasetti et al., 2019, Menictas et al., 2023). Selecting which data to retain in adaptive-memory filters is NP-hard and is usually handled with heuristics such as greedy selection (Nassar et al., 2022). Careful complexity control and tuning are therefore critical for reliable, scalable deployment.
7. Applications and Broader Contexts
Online sequential Bayesian updating is foundational in domains requiring continual, instantaneous, or memory-limited inference:
- Real-time forecasting (sensor streaming, finance)
- System identification and structural health monitoring (engineering; Kamariotis et al., 2021, Romeres et al., 2016)
- Deep continual learning and domain adaptation (Kochurov et al., 2018, Duran-Martin et al., 13 Jun 2025)
- High-frequency anomaly/changepoint detection (0710.3742, Duran-Martin, 12 May 2025, Nassar et al., 2022)
- Distributed/partitioned Bayesian inference (Hooten et al., 2018)
- Nonparametric and semiparametric regression (Menictas et al., 2023)
- Multi-armed bandits and sequential decision (Xie et al., 25 Nov 2025, Duran-Martin et al., 13 Jun 2025, Nassar et al., 2022)
Its integration with robust, adaptive, and scalable methodologies continues to drive advances in model-based learning, scalable inference, and statistical decision-theoretic frameworks.