Bayesian Online Changepoint Detection

Updated 20 April 2026
  • Bayesian Online Changepoint Detection is a recursive Bayesian algorithm that models run-lengths to identify sudden changes in data sequences.
  • It utilizes a message-passing recursion and sufficient statistics to update predictions efficiently, supporting various models like AR and state-space.
  • The approach is applied in domains such as finance and environmental monitoring, offering scalable and robust real-time change detection.

Bayesian Online Changepoint Detection (BOCPD) is a recursive, exact Bayesian algorithm for detecting abrupt changes in the generative parameters of a data sequence in real time. It tracks the posterior distribution of the “run length”—the number of observations since the most recent changepoint—providing robust uncertainty quantification and well-calibrated online predictions. BOCPD is highly modular, allowing flexible insertion of different predictive likelihoods and priors, and can be extended to various domains including time-series with temporal dependence, outliers, high-dimensionality, and collective anomalies (0710.3742).

1. Problem Formulation and Core Quantities

BOCPD addresses the detection of changepoints (abrupt variations in the underlying data-generating process) in a streaming setting. At each time $t$ with observed data $x_{1:t}$, the objective is to recursively compute the posterior distribution over the run length:

$$P(r_t = r \mid x_{1:t}), \quad r = 0, 1, 2, \ldots$$

where $r_t$ denotes the number of consecutive observations since the most recent changepoint; $r_t = 0$ implies a changepoint at $t$. When $P(r_t = 0 \mid x_{1:t})$ is high, an online changepoint is detected.
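To make the run-length definition concrete, the following minimal sketch maps a known set of changepoint times to the sequence $r_t$. The function name `run_lengths` and the convention that a changepoint sits just before the first observation (so that $r_1 = 1$) are assumptions made for illustration; they are not taken from the source.

```python
def run_lengths(n, changepoints):
    """Return [r_1, ..., r_n], where r_t resets to 0 at each changepoint time
    and otherwise grows by one. Assumes r_0 = 0 (illustrative convention)."""
    changepoint_times = set(changepoints)
    r = 0
    sequence = []
    for t in range(1, n + 1):
        r = 0 if t in changepoint_times else r + 1
        sequence.append(r)
    return sequence


print(run_lengths(8, [4, 7]))  # [1, 2, 3, 0, 1, 2, 0, 1]
```

BOCPD does not observe this sequence directly; it maintains a probability distribution over the possible values of $r_t$ at every step.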

Key mechanisms:

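  • Hazard function $H(\tau)$: Specifies the probability that the next observation after a run of length $\tau - 1$ will be a changepoint. It is related to the prior gap distribution $P_{\rm gap}$:

$$H(\tau) = \frac{P_{\rm gap}(g = \tau)}{\sum_{t=\tau}^{\infty} P_{\rm gap}(g = t)}$$

For a geometric prior of mean $\lambda$, $H(\tau) = 1/\lambda$.

  • Predictive distribution:

$$P(x_{t+1} \mid x_{1:t}) = \sum_{r_t} P(x_{t+1} \mid r_t, x_t^{(r)}) \, P(r_t \mid x_{1:t})$$

where $x_t^{(r)}$ denotes the observations belonging to the run that ends at time $t$.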

This construction enables probabilistic detection of changes and forward-looking predictions (0710.3742).
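The relationship between the gap distribution and the hazard can be checked numerically. The sketch below is illustrative only: the helper names (`geometric_gap_pmf`, `uniform_gap_pmf`, `hazard`), the cutoff `max_gap` for the infinite sum, and the example gap distributions are assumptions, not part of the source.

```python
def geometric_gap_pmf(gap, mean_gap):
    """P_gap(g = gap) for a geometric gap distribution on 1, 2, ... with the given mean."""
    p = 1.0 / mean_gap
    return p * (1.0 - p) ** (gap - 1)


def uniform_gap_pmf(gap):
    """P_gap(g = gap) for gaps uniform on 1..10 (hypothetical example)."""
    return 0.1 if 1 <= gap <= 10 else 0.0


def hazard(gap_pmf, tau, max_gap=5000):
    """H(tau) = P_gap(g = tau) / sum_{t >= tau} P_gap(g = t), with the sum cut at max_gap."""
    tail = sum(gap_pmf(t) for t in range(tau, max_gap + 1))
    return gap_pmf(tau) / tail


# Geometric gaps with mean 20: the hazard stays flat at about 1 / 20.
flat = [hazard(lambda g: geometric_gap_pmf(g, 20.0), tau) for tau in (1, 5, 30)]

# Uniform gaps on 1..10: the hazard rises with run length, H(tau) = 1 / (11 - tau).
rising = [hazard(uniform_gap_pmf, tau) for tau in (1, 6, 10)]
```

A memoryless (geometric) gap prior therefore gives a constant hazard, while any other gap distribution makes the changepoint probability depend on the current run length.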

2. Message-Passing Recursion and Sufficient Statistics

The core inference procedure uses a recursive message-passing algorithm over run-length hypotheses:

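  • Growth step ($r_t = r_{t-1} + 1$): the run continues, no changepoint.
  • Changepoint step ($r_t = 0$): a changepoint occurs at $t$.

The joint probability is updated as follows:

$$P(r_t, x_{1:t}) = \sum_{r_{t-1}} P(r_t \mid r_{t-1}) \, P(x_t \mid r_{t-1}, x_{t-1}^{(r)}) \, P(r_{t-1}, x_{1:t-1})$$

Here $x_{t-1}^{(r)}$ denotes the observations in the run ending at time $t-1$. Writing $\pi_t^{(r)} = P(x_t \mid r_{t-1}, x_{t-1}^{(r)})$ for the predictive probability of $x_t$ under that run, these steps reduce to two recursions:

  • Growth (no changepoint):

$$P(r_t = r_{t-1} + 1, x_{1:t}) = P(r_{t-1}, x_{1:t-1}) \, \pi_t^{(r)} \, \bigl(1 - H(r_{t-1} + 1)\bigr)$$

  • Changepoint:

$$P(r_t = 0, x_{1:t}) = \sum_{r_{t-1}} P(r_{t-1}, x_{1:t-1}) \, \pi_t^{(r)} \, H(r_{t-1} + 1)$$

The run-length posterior is normalized:

$$P(r_t \mid x_{1:t}) = \frac{P(r_t, x_{1:t})}{\sum_{r_t'} P(r_t', x_{1:t})}$$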

For exponential-family likelihoods with conjugate priors, only a fixed set of sufficient statistics needs to be maintained for each run-length hypothesis, enabling closed-form predictive updates and tractable per-step complexity (0710.3742).
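As a rough illustration of the recursion with conjugate sufficient statistics, the sketch below runs BOCPD for a Gaussian observation model with known variance and a Normal prior on the segment mean. Everything specific here is an assumption chosen for the example rather than something the source prescribes: the function names, the constant hazard, the prior settings, and the choice to propagate the normalized posterior instead of the raw joint (equivalent up to a constant, but less prone to underflow).

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd_gaussian_mean(data, hazard, prior_mean=0.0, prior_var=25.0, obs_var=1.0):
    """Return the run-length posterior after each observation.

    history[t][r] approximates P(r_t = r | x_{1:t}); list index = run length.
    Assumes a constant hazard and a known observation variance (illustrative).
    """
    posterior = [1.0]          # before any data, the run length is 0 with certainty
    means = [prior_mean]       # posterior mean of the segment mean, per run length
    variances = [prior_var]    # posterior variance of the segment mean, per run length
    history = []
    for x in data:
        # Predictive probability of x under each run-length hypothesis.
        pred = [gaussian_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]
        # Growth: each run continues with probability 1 - hazard.
        growth = [p * pi * (1.0 - hazard) for p, pi in zip(posterior, pred)]
        # Changepoint: all hypotheses pool their mass into run length 0.
        changepoint = sum(p * pi * hazard for p, pi in zip(posterior, pred))
        joint = [changepoint] + growth
        evidence = sum(joint)
        posterior = [j / evidence for j in joint]
        # Conjugate Normal update of the sufficient statistics; run length 0
        # restarts from the prior.
        new_means, new_vars = [prior_mean], [prior_var]
        for m, v in zip(means, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
            new_vars.append(post_var)
            new_means.append(post_var * (m / v + x / obs_var))
        means, variances = new_means, new_vars
        history.append(posterior)
    return history


# Synthetic series: 50 points near 0 followed by 50 points near 5.
data = [0.5 * math.sin(1.3 * i) for i in range(50)]
data += [5.0 + 0.5 * math.sin(1.3 * i) for i in range(50)]
history = bocpd_gaussian_mean(data, hazard=0.01)
final = history[-1]
map_run_length = max(range(len(final)), key=final.__getitem__)
```

With a constant hazard, the mass at $r_t = 0$ in this particular recursion always equals the hazard, so the usage example reads the change off the most probable run length at the end of the series rather than off $P(r_t = 0 \mid x_{1:t})$. Each step touches every run-length hypothesis once, which is where the linear per-step cost comes from.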

3. Model Selection, Extensions, and Robustification

The modularity of BOCPD permits broad extensions:

  • Multimodel/changepoint detection: BOCPD supports a model universe $\mathcal{M}$, updating the joint run-length and model posterior recursively. Segment selection can be performed online for spatio-temporal VARs, Bayesian regression models, or mixtures (Knoblauch et al., 2018).
  • Non-i.i.d. and temporal/structured models: Autoregressive (AR) observation models (Tsaknaki et al., 2024), dynamic linear models, and Kalman filter–based sequential recursions for temporally correlated data (Li et al., 2023) generalize BOCPD beyond i.i.d. environments.
  • Robustness to outliers: By incorporating a mixture-outlier model, tracking candidates for the last outlier time, and adjusting the sufficient statistics accordingly, BOCPD can detect changepoints robustly without excessive false positives in the presence of anomalies (Wendelberger et al., 2021).
  • Generalized Bayesian inference: Diffusion score matching and general discrepancy-based posteriors provide theoretical robustness to misspecification and heavy-tailed contamination, maintaining tractability via conjugate-exponential families (Altamirano et al., 2023).

A summary of model classes and scalable update strategies is provided below:

| Extension | Key Methodological Change | Reference |
| --- | --- | --- |
| Autoregression AR(q) | Regime-wise AR(q), time-varying variance/correlation, score-driven updates | (Tsaknaki et al., 2024) |
| Kalman/State-Space | Segment-wise DLM, closed Kalman updates, stitched for efficiency | (Li et al., 2023) |
| Model Selection, VAR | Run-length × Model recursion, VAR with spatial constraints | (Knoblauch et al., 2018) |
| Outlier Robustification | Joint outlier/run tracking, sufficient-statistic exclusion | (Wendelberger et al., 2021) |
| Generalized Bayes | Discrepancy-based posterior, score matching, robustness guarantees | (Altamirano et al., 2023) |

4. Computational Complexity and Practical Implementations

BOCPD’s per-time-step complexity is $O(t)$ due to the need to propagate all run-length hypotheses. In practical implementations, two strategies ensure scalability:

  • Run-length support pruning: Discard hypotheses whose posterior probability $P(r_t = r \mid x_{1:t})$ falls below a small threshold, limiting the number of active run lengths. Under geometric hazards, the amortized per-step cost then becomes constant on average, on the order of the expected run length (0710.3742).
  • Windowing/truncation: For long series, limit the maximum run considered ("windowing") to a manageable value (e.g., 100 observations), folding any tail mass to the maximum bin (Haug et al., 2022).
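Both strategies amount to small operations on the run-length posterior. The following is a minimal sketch, with hypothetical helper names and an illustrative default threshold of `1e-4` that is an assumption rather than a value taken from the source. In a full implementation, the sufficient statistics for each discarded or merged run length would need to be dropped or merged in the same way.

```python
def prune_run_lengths(run_lengths, probs, threshold=1e-4):
    """Drop hypotheses with probability below threshold, then renormalize.

    Returns the surviving run lengths and their renormalized probabilities.
    """
    kept = [(r, p) for r, p in zip(run_lengths, probs) if p >= threshold]
    if not kept:
        # Always keep at least the most probable hypothesis.
        kept = [max(zip(run_lengths, probs), key=lambda rp: rp[1])]
    total = sum(p for _, p in kept)
    return [r for r, _ in kept], [p / total for _, p in kept]


def truncate_run_lengths(probs, max_run):
    """Cap the run length at max_run, folding all tail mass into the last bin.

    probs[r] is the posterior mass at run length r.
    """
    if len(probs) <= max_run + 1:
        return list(probs)
    head = list(probs[:max_run])
    head.append(sum(probs[max_run:]))
    return head


kept_runs, kept_probs = prune_run_lengths([0, 1, 2, 3], [0.01, 0.00001, 0.5, 0.48999])
windowed = truncate_run_lengths([0.1, 0.2, 0.3, 0.4], max_run=2)
```

Pruning keeps a sparse, data-dependent set of run lengths, whereas truncation keeps a fixed-size array; the latter is simpler to implement but can no longer distinguish between runs longer than the window.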

In high-dimensional or nonconjugate models, particle filters (Gong et al., 16 Sep 2025) or sequential variational methods (Detommaso et al., 2019) are integrated for parameter uncertainty representation and scalability, with resampling or optimal run-length selection for complexity control.

5. Empirical Performance and Applications

BOCPD and its extensions have been applied in domains including finance, biometrics, robotics, longitudinal health (Li et al., 2023), environmental monitoring, and more. Empirical findings include:

  • Fast, accurate detection of true changepoints (e.g., Gaussian mean shifts with near-unity posterior mass at $r_t = 0$ immediately after change).
  • In temporally correlated settings (e.g., state-space or AR models), Kalman filter recursions and autoregressive BOCPD improve mean-squared error and regime covering relative to i.i.d. models (Li et al., 2023, Tsaknaki et al., 2024).
  • Joint online regression and change detection enables real-time monitoring for Earth observation (deforestation flagging with sub-day latency) (Wendelberger et al., 2021).
  • Robust and scalable generalizations (e.g., score-matching GB posteriors) suppress false alarms from outliers ("flash crashes") and enable order-of-magnitude computational gains over previous robust Bayesian methods (Altamirano et al., 2023).

6. Limitations, Assumptions, and Outlook

BOCPD assumes that the data-generating mechanism can be segmented into regimes, each (typically) exchangeable or Markovian, and that parameter dependence between regimes is either negligible or tractable. Exact inference is efficient for exponential-family likelihoods with conjugate priors, but models with nonconjugate likelihoods, unknown segment-duration models, or interacting collective anomalies require approximation (e.g., variational, particle, or gradient-based schemes) (Gong et al., 16 Sep 2025, Chen et al., 8 Aug 2025).

Limitations include:

  • Inability to accommodate long-range autocorrelation natively unless extended with AR, DLM, or Gaussian process likelihoods.
  • For collective anomalies interleaved with true changepoints, naive BOCPD may misidentify or merge the two kinds of event. Extensions with anomaly-tracking or reversion priors are required (Chen et al., 8 Aug 2025).
  • In streaming scenarios with high-frequency, high-dimensional signals, careful support pruning and model simplification are required to ensure real-time operation (Li et al., 2023).

Research directions include embedding BOCPD within complex model selection, handling semi-Markov and nonstationary processes, and integrating active learning or resource constraints for edge applications (Gundersen et al., 2021).

