Bayesian Online Changepoint Detection
- Bayesian Online Changepoint Detection is a recursive Bayesian algorithm that models run-lengths to identify sudden changes in data sequences.
- It utilizes a message-passing recursion and sufficient statistics to update predictions efficiently, supporting various models like AR and state-space.
- The approach is applied in domains such as finance and environmental monitoring, offering scalable and robust real-time change detection.
Bayesian Online Changepoint Detection (BOCPD) is a recursive, exact Bayesian algorithm for detecting abrupt changes in the generative parameters of a data sequence in real time. It tracks the posterior distribution of the “run length”—the number of observations since the most recent changepoint—providing robust uncertainty quantification and well-calibrated online predictions. BOCPD is highly modular, allowing flexible insertion of different predictive likelihoods and priors, and can be extended to various domains including time-series with temporal dependence, outliers, high-dimensionality, and collective anomalies (0710.3742).
1. Problem Formulation and Core Quantities
BOCPD addresses the detection of changepoints (abrupt variations in the underlying data-generating process) within a streaming data context. At each time $t$ with observed data $x_{1:t}$, the objective is to recursively compute the posterior distribution over the run length, $P(r_t \mid x_{1:t})$, where $r_t$ denotes the number of consecutive observations since the most recent changepoint; $r_t = 0$ implies a changepoint at $t$. When $P(r_t = 0 \mid x_{1:t})$ is high, an online changepoint is detected.
Key mechanisms:
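- Hazard function $H(\tau)$: Specifies the probability that the next observation after a run of length $\tau$ will be a changepoint. It is related to the prior gap distribution $P_{\text{gap}}(g)$:
$$H(\tau) = \frac{P_{\text{gap}}(g = \tau)}{\sum_{s=\tau}^{\infty} P_{\text{gap}}(g = s)}$$
For a geometric prior of mean $\lambda$, $H(\tau) = 1/\lambda$.
- Predictive distribution:
$$P(x_{t+1} \mid x_{1:t}) = \sum_{r_t} P(x_{t+1} \mid r_t, x_t^{(r)})\, P(r_t \mid x_{1:t})$$
where $x_t^{(r)}$ denotes the observations belonging to the current run of length $r_t$.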
This construction enables probabilistic detection of changes and forward-looking predictions (0710.3742).
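The link between the gap distribution and the hazard can be checked numerically. The following is a minimal sketch, assuming a discrete gap distribution supported on {1, 2, ...} and truncated at a finite horizon; the helper name `hazard_from_gap` and the mean gap of 50 are illustrative choices, not part of the original method description.

```python
import numpy as np

def hazard_from_gap(p_gap):
    """Hazard H(tau) = P_gap(tau) / sum_{s >= tau} P_gap(s) for a discrete gap pmf."""
    p_gap = np.asarray(p_gap, dtype=float)
    # Survival mass at each tau, computed with a reversed cumulative sum.
    survival = np.cumsum(p_gap[::-1])[::-1]
    return p_gap / survival

lam = 50.0                                      # hypothetical mean gap length
g = np.arange(1, 2001)                          # truncated support {1, ..., 2000}
p_geom = (1.0 / lam) * (1.0 - 1.0 / lam) ** (g - 1)   # geometric pmf with mean lam
H = hazard_from_gap(p_geom)
```

Away from the truncation boundary the computed hazard is flat at 1/λ, which is the memoryless property that makes the geometric prior convenient: the changepoint probability does not depend on how long the current run has lasted.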
2. Message-Passing Recursion and Sufficient Statistics
The core inference procedure uses a recursive message-passing algorithm over run-length hypotheses:
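- Growth step ($r_t = r_{t-1} + 1$): the run continues, no changepoint.
- Changepoint step ($r_t = 0$): a changepoint occurs at $t$.
The joint probability is updated as follows:
$$P(r_t, x_{1:t}) = \sum_{r_{t-1}} P(r_t \mid r_{t-1})\, P(x_t \mid r_{t-1}, x_t^{(r)})\, P(r_{t-1}, x_{1:t-1})$$
These steps reduce to two recursions, writing $\pi_t^{(r)} = P(x_t \mid r_{t-1}, x_t^{(r)})$ for the predictive probability of $x_t$ under the run of length $r_{t-1}$:
- Growth (no changepoint):
$$P(r_t = r_{t-1} + 1, x_{1:t}) = P(r_{t-1}, x_{1:t-1})\, \pi_t^{(r)}\, \bigl(1 - H(r_{t-1})\bigr)$$
- Changepoint:
$$P(r_t = 0, x_{1:t}) = \sum_{r_{t-1}} P(r_{t-1}, x_{1:t-1})\, \pi_t^{(r)}\, H(r_{t-1})$$
The run-length posterior is normalized:
$$P(r_t \mid x_{1:t}) = \frac{P(r_t, x_{1:t})}{\sum_{r_t'} P(r_t', x_{1:t})}$$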
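A single step of this recursion can be sketched in a few lines. The example below assumes the predictive probabilities for the new observation have already been computed for every run-length hypothesis; the function name `bocpd_step` and the toy numbers are made up for illustration.

```python
import numpy as np

def bocpd_step(joint_prev, pred_probs, hazard):
    """One message-passing step over run-length hypotheses.

    joint_prev[r] : joint probability of run length r and the data up to t-1
    pred_probs[r] : predictive probability of the new observation under run r
    hazard[r]     : changepoint probability H(r)
    Returns the updated joint and the normalized run-length posterior.
    """
    weighted = joint_prev * pred_probs
    joint = np.empty(len(joint_prev) + 1)
    joint[1:] = weighted * (1.0 - hazard)    # growth: run length increases by one
    joint[0] = np.sum(weighted * hazard)     # changepoint: run length resets to zero
    posterior = joint / joint.sum()          # normalization
    return joint, posterior

# Toy inputs (illustrative only): three active run-length hypotheses.
joint_prev = np.array([0.1, 0.3, 0.6])
pred_probs = np.array([0.5, 0.2, 0.1])
hazard = np.full(3, 0.1)
joint, posterior = bocpd_step(joint_prev, pred_probs, hazard)
```

Each call grows the support by one entry, which is why the cost of a step is linear in the number of run lengths carried forward.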
For exponential-family likelihoods with conjugate priors, only a fixed set of sufficient statistics needs to be maintained for each run-length hypothesis, enabling closed-form predictive updates and tractable per-step complexity (0710.3742).
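As a concrete conjugate case, the sketch below runs the recursion for a Gaussian likelihood with known variance and a Normal prior on the segment mean, so each run-length hypothesis carries only a posterior mean and variance. The model choice, the function names, and all parameter values (hazard 0.01, prior variance 10, a mean shift from 0 to 5) are assumptions made for illustration. The code propagates the normalized posterior rather than the joint, which yields the same run-length posterior and avoids underflow.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def bocpd_gaussian(xs, hazard=0.01, mu0=0.0, var0=10.0, obs_var=1.0):
    """Run-length posteriors for a Gaussian mean-shift model with known variance.

    Returns R with R[t, r] = P(r_t = r | x_{1:t}); row 0 encodes r_0 = 0.
    """
    T = len(xs)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    # Sufficient statistics per run length: posterior mean and variance of the mean.
    means = np.array([mu0])
    variances = np.array([var0])
    for t, x in enumerate(xs, start=1):
        # Closed-form predictive probability under every run-length hypothesis.
        pred = gaussian_pdf(x, means, variances + obs_var)
        weighted = R[t - 1, :t] * pred
        R[t, 1:t + 1] = weighted * (1.0 - hazard)   # growth
        R[t, 0] = weighted.sum() * hazard           # changepoint
        R[t, :t + 1] /= R[t, :t + 1].sum()          # normalize
        # Conjugate update of each hypothesis; the prior is prepended for r_t = 0.
        post_var = 1.0 / (1.0 / variances + 1.0 / obs_var)
        post_mean = post_var * (means / variances + x / obs_var)
        means = np.concatenate(([mu0], post_mean))
        variances = np.concatenate(([var0], post_var))
    return R

# Synthetic data with one mean shift after 100 observations.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(5.0, 1.0, 100)])
R = bocpd_gaussian(xs)
map_run = R.argmax(axis=1)   # most probable run length after each observation
```

With a shift this large, the most probable run length should fall back to small values shortly after observation 100 and then grow again, ending near 100 at the final step.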
3. Model Selection, Extensions, and Robustification
The modularity of BOCPD permits broad extensions:
- Multimodel/changepoint detection: BOCPD supports a model universe $\mathcal{M}$, updating the joint run-length and model posterior recursively. Segment selection can be performed online for spatio-temporal VARs, Bayesian regression models, or mixtures (Knoblauch et al., 2018).
- Non-i.i.d. and temporal/structured models: Autoregressive (AR) observation models (Tsaknaki et al., 2024), dynamic linear models, and Kalman filter–based sequential recursions for temporally correlated data (Li et al., 2023) generalize BOCPD beyond i.i.d. environments.
- Robustness to outliers: By incorporating a mixture-outlier model, tracking candidates for the last outlier time, and adjusting the sufficient statistics accordingly, BOCPD can detect changepoints robustly without excessive false positives in the presence of anomalies (Wendelberger et al., 2021).
- Generalized Bayesian inference: Diffusion score matching and general discrepancy-based posteriors provide theoretical robustness to misspecification and heavy-tailed contamination, maintaining tractability via conjugate-exponential families (Altamirano et al., 2023).
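The joint run-length and model recursion mentioned in the first item can be sketched as a direct extension of the single-model update. This is a simplified illustration, not the exact scheme of Knoblauch et al. (2018): it assumes the model is fixed within a segment, is redrawn from a prior q(m) at each changepoint, and that the hazard is constant. The function name and toy numbers are hypothetical.

```python
import numpy as np

def joint_model_step(joint_prev, pred_probs, hazard, model_prior):
    """One step of a joint (run length, model) recursion.

    joint_prev[r, m] : joint probability of run length r, model m, and past data
    pred_probs[r, m] : predictive probability of the new observation under (r, m)
    hazard           : constant changepoint probability (assumption)
    model_prior[m]   : prior q(m) applied whenever a new segment starts
    """
    weighted = joint_prev * pred_probs
    n_runs, n_models = joint_prev.shape
    joint = np.empty((n_runs + 1, n_models))
    joint[1:, :] = weighted * (1.0 - hazard)               # growth: model is kept
    joint[0, :] = model_prior * weighted.sum() * hazard    # changepoint: model redrawn
    posterior = joint / joint.sum()
    return joint, posterior

# Toy inputs (illustrative only): two run lengths, two candidate models.
joint_prev = np.array([[0.2, 0.1], [0.4, 0.3]])
pred_probs = np.array([[0.5, 0.1], [0.6, 0.05]])
joint, posterior = joint_model_step(joint_prev, pred_probs, 0.1, np.array([0.5, 0.5]))
model_posterior = posterior.sum(axis=0)   # marginal posterior over models
```

Summing the posterior over run lengths gives an online model posterior, and summing over models recovers the usual run-length posterior.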
A summary of model classes and scalable update strategies is provided below:
| Extension | Key Methodological Change | Reference |
|---|---|---|
| Autoregression AR(q) | Regime-wise AR(q), time-varying var/corr, score-driven updates | (Tsaknaki et al., 2024) |
| Kalman/State-Space | Segment-wise DLM, closed Kalman updates, stitched for efficiency | (Li et al., 2023) |
| Model Selection, VAR | Run-length × Model recursion, VAR with spatial constraints | (Knoblauch et al., 2018) |
| Outlier Robustification | Joint outlier/run tracking, sufficient-statistic exclusion | (Wendelberger et al., 2021) |
| Generalized Bayes | Discrepancy-based posterior, score matching, robustness guarantees | (Altamirano et al., 2023) |
4. Computational Complexity and Practical Implementations
BOCPD’s per-time-step complexity is $O(t)$ due to the need to propagate all run-length hypotheses. In practical implementations, two strategies ensure scalability:
- Run-length support pruning: Discard hypotheses for which the posterior falls below a threshold (e.g., $10^{-4}$), limiting the number of active run-lengths. Amortized per-step complexity becomes constant, on the order of the expected run length, under geometric hazards (0710.3742).
- Windowing/truncation: For long series, limit the maximum run considered ("windowing") to a manageable value (e.g., 100 observations), folding any tail mass to the maximum bin (Haug et al., 2022).
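Both strategies amount to small post-processing steps on the run-length posterior. The sketch below is one possible realization under stated assumptions: the helper names, the per-hypothesis threshold of 1e-4, and the window of 100 are illustrative, and in a full implementation the same mask or truncation would also be applied to the per-run sufficient statistics.

```python
import numpy as np

def prune_by_threshold(run_lengths, posterior, threshold=1e-4):
    """Drop hypotheses whose posterior mass is below threshold, then renormalize."""
    keep = posterior >= threshold
    keep[np.argmax(posterior)] = True     # always retain the most probable hypothesis
    kept = posterior[keep]
    return run_lengths[keep], kept / kept.sum()

def truncate_window(posterior, max_run=100):
    """Cap the support at max_run, folding any tail mass into the last bin."""
    if len(posterior) <= max_run + 1:
        return posterior
    out = posterior[:max_run + 1].copy()
    out[max_run] += posterior[max_run + 1:].sum()
    return out

# Toy posteriors (illustrative only).
post = np.array([0.5, 0.49995, 0.00005])
rl, pruned = prune_by_threshold(np.arange(3), post)
windowed = truncate_window(np.full(10, 0.1), max_run=4)
```

After pruning, array indices no longer coincide with run lengths, which is why the surviving run lengths are returned explicitly alongside the renormalized probabilities.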
In high-dimensional or nonconjugate models, particle filters (Gong et al., 2025) or sequential variational methods (Detommaso et al., 2019) are used to represent parameter uncertainty approximately, with resampling or optimal run-length selection keeping the computational cost under control.
5. Empirical Performance and Applications
BOCPD and its extensions have been applied in domains including finance, biometrics, robotics, longitudinal health (Li et al., 2023), and environmental monitoring. Empirical findings include:
- Fast, accurate detection of true changepoints (e.g., Gaussian mean shifts with near-unity posterior mass at $r_t = 0$ immediately after change).
- In temporally correlated settings (e.g., state-space or AR models), Kalman filter recursions and autoregressive BOCPD improve mean-squared error and regime covering relative to i.i.d. models (Li et al., 2023, Tsaknaki et al., 2024).
- Joint online regression and change detection enables real-time monitoring for Earth observation (deforestation flagging with sub-day latency) (Wendelberger et al., 2021).
- Robust and scalable generalizations (e.g., generalized Bayes posteriors based on score matching) suppress false alarms from outliers ("flash crashes") and enable order-of-magnitude computational gains over previous robust Bayesian methods (Altamirano et al., 2023).
6. Limitations, Assumptions, and Outlook
BOCPD assumes that the data-generating mechanism can be segmented into regimes, each (typically) exchangeable or Markovian, and that parameter dependence between regimes is either negligible or tractable. Exact inference is efficient for exponential-family likelihoods with conjugate priors, but nonconjugate likelihoods, unknown segment-duration models, and interactions with collective anomalies require approximation (e.g., variational, particle, or gradient-based schemes) (Gong et al., 2025, Chen et al., 2025).
Limitations include:
- Inability to accommodate long-range autocorrelation natively unless extended with AR, DLM, or Gaussian process likelihoods.
- For collective anomalies interleaved with true changepoints, naive BOCPD may misidentify or merge the two events. Extensions with anomaly-tracking or reversion priors are required (Chen et al., 2025).
- In streaming scenarios with high-frequency, high-dimensional signals, careful support pruning and model simplification are required to ensure real-time operation (Li et al., 2023).
Research directions include embedding BOCPD within complex model selection, handling semi-Markov and nonstationary processes, and integrating active learning or resource constraints for edge applications (Gundersen et al., 2021).
References:
- Adams & MacKay, "Bayesian Online Changepoint Detection" (0710.3742)
- "Sequential Kalman filter for fast online changepoint detection" (Li et al., 2023)
- "Bayesian Autoregressive Online Change-Point Detection with Time-Varying Parameters" (Tsaknaki et al., 2024)
- "Monitoring Deforestation Using Multivariate Bayesian Online Changepoint Detection with Outliers" (Wendelberger et al., 2021)
- "Robust and Scalable Bayesian Online Changepoint Detection" (Altamirano et al., 2023)
- "Inferring Soil Drydown Behaviour with Adaptive Bayesian Online Changepoint Analysis" (Gong et al., 2025)
- "Bayesian online collective anomaly and change point detection in fine-grained time series" (Chen et al., 2025)
- "Active multi-fidelity Bayesian online changepoint detection" (Gundersen et al., 2021)
- "Spatio-temporal Bayesian On-line Changepoint Detection with Model Selection" (Knoblauch et al., 2018)
- "Stein Variational Online Changepoint Detection with Applications to Hawkes Processes and Neural Networks" (Detommaso et al., 2019)