Online Change Point Detection
- Online change point detection is a real-time methodology that identifies abrupt changes in data sequences without using future observations.
- It employs Bayesian techniques with latent run length and residual time variables to forecast imminent shifts and segment data.
- Applications span finance, medicine, and industrial monitoring, where exact recursive posterior updates make real-time prediction feasible.
Online change point detection (OCPD) refers to the family of methodologies and algorithms that seek to identify abrupt changes in the underlying generative process of a data sequence as soon as possible and in an online (real-time) fashion, that is, as new observations arrive. OCPD is distinguished from offline (retrospective) analysis by the requirement that decisions (or probabilistic inferences) be made without access to future data. This makes detecting abrupt regime shifts in time series or high-dimensional streams both a computational and a statistical challenge, with strong implications for applications across finance, medicine, industrial monitoring, and dynamic systems.
1. Core Principles and Latent Variable Modeling
At the heart of principled online change point detection lies explicit modeling of the generative dynamics of segmented data streams. The fundamental latent variable introduced in the Bayesian Online Change Point Detection (BOCPD) framework is the run length $r_t$, denoting the time elapsed since the last change point, such that $r_t = 0$ signals a change point at time $t$. The data sequence is decomposed into segments, each segment being generated from an observation model with parameters that are fixed within-segment but redrawn after each change.
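The recursive updating of the run length posterior under the BOCPD paradigm is governed by the formula:

$$P(r_t, x_{1:t}) = \sum_{r_{t-1}} P(x_t \mid r_t, x_t^{(r)}) \, P(r_t \mid r_{t-1}) \, P(r_{t-1}, x_{1:t-1}),$$

where $P(r_t \mid r_{t-1})$ is defined via a hazard function $H$ and $x_t^{(r)}$ denotes the segment-specific data since the last change. The posterior $P(r_t \mid x_{1:t})$ follows by normalizing over $r_t$. Predictive inference is achieved by marginalizing over all possible run lengths.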
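The run length recursion can be sketched in a few lines. The following is a minimal illustration rather than the source's implementation, and it rests on several assumptions that are not in the original: a constant hazard, Gaussian observations with known variance `var_x`, a conjugate Normal prior (`mu0`, `var0`) on the segment mean, and the convention that the run length counts the observations of the current segment that precede $x_t$. The function name `bocpd_gaussian` and all parameter names are hypothetical.

```python
import numpy as np

def bocpd_gaussian(x, hazard=0.02, mu0=0.0, var0=10.0, var_x=1.0):
    """Return R with R[t, r] = P(run length = r | x[0..t]) for a toy Gaussian model."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    R = np.zeros((T, T))
    prior = np.array([1.0])      # assumption: the first observation opens a segment
    mu = np.array([mu0])         # posterior mean of the segment mean, per run length
    var = np.array([var0])       # posterior variance of the segment mean, per run length
    for t in range(T):
        # Predictive density of x[t] under each candidate run length.
        pred_var = var + var_x
        pred = np.exp(-0.5 * (x[t] - mu) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        # Combine with the run length prior and normalize to get the posterior.
        post = prior * pred
        post /= post.sum()
        R[t, :t + 1] = post
        # Conjugate update of the segment-mean posterior after absorbing x[t].
        new_var = 1.0 / (1.0 / var + 1.0 / var_x)
        new_mu = new_var * (mu / var + x[t] / var_x)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
        # Constant-hazard transition: reset to 0 with prob. hazard, else grow by 1.
        prior = np.concatenate(([hazard * post.sum()], (1.0 - hazard) * post))
    return R

# Usage: a mean shift from about 0 to about 5 at index 50.
x = np.concatenate([np.zeros(50), np.full(50, 5.0)]) + 0.1 * np.sin(np.arange(100))
R = bocpd_gaussian(x)
map_run_length = R.argmax(axis=1)   # most probable run length at each time step
```

Each row of `R` is a full distribution over run lengths, so downstream predictions can average over it instead of committing to a single segmentation. The dense `T x T` array is chosen for readability; a practical version would usually prune run lengths with negligible mass.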
The generalization towards joint inference over segment run length $r_t$, segment total duration $d_t$, and a (possibly discrete) segment or state index $z_t$ allows connection to hidden semi-Markov models (HSMMs) and extensions to complex observation processes, including those exhibiting temporal scaling or discrete state switching.
2. Online Prediction of Future Change Points and Residual Time
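A significant extension of BOCPD is residual time inference: beyond tracking how long it has been since the last change, the method aims to probabilistically predict how many time steps remain until the next. The residual time variable $l_t$ represents the number of remaining observations in the current segment. Its posterior is given by marginalizing over the run length $r_t$:

$$P(l_t \mid x_{1:t}) = \sum_{r_t} P(l_t \mid r_t, x_{1:t}) \, P(r_t \mid x_{1:t}),$$

where $P(r_t \mid x_{1:t})$ is the run length posterior and $P(l_t \mid r_t, x_{1:t})$ is the distribution of the remaining segment length given that the segment has already lasted $r_t$ steps; in the duration-agnostic case the latter is determined by the hazard function alone.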
For a constant hazard, this reduces to a geometric distribution. However, when the hazard or the emission model is nontrivial (e.g., a non-constant hazard, or emissions modulated by the segment duration), the observed data shape the predicted timing of future change points. This enables online forecasting not only of changes that have just occurred but also of imminent ones, which is critical for applications requiring advance warning.
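The marginalization over run length can be made concrete for the duration-agnostic case. The sketch below assumes that segment durations follow a known prior `p_d` and that a segment with `r` earlier observations, one current observation, and `l` remaining observations has total duration `r + 1 + l`; this indexing convention and the function name `residual_time_posterior` are illustrative choices, not taken from the source.

```python
import numpy as np

def residual_time_posterior(p_r, p_d, max_l):
    """Mix duration-conditional residual times over the run length posterior.

    p_r[r]   : P(run length = r | data so far)
    p_d[d-1] : prior P(total duration = d), for d = 1..D
    Returns q with q[l] = P(residual time = l | data) for l = 0..max_l-1.
    """
    p_r = np.asarray(p_r, dtype=float)
    p_d = np.asarray(p_d, dtype=float)
    D = len(p_d)
    survival = np.cumsum(p_d[::-1])[::-1]       # survival[d-1] = P(duration >= d)
    q = np.zeros(max_l)
    for r, w in enumerate(p_r):
        if w == 0.0 or r >= D or survival[r] == 0.0:
            continue                             # run length impossible under p_d
        # duration = r + 1 + l, so l = 0, 1, ... maps to p_d[r], p_d[r + 1], ...
        tail = p_d[r:r + max_l] / survival[r]
        q[:len(tail)] += w * tail
    return q

# Constant hazard h corresponds to a geometric duration prior.
h = 0.05
p_geo = h * (1 - h) ** (np.arange(1, 2001) - 1)
q_geo = residual_time_posterior([0.2, 0.5, 0.3], p_geo, max_l=50)

# A deterministic duration of 10 with run length 3 leaves exactly 6 observations.
p_det = np.zeros(10)
p_det[9] = 1.0
q_det = residual_time_posterior([0.0, 0.0, 0.0, 1.0], p_det, max_l=10)
```

With the geometric prior, `q_geo` is the same geometric distribution whatever the run length posterior is, which is the memoryless behaviour described for a constant hazard. If `max_l` is smaller than the longest possible residual time, `q` is truncated and sums to less than one.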
3. Duration-Dependent Observation Models and Temporal Scaling
The standard BOCPD formulation assumes the emission model is invariant to total segment duration. For data displaying temporal scaling (for example, where different segments correspond to the same pattern at different speeds or durations), it is often necessary to model the emission likelihood as $p(x_t \mid r_t, d_t, x_t^{(r)})$, introducing dependence on the segment’s total length $d_t$.
A typical example for phenomena with time-warping (such as ECG or synthetic signals with repeated patterns of varying durations) is an emission of the form

$$x_t = a \, f\!\left(r_t / d_t\right) + \varepsilon_t,$$

where $a$ collects the segment-specific amplitudes, $f$ is a fixed waveform (for instance a sinusoid), and $\varepsilon_t$ is observation noise, but the temporal evolution is normalized by the total duration $d_t$. The use of duration-dependent UPMs (Underlying Predictive Models) sharpens residual time inference, as early observations become informative about the total duration and thus the imminent arrival of a new change point.
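To illustrate why early observations become informative about the total duration, the sketch below scores candidate durations against the first few samples of a segment. It assumes a sinusoidal waveform evaluated at normalized time, a known amplitude, and Gaussian noise with standard deviation `noise_sd`; these modelling choices and the name `duration_posterior` are illustrative assumptions, not the source's exact emission model.

```python
import numpy as np

def duration_posterior(x_seg, durations, amplitude=1.0, noise_sd=0.1, prior=None):
    """Posterior over total duration d given the first len(x_seg) samples of a segment."""
    x_seg = np.asarray(x_seg, dtype=float)
    durations = np.asarray(durations)
    r = np.arange(len(x_seg))                        # run length of each sample
    # Mean waveform at normalized time r/d, one row per candidate duration.
    mean = amplitude * np.sin(2 * np.pi * r[None, :] / durations[:, None])
    loglik = -0.5 * np.sum((x_seg[None, :] - mean) ** 2, axis=1) / noise_sd ** 2
    logp = loglik + (0.0 if prior is None else np.log(np.asarray(prior, dtype=float)))
    # A segment cannot be shorter than the number of samples already seen.
    logp = np.where(durations >= len(x_seg), logp, -np.inf)
    p = np.exp(logp - logp.max())
    return p / p.sum()

# Usage: a noise-free segment whose true duration is 40.
seg = np.sin(2 * np.pi * np.arange(30) / 40)
cands = np.arange(20, 81)
p5 = duration_posterior(seg[:5], cands)     # after 5 samples: still broad
p30 = duration_posterior(seg[:30], cands)   # after 30 samples: concentrated near 40
```

Once the duration posterior concentrates, the residual time follows directly as the duration minus the number of samples already observed, which is the mechanism behind the sharper forecasts.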
4. Exact Inference and Computational Considerations
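The joint posterior over $(r_t, d_t, z_t)$, that is, run length, total segment duration, and discrete state, is updated recursively:

$$P(r_t, d_t, z_t, x_{1:t}) = \sum_{r_{t-1}, d_{t-1}, z_{t-1}} P(x_t \mid r_t, d_t, z_t, x_t^{(r)}) \, P(r_t, d_t, z_t \mid r_{t-1}, d_{t-1}, z_{t-1}) \, P(r_{t-1}, d_{t-1}, z_{t-1}, x_{1:t-1}).$$

The transition kernel $P(r_t, d_t, z_t \mid r_{t-1}, d_{t-1}, z_{t-1})$ incorporates:
- duration transitions: $d_t$ is drawn afresh from the duration distribution on segment start, otherwise $d_t = d_{t-1}$
- state transitions for $z_t$ as a Markov chain, updated only at change points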
- restart of the run length when the maximal allowed duration for the segment is reached
Exact recursive updates exploit dynamic programming for efficiency, with a per-step cost that grows with the number of states $K$ and the maximum duration $D$. The method can become computationally demanding for large $K$, large $D$, or complex UPMs.
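The joint recursion and its transition kernel can be sketched on a dense grid. The code below is a simplified illustration under stated assumptions: `K` states and a maximum duration `D` (notation chosen here), a per-state duration prior `p_d`, a state transition matrix `A` applied only at change points, and an emission likelihood that depends only on the current sample (no dependence on earlier segment data). The names `joint_filter` and `sinusoid_lik` are hypothetical, and this naive version touches all `K * D * D` grid entries at every step, so it is not an optimized implementation.

```python
import numpy as np

def joint_filter(x, pi, A, p_d, lik_fn):
    """Filter P(z_t, d_t, r_t | x_{1:t}) on a (K, D, D) grid indexed [z, d-1, r].

    pi[z]      : initial state distribution
    A[z, z2]   : state transition matrix, applied only at change points
    p_d[z, i]  : P(duration = i + 1 | state z)
    lik_fn(x_t): array (K, D, D) of likelihoods p(x_t | z, d, r)
    """
    K, D = p_d.shape
    idx = np.arange(D)
    valid = idx[None, :] <= idx[:, None]           # valid[i, r] is True when r < d = i + 1
    pred = np.zeros((K, D, D))
    pred[:, :, 0] = pi[:, None] * p_d              # assumption: first sample opens a segment
    out = []
    for x_t in x:
        post = pred * lik_fn(x_t)
        post /= post.sum()
        out.append(post)
        ended = post[:, idx, idx].sum(axis=1)      # mass at r = d - 1: segment is complete
        pred = np.zeros_like(post)
        pred[:, :, 1:] = post[:, :, :-1]           # continue: r grows by 1, d and z unchanged
        pred *= valid                              # drop entries pushed beyond r = d - 1
        pred[:, :, 0] = (ended @ A)[:, None] * p_d # restart: new z via A, new d via p_d
    return np.array(out)

def sinusoid_lik(amplitudes, D, noise_sd=0.1):
    """Unnormalized Gaussian likelihood around amplitude * sin(2*pi*r/d)."""
    r = np.arange(D)[None, None, :]
    d = np.arange(1, D + 1)[None, :, None]
    mean = np.asarray(amplitudes, dtype=float)[:, None, None] * np.sin(2 * np.pi * r / d)
    return lambda x_t: np.exp(-0.5 * ((x_t - mean) / noise_sd) ** 2)

# Usage: one state, durations uniform on 10..30, a full segment of length 20
# followed by the first 5 samples of a segment of length 12.
D = 30
p_d = np.zeros((1, D))
p_d[0, 9:] = 1.0 / 21
x = np.concatenate([np.sin(2 * np.pi * np.arange(20) / 20),
                    np.sin(2 * np.pi * np.arange(5) / 12)])
post = joint_filter(x, np.array([1.0]), np.array([[1.0]]), p_d, sinusoid_lik([1.0], D))
run_marginal = post[-1].sum(axis=(0, 1))   # P(r_t | x_{1:t}) at the last step
dur_marginal = post[-1].sum(axis=(0, 2))   # P(d_t | x_{1:t}) at the last step
```

Marginalizing the grid over different axes yields the run length, duration, and state posteriors from a single pass, and the residual time follows as the duration minus one minus the run length. The grid grows quadratically in `D`, which matches the remark that long maximum durations become expensive.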
5. Applications: Synthetic, Physiological, and Medical Data
The methodology supports a wide array of practical applications:
Synthetic HSMM Data: Utilizing sinusoidal emissions with explicit duration-dependence, the method delivers highly confident run length tracking, sharp posterior residual time estimates, and robust segmentation even under abrupt transitions.
Sleep Staging from EEG/EMG: Here, segment durations relate to physiological sleep cycles, but for computational reasons, the emission model disregards the total segment duration $d_t$. This results in more conservative (less certain) predictions of residual time, as the observations do not inform directly about the segment’s scale, but the method still achieves online inference performance near state-of-the-art offline models.
ECG Cycle Segmentation: When the emission model is explicitly constructed as a basis expansion parameterized by normalized time $r_t / d_t$ (run length over total duration), early segment observations facilitate rapid, low-uncertainty prediction of the next change point (e.g., systole/diastole transitions), outperforming duration-agnostic approaches.
6. Strengths, Limitations, and Open Problems
The extended BOCPD framework affords a unified, real-time approach to segmenting, forecasting, and characterizing unpredictable regime changes in time series. Its main capabilities include:
- Simultaneous online inference of run length (past), residual time (future), and latent state (via duration- or segment-specific emission models)
- Accommodation of complex, temporally scaled emission structures across segments
- Integration with HSMMs for richer underlying state structures
However, several limitations are apparent:
- In duration-agnostic models, predictions for the timing of future changes are inherently more uncertain and conservative, as real-time data does not reveal the segment’s scale
- Computational cost escalates rapidly for large numbers of states or long-duration segments, necessitating trade-offs in implementation
- Efficient updates for the residual time in duration-agnostic settings remain unresolved
These limitations suggest that further research is warranted on efficient approximations for segment duration inference, and on the identification of the minimal sufficient complexity for UPMs to balance computational and statistical performance across diverse application domains.
7. Summary Table: Key Latent Variables and Observation Models
Latent Variable | Description | Role in Inference |
---|---|---|
$r_t$ | Run length (time since last change) | Past segmentation/posterior |
$l_t$ | Residual time (time to next change) | Future change prediction |
$d_t$ | Total segment duration | Enables scaling in emission model |
$z_t$ | Discrete latent state (e.g., HSMM hidden) | Multiple regimes/UPMs |
$p(x_t \mid r_t, d_t, z_t)$ | Emission Model | Duration/state-dependent likelihood |
In summary, advanced online change point detection frameworks now provide not only retrospective segmentation but also forward-looking prediction of regime changes, leveraging duration-dependent and state-specific emission processes, and have demonstrated their utility in both controlled synthetic contexts and in challenging real-world applications such as sleep staging and ECG analysis (Agudelo-España et al., 2019).