Online Change Point Detection

Updated 7 August 2025

Online change point detection is a real-time methodology that identifies abrupt changes in data sequences without using future observations.
It employs Bayesian techniques with latent run length and residual time variables to forecast imminent shifts and segment data.
Applications span finance, medicine, and industrial monitoring, where recursive posterior updates keep inference tractable as data arrive.

Online change point detection (OCPD) refers to the family of methods and algorithms that identify abrupt changes in the underlying generative process of a data sequence as soon as possible and in an online (real-time) fashion, that is, as new observations arrive. OCPD differs from offline (retrospective) analysis in that decisions (or probabilistic inferences) must be made without access to future data. This makes detecting abrupt regime shifts in time series or high-dimensional streams both a computational and a statistical challenge, with strong implications for applications in finance, medicine, industrial monitoring, and dynamic systems.

1. Core Principles and Latent Variable Modeling

At the heart of principled online change point detection lies explicit modeling of the generative dynamics of segmented data streams. The fundamental latent variable introduced in the Bayesian Online Change Point Detection (BOCPD) framework is the run length $r_t$, denoting the time elapsed since the last change point, such that $r_t = 0$ signals a change point at time $t$. The data sequence is decomposed into segments, each segment being generated from an observation model with parameters that are fixed within-segment but redrawn after each change.

Under the BOCPD paradigm, the joint distribution of the run length and the data is updated recursively, and the run length posterior follows by normalization:

$p(r_t, Y_{1:t}) = \sum_{r_{t-1}} p(y_t \mid r_t, Y^{(r_t)}) \, p(r_t \mid r_{t-1}) \, p(r_{t-1}, Y_{1:t-1})$

where $p(r_t \mid r_{t-1})$ is defined via a hazard function $H(\cdot)$ and $Y^{(r_t)}$ denotes the segment-specific data since the last change. Predictive inference is achieved by marginalizing over all possible run lengths.
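The recursion above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not part of the source: a constant hazard, Gaussian observations with known variance `obs_var`, and a conjugate normal prior (`mu0`, `var0`) on each segment mean. The function name and all parameter values are hypothetical. The sketch follows a common implementation convention in which the predictive density for $y_t$ is evaluated with the statistics indexed by the previous run length.

```python
import numpy as np

def bocpd_gaussian(y, hazard=0.02, mu0=0.0, var0=10.0, obs_var=0.25):
    """Run length posteriors R[t, r] = p(r_t = r | y_{1:t}); R[0] is the prior."""
    T = len(y)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    # Posterior mean and variance of the segment mean, one entry per run length.
    mu, var = np.array([mu0]), np.array([var0])
    for t in range(1, T + 1):
        y_t = y[t - 1]
        # Predictive density of y_t under each previous run length.
        pred_var = var + obs_var
        pred = np.exp(-0.5 * (y_t - mu) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        joint = R[t - 1, :t] * pred
        R[t, 1:t + 1] = joint * (1.0 - hazard)   # run continues: r_t = r_{t-1} + 1
        R[t, 0] = joint.sum() * hazard           # change point: r_t = 0
        R[t] /= R[t].sum()                       # normalize the joint to a posterior
        # Conjugate update of the statistics; r_t = 0 restarts from the prior.
        new_var = 1.0 / (1.0 / var + 1.0 / obs_var)
        new_mu = new_var * (mu / var + y_t / obs_var)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
    return R

# Synthetic stream with a mean shift after 60 observations (assumed test data).
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 0.5, 60), rng.normal(5.0, 0.5, 40)])
R = bocpd_gaussian(y)
map_run_length = int(np.argmax(R[-1]))  # most probable run length at the end
```

On this synthetic stream, the most probable run length after the final observation should be close to the length of the last segment. Each step costs time linear in the number of run lengths tracked, so practical implementations often prune run lengths with negligible mass.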

Generalizing to joint inference over the segment run length $r_t$, the total segment duration $d_t$, and a (possibly discrete) segment or state index $z_t$ connects the framework to hidden semi-Markov models (HSMMs) and extends it to complex observation processes, including those exhibiting temporal scaling or discrete state switching.

2. Online Prediction of Future Change Points and Residual Time

A significant extension of BOCPD is residual time inference: beyond tracking the time since the last change, the method probabilistically predicts how many time steps remain until the next one. The residual time variable $l_t$ represents the number of remaining observations in the current segment. Its posterior is obtained by marginalizing over the run length:

$p(l_t \mid Y_{1:t}) = \sum_{r_t} p(l_t \mid r_t) \, p(r_t \mid Y_{1:t})$

where

$p(l_t \mid r_t) = H(r_t + l_t) \prod_{\gamma = r_t}^{r_t+l_t-1} [1 - H(\gamma)]$

For a constant hazard $H(\gamma) = h$, this reduces to a geometric distribution, $p(l_t \mid r_t) = h (1 - h)^{l_t}$, which does not depend on $r_t$, so the data carry no information about the residual time. When the hazard is non-constant, or when the segment duration $d_t$ modulates emissions, the run length posterior (and hence the data) shapes the predicted time of the next change point. This enables online forecasting of imminent regime changes, not just detection of those that have already occurred, which is critical for applications requiring advance warning.
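The two formulas for the residual time can be checked numerically. The sketch below is illustrative only: the function names, the truncation length `max_l`, the two hazard functions, and the run length posterior `run_post` are hypothetical choices, not values from the source.

```python
import numpy as np

def residual_given_run(r, hazard_fn, max_l):
    """p(l | r) for l = 0, ..., max_l - 1."""
    h = np.array([hazard_fn(g) for g in range(r, r + max_l)], dtype=float)
    # survive[l] = prod_{g=r}^{r+l-1} (1 - H(g)); the empty product for l = 0 is 1.
    survive = np.concatenate(([1.0], np.cumprod(1.0 - h[:-1])))
    return h * survive

def residual_posterior(run_post, hazard_fn, max_l):
    """p(l | Y_{1:t}) = sum_r p(l | r) p(r | Y_{1:t}), truncated at max_l."""
    out = np.zeros(max_l)
    for r, w in enumerate(run_post):
        if w > 0.0:
            out += w * residual_given_run(r, hazard_fn, max_l)
    return out

const = lambda g: 0.1                    # constant hazard
ramp = lambda g: min(1.0, g / 20.0)      # hazard that grows with segment age
run_post = np.array([0.0, 0.0, 0.2, 0.0, 0.0, 0.8])  # hypothetical p(r_t | Y_{1:t})
geom = residual_posterior(run_post, const, 50)
aging = residual_posterior(run_post, ramp, 50)
```

With the constant hazard, the result is the same geometric distribution whatever `run_post` is. With the increasing hazard, longer run lengths shift the residual time distribution toward smaller values, so the run length posterior matters.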

3. Duration-Dependent Observation Models and Temporal Scaling

The standard BOCPD formulation assumes the emission model $p(y_t \mid r_t, Y^{(r_t)})$ is invariant to total segment duration. For data displaying temporal scaling (for example, where different segments correspond to the same pattern at different speeds or durations), it is often necessary to model the emission likelihood as $p(y_t \mid r_t, d_t, z_t, Y^{(r_t)})$, introducing dependence on the segment’s total length $d_t$.

A typical example for phenomena with time-warping (such as ECG or synthetic signals with repeated patterns of varying durations) is

$y_t = b_k \sin\left(\frac{t}{d}\right) + c_k \sin\left(\frac{t}{d}\right) + \epsilon, \qquad t = 0, \ldots, d$

where $b_k, c_k$ are segment-specific amplitudes, but the temporal evolution is normalized by $d$. The use of duration-dependent UPMs (underlying predictive models) sharpens residual time inference, as early observations become informative about the total duration and thus the imminent arrival of a new change point.
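A small simulation illustrates why a duration-normalized emission makes early observations informative about the total duration. This is a sketch under stated assumptions: the mean uses a sine and a cosine term with a $2\pi$ phase factor so that the two amplitudes are distinguishable, which is an illustrative choice rather than the source's exact model. The amplitudes, noise level, and function names are likewise hypothetical.

```python
import numpy as np

def segment_mean(r, d, b=1.0, c=0.5, omega=2.0 * np.pi):
    """Mean signal at run length r in a segment of total duration d (assumed form)."""
    phase = omega * np.asarray(r, dtype=float) / d
    return b * np.sin(phase) + c * np.cos(phase)

def sample_segment(d, noise_std=0.05, rng=None):
    """Draw one noisy segment of d observations."""
    rng = np.random.default_rng() if rng is None else rng
    r = np.arange(d)
    return segment_mean(r, d) + rng.normal(0.0, noise_std, size=d)

def duration_log_likelihood(y_prefix, candidates, noise_std=0.05):
    """Gaussian log-likelihood of a segment prefix for each candidate duration.

    The additive constant shared by all candidates is omitted.
    """
    r = np.arange(len(y_prefix))
    return np.array([
        -0.5 * np.sum(((y_prefix - segment_mean(r, d)) / noise_std) ** 2)
        for d in candidates
    ])

rng = np.random.default_rng(1)
y = sample_segment(40, rng=rng)
candidates = [20, 40, 80]
scores = duration_log_likelihood(y[:10], candidates)
best_duration = candidates[int(np.argmax(scores))]
```

Only the first ten observations of the segment are scored, yet the candidate durations already produce clearly different likelihoods, which is the mechanism behind the sharper residual time estimates.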

4. Exact Inference and Computational Considerations

The joint posterior over $(r_t, d_t, z_t)$ is updated recursively:

$\gamma_t = p(r_t, d_t, z_t, Y_{1:t}) = \sum_{r_{t-1}, d_{t-1}, z_{t-1}} p(y_t \mid r_t, d_t, z_t, Y^{(r_t)}) \, p(r_t, d_t, z_t \mid r_{t-1}, d_{t-1}, z_{t-1}) \, \gamma_{t-1}$

The transition kernel incorporates:

duration transitions: $p(d_t \mid z_t)$ on segment start, otherwise $d_t = d_{t-1}$
state transitions for $z_t$ as a Markov chain, updated only at change points
restart of the run length when the maximal allowed duration for the segment is reached

Optimal recursive updates exploit dynamic programming for efficiency, with complexity typically scaling as $\mathcal{O}(K^2 + D^2 K)$ (for $K$ states and maximum duration $D$). The method can become computationally demanding for large $K$, $D$, or complex UPMs.
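The recursion and transition rules above can be sketched for the simplest case of a single hidden state ($K = 1$), tracking only $(r_t, d_t)$. The sketch makes several assumptions that are not from the source: the emission mean is a known function of normalized time $r/d$ (so there is no parameter learning), the noise is Gaussian with known standard deviation, the stream starts at a segment boundary, and a segment ends when $r_t$ reaches $d_t - 1$. All names and values are hypothetical.

```python
import numpy as np

def mean_fn(r, d):
    """Hypothetical duration-normalized mean; d is clipped to avoid division by zero."""
    phase = 2.0 * np.pi * r / np.maximum(d, 1)
    return np.sin(phase) + 0.5 * np.cos(phase)

def joint_filter(y, dur_prior, noise_std=0.05):
    """Return gamma[t, r, d] = p(r_t = r, d_t = d | y_{1:t}) for a single state."""
    D = len(dur_prior) - 1                 # dur_prior[d] = p(d), with dur_prior[0] = 0
    r = np.arange(D)[:, None]
    d = np.arange(D + 1)[None, :]
    valid = r < d                          # run length must stay below the duration
    means = mean_fn(r, d)
    gammas = np.zeros((len(y), D, D + 1))
    prev = None
    for t, y_t in enumerate(y):
        pred = np.zeros((D, D + 1))
        if prev is None:
            ending = 1.0                   # the stream is assumed to start a segment
        else:
            ending = np.diagonal(prev, offset=1).sum()  # mass on states with r = d - 1
            pred[1:] = prev[:-1]           # continue: r -> r + 1, d unchanged
            pred *= valid                  # drop states whose segment just ended
        pred[0] += ending * dur_prior      # restart: r = 0, d drawn from p(d)
        lik = np.exp(-0.5 * ((y_t - means) / noise_std) ** 2)
        post = pred * lik
        prev = post / post.sum()
        gammas[t] = prev
    return gammas

def residual_from_joint(gamma):
    """p(l_t | y_{1:t}) with l_t = d_t - 1 - r_t."""
    D = gamma.shape[0]
    out = np.zeros(D)
    for r in range(D):
        for d in range(r + 1, D + 1):
            out[d - 1 - r] += gamma[r, d]
    return out

rng = np.random.default_rng(0)
true_durations = [30, 50, 30, 50]
y = np.concatenate([mean_fn(np.arange(n), n) + rng.normal(0.0, 0.05, n)
                    for n in true_durations])
dur_prior = np.zeros(51)
dur_prior[[30, 50]] = 0.5                  # two equally likely durations (assumed)
gammas = joint_filter(y, dur_prior)
residual = residual_from_joint(gammas[10])
```

Each step in this single-state sketch touches every $(r, d)$ pair once, so its cost grows quadratically in the maximum duration. Adding a discrete state $z_t$ would introduce a state transition at each restart and a further dimension in `gamma`.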

5. Applications: Synthetic, Physiological, and Medical Data

The methodology supports a wide array of practical applications:

Synthetic HSMM Data: Utilizing sinusoidal emissions with explicit duration-dependence, the method delivers highly confident run length tracking, sharp posterior residual time estimates, and robust segmentation even under abrupt transitions.

Sleep Staging from EEG/EMG: Here, segment durations relate to physiological sleep cycles, but for computational reasons, the emission model disregards $d_t$. This results in more conservative (less certain) predictions of residual time, as the observations do not inform directly about the segment’s scale, but the method still achieves online inference performance near state-of-the-art offline models.

ECG Cycle Segmentation: When the emission model is explicitly constructed as a basis expansion parameterized by normalized time $r_t/d_t$, early segment observations facilitate rapid, low-uncertainty prediction of the next change point (e.g., systole/diastole transitions), outperforming duration-agnostic approaches.

6. Strengths, Limitations, and Open Problems

The extended BOCPD framework affords a unified, real-time approach to segmenting, forecasting, and characterizing unpredictable regime changes in time series. Its main capabilities include:

Simultaneous online inference of run length (past), residual time (future), and latent state (via duration- or segment-specific emission models)
Accommodation of complex, temporally scaled emission structures across segments
Integration with HSMMs for richer underlying state structures

However, several limitations are apparent:

In duration-agnostic models, predictions for the timing of future changes are inherently more uncertain and conservative, as real-time data does not reveal the segment’s scale
Computational cost escalates rapidly for large numbers of states or long-duration segments, necessitating trade-offs in implementation
Efficient updates for the residual time in duration-agnostic settings remain unresolved

These limitations suggest that further research is warranted on efficient approximations for segment duration inference, and on the identification of the minimal sufficient complexity for UPMs to balance computational and statistical performance across diverse application domains.

7. Summary Table: Key Latent Variables and Observation Models

Latent Variable	Description	Role in Inference
$r_t$	Run length (time since last change)	Past segmentation/posterior
$l_t$	Residual time (time to next change)	Future change prediction
$d_t$	Total segment duration	Enables scaling in emission model
$z_t$	Discrete latent state (e.g., HSMM hidden)	Multiple regimes/UPMs
Emission Model	$p(y_t \mid r_t, d_t, z_t, Y^{(r_t)})$	Duration/state-dependent likelihood

In summary, advanced online change point detection frameworks provide not only online segmentation of the data observed so far but also forward-looking prediction of regime changes, leveraging duration-dependent and state-specific emission processes. They have demonstrated their utility both in controlled synthetic settings and in challenging real-world applications such as sleep staging and ECG analysis (Agudelo-España et al., 2019).

References (1)

Agudelo-España et al. (2019). Bayesian Online Prediction of Change Points.
