Greedy Online Change Point Detection

Updated 8 June 2026
  • GOCPD is a framework that greedily maximizes average log-likelihoods to detect abrupt distributional changes in streaming data.
  • It employs unimodal search strategies like ternary search and dynamic geometric grids to achieve O(log t) computational and storage efficiency.
  • The method integrates robust statistical tests and memory-reset mechanisms to control false alarms and minimize detection delays.

Greedy Online Change Point Detection (GOCPD) refers to a class of online, data-adaptive algorithms that detect abrupt distributional changes (“change points”) in streaming or sequentially observed data through localized, computationally efficient, typically likelihood-based criteria and grid-scanning or segmentation strategies. These procedures are characterized by the greedy maximization of evidence for a change—often by exploiting unimodality, strong log-likelihood contrasts, or CUSUM-like substructure—while maintaining low computational and storage costs suitable for high-throughput, high-dimensional, or ill-conditioned environments.

1. Formal Problem Setting and GOCPD Objective

The canonical GOCPD problem considers an infinite or streaming sequence of observations (possibly multivariate or structured), with an unknown change point $\tau$ at which the underlying data-generating process transitions from a pre-change regime (with distribution $P_1$ or parameter $\theta_1$) to a post-change regime ($P_2$ or $\theta_2$). Denoting the observed data as $\mathcal{D} = \{Y_t\}_{t=1}^T$, the task is to design an online stopping rule $\widehat\tau$ such that, with high confidence and low delay,

  • $\widehat\tau \approx \tau$ whenever a change occurs, and
  • false alarms (detections before the true $\tau$ or in the absence of change) are controlled at a nominal level.

In the “greedy” GOCPD paradigm, at each time $t$, a candidate split point $\hat{s}_t$ is identified by maximizing a criterion favoring segmentation into two independent regimes. For example, one widely used objective is

$$\hat{s}_t = \arg\max_{\tau_0 < s < t} \left[ \bar{\ell}(Y_{\tau_0+1:s}) + \bar{\ell}(Y_{s+1:t}) \right],$$

where $\bar{\ell}(\cdot)$ denotes the average log-likelihood under maximum likelihood parameters fitted to the respective segment, and $\tau_0$ is the last detected change point (Ho et al., 2023). This framework subsumes classical CUSUM, likelihood-ratio, and residual-based tests, but is operationalized online through computationally efficient greedy search or grid-based scanning.
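As a rough illustration of the split criterion, the sketch below scores every admissible split of a univariate series by the sum of the average Gaussian log-likelihoods of the two segments and returns the best one. The Gaussian model, the variance floor, the minimum segment length, and the function names are assumptions made for this example, not details taken from the cited work.

```python
import math

def avg_gaussian_loglik(segment):
    """Average log-likelihood of a segment under a Gaussian fitted by MLE."""
    n = len(segment)
    mean = sum(segment) / n
    var = sum((y - mean) ** 2 for y in segment) / n
    var = max(var, 1e-8)  # variance floor (assumption) keeps the log finite
    # At the MLE the average Gaussian log-likelihood is -0.5 * (log(2*pi*var) + 1).
    return -0.5 * (math.log(2 * math.pi * var) + 1.0)

def split_objective(data, last_cp, s):
    """Sum of the average log-likelihoods of data[last_cp:s] and data[s:]."""
    return avg_gaussian_loglik(data[last_cp:s]) + avg_gaussian_loglik(data[s:])

def best_split_exhaustive(data, last_cp=0, min_len=2):
    """Score every admissible split index and return (best_s, best_value)."""
    candidates = range(last_cp + min_len, len(data) - min_len + 1)
    return max(((s, split_objective(data, last_cp, s)) for s in candidates),
               key=lambda pair: pair[1])

data = [0.1, -0.2, 0.0, 0.2, -0.1, 0.0, 5.1, 4.8, 5.0, 5.2, 4.9, 5.0]
best_s, best_value = best_split_exhaustive(data)
print(best_s)  # index of the first observation assigned to the right segment
```

This exhaustive scan costs one objective evaluation per candidate index; it serves only as a reference point for the faster search strategies.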

2. Algorithmic Structure and Computational Efficiency

GOCPD methods achieve efficiency and online deployment primarily through (a) greedy/local scans over a dynamically maintained set of candidate change points, (b) judicious use of summary statistics and recurrence relations, and (c) objective functions admitting (piecewise) unimodality, enabling accelerated search.

Greedy Search by Unimodality

For univariate or multivariate time series with a single change, the GOCPD objective is typically unimodal in the candidate index $s$, a formal property established in (Ho et al., 2023, Proposition 1). This property allows the change point search to be performed via ternary search, reducing per-step computational cost from $O(t)$ to $O(\log t)$.
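A minimal sketch of ternary search over integer indices follows. It assumes the objective `f` is strictly unimodal on the search interval (strictly increasing up to the peak, strictly decreasing after it); the function name and the toy objective are hypothetical stand-ins for the split criterion.

```python
def ternary_search_max(f, lo, hi):
    """Return an integer maximizer of f on [lo, hi], assuming strict unimodality."""
    while hi - lo > 2:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - third
        if f(m1) < f(m2):
            lo = m1 + 1  # the peak lies strictly to the right of m1
        else:
            hi = m2 - 1  # the peak lies strictly to the left of m2
    # At most three candidates remain; compare them directly.
    return max(range(lo, hi + 1), key=f)

calls = 0

def toy_objective(s):
    """Toy unimodal objective with its peak at s = 37; counts its own evaluations."""
    global calls
    calls += 1
    return -(s - 37) ** 2

peak = ternary_search_max(toy_objective, 0, 100)
print(peak, calls)
```

Each iteration discards roughly one third of the interval at the cost of two evaluations, so the number of evaluations grows logarithmically with the interval length.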

Dynamic Geometric Grid

For large-scale or high-frequency settings, Moen (13 Apr 2025) proposes maintaining and updating a dynamically selected geometric grid $G_t$ of candidate offsets, with $|G_t| = O(\log t)$, guaranteeing that for any true jump $\tau < t$, there exists a grid point close to $t - \tau$. This enables grid-scanning CUSUM or likelihood-type tests to be performed in $O(\log t)$ time and space:

  • At each $t$, all sufficient summaries or statistics for $g \in G_t$ are incrementally updated.
  • For each $g \in G_t$, a test statistic $S_t(g)$ is computed; detection is triggered if any $S_t(g)$ crosses a threshold.
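The two steps above can be sketched for a simple mean-shift test. The example assumes unit-variance data with a known pre-change mean of 0, uses the offsets 1, 2, 4, ... as the geometric grid, and keeps the full list of prefix sums for readability, so it demonstrates the logarithmic number of tests per step but not the logarithmic storage of the cited method. All names and the threshold are illustrative.

```python
import math

def geometric_grid(t):
    """Offsets 1, 2, 4, ... not exceeding t, i.e. O(log t) candidates."""
    grid, g = [], 1
    while g <= t:
        grid.append(g)
        g *= 2
    return grid

def scan_stream(stream, threshold):
    """Return the first time t (1-based) at which a grid statistic exceeds threshold, else None."""
    prefix = [0.0]  # prefix[t] = y_1 + ... + y_t, stored in full for clarity only
    for t, y in enumerate(stream, start=1):
        prefix.append(prefix[-1] + y)
        for g in geometric_grid(t):
            window_sum = prefix[t] - prefix[t - g]
            stat = abs(window_sum) / math.sqrt(g)  # standardized sum of the last g points
            if stat > threshold:
                return t
    return None

stream = [0.0] * 50 + [3.0] * 10  # mean shifts from 0 to 3 after 50 observations
print(scan_stream(stream, threshold=4.0))
```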

Memory and Storage

For high-dimensional data, summaries (such as partial sums or outer products) are maintained only for $O(\log t)$ relevant intervals, and many GOCPD architectures exploit tail-length or excitation-set sparsity to further reduce redundancy (Chen et al., 2020, Moen, 13 Apr 2025).

3. Test Statistics, Aggregation, and Robustification

GOCPD platforms support a wide variety of change detection statistics, including but not limited to:

  • Likelihood-ratio and average log-likelihood–based scores (Ho et al., 2023)
  • Multiscale, coordinate-wise CUSUM or CUSUM-like statistics and their cross-coordinate aggregations (for high dimensions or signals of unknown sparsity) (Chen et al., 2020)
  • Covariance, operator-norm, or residual-based metrics for detecting parameter or structural breaks (Moen, 13 Apr 2025, Leung et al., 2024)

Robustification is achieved via post-split statistical confirmation, such as Mahalanobis screening of left- and right-segment residuals to guard against outlier-induced false positives. Empirically, these outlier guards reduce the false discovery rate (FDR) significantly in both synthetic and real-world benchmarks (Ho et al., 2023).
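One way such a confirmation step could look is sketched below. It fits a Gaussian with diagonal covariance to the left segment and accepts the split only if a sufficient fraction of right-segment points lies far from that fit in Mahalanobis distance, so that a single outlier does not trigger a detection. The diagonal covariance, the cutoff, the fraction, and the decision rule itself are assumptions for illustration and are not the procedure of the cited paper.

```python
import math

def fit_diag_gaussian(points, var_floor=1e-6):
    """Per-coordinate mean and variance (diagonal covariance is an assumption)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    var = [max(sum((p[j] - mean[j]) ** 2 for p in points) / n, var_floor)
           for j in range(d)]
    return mean, var

def diag_mahalanobis(x, mean, var):
    """Mahalanobis distance of x under a diagonal covariance."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

def confirm_split(left, right, cutoff=3.0, min_fraction=0.5):
    """Accept the split only if enough right-segment points are far from the left fit."""
    mean, var = fit_diag_gaussian(left)
    far = sum(diag_mahalanobis(p, mean, var) > cutoff for p in right)
    return far / len(right) >= min_fraction

left = [(0.1, -0.1), (-0.2, 0.2), (0.0, 0.1), (0.2, -0.2), (-0.1, 0.0)]
shifted = [(5.0, 5.1), (4.9, 5.0), (5.1, 4.8)]                    # sustained shift
one_outlier = [(9.0, 9.0), (0.0, 0.1), (0.1, -0.1), (-0.1, 0.0)]  # isolated spike
print(confirm_split(left, shifted), confirm_split(left, one_outlier))
```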

4. Theoretical Guarantees and Performance Metrics

GOCPD schemes are designed with explicit statistical and computational guarantees:

  • False Alarm Control: The probability of detecting a change before a true change (or in its absence), $\mathbb{P}(\widehat\tau < \tau)$, is controlled at level $\alpha$ for a user-specified $\alpha \in (0, 1)$ (Moen, 13 Apr 2025, Chen et al., 2020).
  • Detection Delay: For sufficiently large change magnitude (or appropriate high-dimensional analogs), the expected detection delay provably matches information-theoretic minimax lower bounds up to logarithmic factors (Moen, 13 Apr 2025).
  • Computational Cost: Update and storage cost per time step is $O(\log t)$ for scalar/low-dimensional tests, with additional dimension-dependent factors for multivariate mean and covariance detection (Moen, 13 Apr 2025, Chen et al., 2020).
  • Empirical Performance: On real and synthetic datasets, GOCPD is evaluated by true positive rate (TPR) and positive predictive value (PPV), and it outperforms or matches established baselines in FDR and runtime (Ho et al., 2023).
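For reference, the sketch below shows one common way to compute TPR and PPV for a detector's output. It assumes a detection counts as a true positive when it falls within a fixed tolerance after a true change point, with each detection matched to at most one change; the matching convention, the tolerance, and the function names are assumptions rather than the evaluation protocol of the cited papers.

```python
def count_true_positives(true_cps, detected_cps, tolerance):
    """Greedy one-to-one matching of true change points to later detections."""
    unmatched = sorted(detected_cps)
    tp = 0
    for cp in sorted(true_cps):
        hit = next((d for d in unmatched if cp <= d <= cp + tolerance), None)
        if hit is not None:
            unmatched.remove(hit)  # each detection can confirm only one change
            tp += 1
    return tp

def tpr_ppv(true_cps, detected_cps, tolerance):
    """Return (TPR, PPV); the false discovery rate is 1 - PPV."""
    tp = count_true_positives(true_cps, detected_cps, tolerance)
    tpr = tp / len(true_cps) if true_cps else 0.0
    ppv = tp / len(detected_cps) if detected_cps else 0.0
    return tpr, ppv

tpr, ppv = tpr_ppv([100, 200, 300], [103, 150, 305, 400], tolerance=10)
print(tpr, ppv)
```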

5. Extensions: Greedy Excitation in System Identification

In adaptive and system identification settings with time-varying parameters, GOCPD is paired with greedy excitation-set selection for robust recursive least squares (RLS) updates. The key innovation is to maintain an online "greedy excitation set": newly acquired regressors are admitted if and only if they do not worsen the Hessian condition number (Leung et al., 2024). The parameter update at each step then uses a two-tier weighting, retaining informative past data and exponentially forgetting the rest, leading to improved tracking and bias-variance control.
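The admission rule can be illustrated for two-dimensional regressors, where the eigenvalues of the symmetric 2x2 information matrix have a closed form. The sketch assumes the Hessian is the sum of outer products of admitted regressors, starting from a hypothetical seed matrix `seed_h`; it omits the two-tier weighting, and all names are illustrative.

```python
import math

def cond_sym_2x2(h):
    """Condition number of a symmetric 2x2 matrix via closed-form eigenvalues."""
    a, b, c = h[0][0], h[0][1], h[1][1]
    mid = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    lam_min, lam_max = mid - rad, mid + rad
    return math.inf if lam_min <= 1e-12 else lam_max / lam_min

def add_outer(h, phi):
    """Return h + phi * phi^T without modifying h."""
    return [[h[i][j] + phi[i] * phi[j] for j in range(2)] for i in range(2)]

def greedy_excitation_set(regressors, seed_h):
    """Admit a regressor only if it does not worsen the condition number."""
    h, kept = seed_h, []
    for i, phi in enumerate(regressors):
        candidate = add_outer(h, phi)
        if cond_sym_2x2(candidate) <= cond_sym_2x2(h):
            h = candidate
            kept.append(i)
    return kept, h

seed_h = [[4.0, 0.0], [0.0, 1.0]]  # hypothetical seed, poorly excited in coordinate 2
regressors = [(1.0, 0.0), (0.0, 1.0), (0.0, 1.0), (0.0, 2.0)]
kept, h = greedy_excitation_set(regressors, seed_h)
print(kept, cond_sym_2x2(h))
```

In this toy run the first regressor is rejected because it adds energy to the already dominant direction, while the next two are admitted because they balance the spectrum.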

An embedded GOCPD change point detector, based on EWMA-filtered model residuals and a likelihood-ratio test for jump-induced miss distributions, triggers a memory reset, discarding obsolete historical data and reinitializing the model post-jump for rapid reacquisition of new dynamics. This memory-resetting is provably optimal under likelihood tests and preserves adaptivity in ill-conditioned regimes.
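A simplified sketch of the reset mechanism follows, for a scalar model `y = theta * x`. It runs RLS with exponential forgetting, tracks an EWMA of squared prediction residuals, and reinitializes the estimator when that EWMA exceeds a fixed threshold. The fixed threshold stands in for the likelihood-ratio test described above, and the warm-up period, parameter values, and names are assumptions made for this example.

```python
def rls_with_reset(xs, ys, forgetting=0.98, beta=0.8, threshold=1.0,
                   p0=100.0, warmup=3):
    """Scalar RLS with forgetting; resets memory when EWMA(residual^2) > threshold."""
    theta, p, ewma, age = 0.0, p0, 0.0, 0
    estimates, resets = [], []
    for t, (x, y) in enumerate(zip(xs, ys)):
        residual = y - theta * x  # a priori prediction error
        if age >= warmup:  # skip the test while the estimate is still settling
            ewma = beta * ewma + (1.0 - beta) * residual ** 2
            if ewma > threshold:
                # Memory reset: discard the old estimate and its covariance.
                theta, p, ewma, age = 0.0, p0, 0.0, 0
                resets.append(t)
                residual = y - theta * x
        gain = p * x / (forgetting + x * p * x)
        theta += gain * residual
        p = (p - gain * x * p) / forgetting
        age += 1
        estimates.append(theta)
    return estimates, resets

xs = [1.0] * 40
ys = [2.0] * 20 + [-3.0] * 20  # the parameter jumps from 2 to -3 at index 20
estimates, resets = rls_with_reset(xs, ys)
print(resets, round(estimates[-1], 3))
```

Without the reset, the forgetting factor alone would pull the estimate toward the new value only gradually; reinitializing the covariance restores a large gain immediately after the jump.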

6. Multiscale and High-Dimensional Frameworks

GOCPD architectures are designed to perform efficient detection in both low- and high-dimensional regimes. For high-dimensional Gaussian streams, multiscale likelihood-ratio and CUSUM-like statistics are computed over dyadic grids for each coordinate, yielding aggregation strategies (e.g., hard-thresholded sums) that adapt to varying sparsity levels and unknown signal strengths (Chen et al., 2020, Moen, 13 Apr 2025). All core formulas (per-coordinate scan statistics, aggregation, and stopping rules) are updated greedily with minimal memory and are implemented in software such as the R package 'ocd'.
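To illustrate how thresholded aggregation adapts to sparsity, the sketch below sums squared per-coordinate statistics that exceed a hard threshold and declares a change if any of several threshold levels exceeds its own limit. A zero threshold targets dense changes and a large threshold targets sparse ones. The levels, limits, and function names are assumptions for illustration and do not reproduce the exact statistics of the 'ocd' package.

```python
def hard_threshold_sum(stats, a):
    """Sum of squared statistics whose magnitude is at least a."""
    return sum(z * z for z in stats if abs(z) >= a)

def declare_change(stats, levels):
    """levels is a list of (hard threshold a, detection limit) pairs."""
    return any(hard_threshold_sum(stats, a) > limit for a, limit in levels)

levels = [(0.0, 150.0), (3.0, 20.0)]  # hypothetical dense and sparse settings

sparse_change = [0.1] * 99 + [6.0]  # one strongly affected coordinate
dense_change = [1.5] * 100          # many weakly affected coordinates
no_change = [0.1] * 100

print(declare_change(sparse_change, levels),
      declare_change(dense_change, levels),
      declare_change(no_change, levels))
```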

The general methodology allows any offline scan statistic (CUSUM, LR, operator-norm, etc.) to be dynamically embedded into a streaming GOCPD architecture, transforming classical global scans into locally greedy, real-time detectors.

7. Practical Implementation and Applications

Published implementations emphasize the following features:

  • Maintenance of summary statistics and candidate grids with $O(\log t)$ amortized time and space (Moen, 13 Apr 2025)
  • Flexible modeling choices (Gaussian, GP, or regression models) according to domain requirements (Ho et al., 2023, Leung et al., 2024)
  • Outlier-robust postprocessing and statistical threshold calibration via theoretical/empirical procedures
  • Empirically validated performance in applications ranging from high-frequency market data, EEG seizure detection, and activity monitoring to online adaptive regression in time-varying systems (Ho et al., 2023, Moen, 13 Apr 2025, Chen et al., 2020, Leung et al., 2024)

A tabular synopsis of GOCPD methodologies drawn from key references follows:

| Methodology Source | Core Idea |
| --- | --- |
| (Ho et al., 2023) | Unimodal log-likelihood maximization + ternary search |
| (Moen, 13 Apr 2025) | Dynamic geometric scan grid, any offline statistic |
| (Chen et al., 2020) | Multiscale, coordinate-wise greedy CUSUM + aggregation |
| (Leung et al., 2024) | Greedy excitation-set RLS + LR change detection |

The empirical and theoretical analyses collectively establish GOCPD as an efficient, versatile, and statistically rigorous framework for rapid, robust change-point detection under streaming constraints and challenging data regimes.
