Greedy Online Change Point Detection
- GOCPD is a framework that greedily maximizes average log-likelihoods to detect abrupt distributional changes in streaming data.
- It employs unimodal search strategies like ternary search and dynamic geometric grids to achieve O(log t) computational and storage efficiency.
- The method integrates robust statistical tests and memory-reset mechanisms to control false alarms and minimize detection delays.
Greedy Online Change Point Detection (GOCPD) refers to a class of online, data-adaptive algorithms that detect abrupt distributional changes (“change points”) in streaming or sequentially observed data through localized, computationally efficient, typically likelihood-based criteria and grid-scanning or segmentation strategies. These procedures are characterized by the greedy maximization of evidence for a change—often by exploiting unimodality, strong log-likelihood contrasts, or CUSUM-like substructure—while maintaining low computational and storage costs suitable for high-throughput, high-dimensional, or ill-conditioned environments.
1. Formal Problem Setting and GOCPD Objective
The canonical GOCPD problem considers an infinite or streaming sequence of observations $x_1, x_2, \dots$ (possibly multivariate or structured), with an unknown change point $\tau$ at which the underlying data-generating process transitions from a pre-change regime (with distribution $P_0$ or parameter $\theta_0$) to a post-change regime ($P_1$ or $\theta_1$). Denoting the data observed up to time $t$ as $x_{1:t}$, the task is to design an online stopping rule $T$ such that, with high confidence and low delay,
- the rule stops shortly after $\tau$ (that is, $T - \tau$ is small) whenever a change occurs, and
- false alarms (detections before the true $\tau$, or in the absence of change) are controlled at a nominal level.
In the “greedy” GOCPD paradigm, at each time $t$, a candidate split point $\hat{\tau}_t$ is identified by maximizing a criterion favoring segmentation into two independent regimes. For example, one widely used objective is
$$\hat{\tau}_t = \arg\max_{t_0 < \tau < t} \left[ \bar{\mathcal{L}}(x_{t_0+1:\tau}) + \bar{\mathcal{L}}(x_{\tau+1:t}) \right],$$
where $\bar{\mathcal{L}}(\cdot)$ denotes the average log-likelihood under maximum likelihood parameters fitted to respective segments, and $t_0$ is the last detected change point (Ho et al., 2023). This framework naturally subsumes classical CUSUM, likelihood-ratio, and residual-based tests, but is operationalized online through computationally efficient greedy search or grid-based scanning.
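To make the split criterion concrete, here is a minimal sketch that scores every admissible split of a univariate buffer by the sum of the two segments' average Gaussian log-likelihoods. The Gaussian model, the variance floor, the `min_seg` guard, and all function names are assumptions made for illustration; they are not taken from the cited work.

```python
import math

def avg_gauss_loglik(xs):
    """Average Gaussian log-likelihood of xs at its own MLE mean and variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    var = max(var, 1e-8)  # variance floor: an assumption, avoids log(0)
    # Plugging the MLEs back in gives -0.5 * (log(2*pi*var) + 1) per observation.
    return -0.5 * (math.log(2 * math.pi * var) + 1.0)

def split_score(xs, k):
    """Criterion for splitting xs into xs[:k] and xs[k:]."""
    return avg_gauss_loglik(xs[:k]) + avg_gauss_loglik(xs[k:])

def best_split(xs, min_seg=2):
    """Exhaustive scan over admissible split indices (linear number of candidates)."""
    candidates = range(min_seg, len(xs) - min_seg + 1)
    return max(candidates, key=lambda k: split_score(xs, k))

# Toy buffer: eight points near 0 followed by eight points near 5.
pre = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.0]
post = [5.0 + x for x in pre]
print(best_split(pre + post))
```

On this toy buffer the scan should select the boundary between the two blocks. The exhaustive scan is only a baseline: it evaluates every candidate, which is the cost that greedy search strategies aim to avoid.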
2. Algorithmic Structure and Computational Efficiency
GOCPD methods achieve efficiency and online deployment primarily through (a) greedy/local scans over a dynamically maintained set of candidate change points, (b) judicious use of summary statistics and recurrence relations, and (c) objective functions admitting (piecewise) unimodality, enabling accelerated search.
Greedy Search by Unimodality
For univariate or multivariate time series with a single change, the GOCPD objective is typically unimodal in the candidate index $\tau$, a formal property established in [(Ho et al., 2023), Proposition 1]. This property allows the change point search to be performed via ternary search, reducing per-step computational cost from $O(t)$ to $O(\log t)$.
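The search step can be sketched as an integer ternary search. The helper below is a generic illustration, not the authors' code: `ternary_argmax` is a hypothetical name, and the sketch assumes the score is strictly unimodal on the search range, so plateaus or noisy scores can mislead it.

```python
def ternary_argmax(score, lo, hi):
    """Argmax of a strictly unimodal score over the integers lo..hi (inclusive)."""
    while hi - lo > 2:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - third
        if score(m1) < score(m2):
            lo = m1 + 1   # the peak lies strictly to the right of m1
        else:
            hi = m2 - 1   # the peak lies strictly to the left of m2
    # At most three candidates remain; compare them directly.
    return max(range(lo, hi + 1), key=score)

# Count evaluations on a toy unimodal score with its peak at 37.
calls = []
def toy_score(k):
    calls.append(k)
    return -(k - 37) ** 2

peak = ternary_argmax(toy_score, 0, 1000)
print(peak, len(calls))
```

Each iteration discards roughly a third of the range, so the number of score evaluations grows logarithmically with the range length instead of linearly.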
Dynamic Geometric Grid
For large-scale or high-frequency settings, (Moen, 13 Apr 2025) proposes maintaining and updating a dynamically selected geometric grid $G_t$ of candidate offsets, with $|G_t| = O(\log t)$, guaranteeing that for any true jump location there exists a grid point close to it. This enables grid-scanning CUSUM or likelihood-type tests to be performed in $O(\log t)$ time and space:
- At each time $t$, all sufficient summaries or statistics for the grid points $g \in G_t$ are incrementally updated.
- For each $g \in G_t$, a test statistic $S_t(g)$ is computed; detection is triggered if any $S_t(g)$ crosses a threshold.
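The two steps above can be illustrated with a simplified stand-in: a mean-change detector that, at each time step, scans only dyadic look-back windows (1, 2, 4, ...) with a CUSUM-type statistic. The class name, the unit-variance assumption, and the fixed threshold are illustrative choices. For brevity the sketch stores all prefix sums, so its per-step time is logarithmic but its memory is not; a faithful implementation would retain summaries only for the grid points.

```python
import math

class GridCusum:
    """Mean-change detector over dyadic look-back windows (unit variance assumed)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.prefix = [0.0]  # prefix[i] = sum of the first i observations

    def update(self, x):
        """Ingest x; return the window length that raised an alarm, or None."""
        self.prefix.append(self.prefix[-1] + x)
        t = len(self.prefix) - 1
        g = 1
        while g < t:  # g = 1, 2, 4, ...: logarithmically many candidates
            left_mean = self.prefix[t - g] / (t - g)
            right_mean = (self.prefix[t] - self.prefix[t - g]) / g
            # Standardized difference between the last g points and the rest.
            stat = math.sqrt(g * (t - g) / t) * abs(right_mean - left_mean)
            if stat > self.threshold:
                return g
            g *= 2
        return None

def first_alarm(xs, threshold):
    """Return (time, window length) of the first alarm, or None if none occurs."""
    detector = GridCusum(threshold)
    for t, x in enumerate(xs, start=1):
        g = detector.update(x)
        if g is not None:
            return t, g
    return None

print(first_alarm([0.0] * 50 + [5.0] * 10, threshold=4.0))
```

The threshold of 4.0 is arbitrary here; in practice it would be calibrated to the desired false alarm level.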
Memory and Storage
For high-dimensional data, summaries (such as partial sums or outer products) are only maintained for $O(\log t)$ relevant intervals, and many GOCPD architectures exploit tail-length or excitation-set sparsity to further reduce redundancy (Chen et al., 2020, Moen, 13 Apr 2025).
3. Test Statistics, Aggregation, and Robustification
GOCPD platforms support a wide variety of change detection statistics, including but not limited to:
- Likelihood-ratio and average log-likelihood–based scores (Ho et al., 2023)
- Multiscale, coordinate-wise CUSUM or CUSUM-like statistics and their cross-coordinate aggregations (for high dimensions or signals of unknown sparsity) (Chen et al., 2020)
- Covariance, operator-norm, or residual-based metrics for detecting parameter or structural breaks (Moen, 13 Apr 2025, Leung et al., 2024)
Robustification is achieved via post-split statistical confirmation, such as Mahalanobis screening of left- and right-segment residuals to guard against outlier-induced false positives. Empirically, these outlier guards reduce the false discovery rate (FDR) significantly in both synthetic and real-world benchmarks (Ho et al., 2023).
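One possible form of such a confirmation step is sketched below for two-dimensional data: a candidate split is confirmed only if a sufficient fraction of the right-segment points lie far from the left segment in Mahalanobis distance, so a single outlier cannot confirm a change on its own. The decision rule, the `min_fraction` parameter, the chi-square cutoff, and the function names are assumptions for illustration; the source does not specify the exact screening rule.

```python
def mean_cov_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(p, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def confirm_change(left, right, cutoff=9.21, min_fraction=0.5):
    """Confirm a split only if enough right-segment points are far from the left segment.

    cutoff=9.21 is the 0.99 quantile of a chi-square with 2 degrees of freedom.
    """
    mean, cov = mean_cov_2d(left)
    far = sum(mahalanobis_sq(p, mean, cov) > cutoff for p in right)
    return far / len(right) >= min_fraction

left = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2)]
one_outlier = [(10, 10), (0.5, 0.6), (0.4, 0.5), (0.6, 0.4)]
real_shift = [(10, 10), (9, 10), (10, 9), (9.5, 9.5)]
print(confirm_change(left, one_outlier), confirm_change(left, real_shift))
```

The sketch requires a non-degenerate left-segment covariance; a practical version would regularize it before inversion.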
4. Theoretical Guarantees and Performance Metrics
GOCPD schemes are designed with explicit statistical and computational guarantees:
- False Alarm Control: The probability of detecting a change before a true change (or in its absence), $\mathbb{P}(T < \tau)$ for stopping time $T$ and true change point $\tau$, is controlled at level $\alpha$ for a user-specified $\alpha \in (0, 1)$ (Moen, 13 Apr 2025, Chen et al., 2020).
- Detection Delay: For sufficiently large change magnitude (or appropriate high-dimensional analogs), the expected delay between the true change point and its detection provably matches information-theoretic minimax lower bounds up to logarithmic factors (Moen, 13 Apr 2025).
- Computational Cost: Update and storage cost per time step is $O(\log t)$ for scalar/low-dimensional tests, with additional dimension-dependent factors for multivariate mean and covariance detection (Moen, 13 Apr 2025, Chen et al., 2020).
- Empirical Performance: On real and synthetic datasets, performance is reported in terms of true positive rate (TPR) and positive predictive value (PPV), with GOCPD outperforming or matching established baselines in FDR and runtime (Ho et al., 2023).
5. Extensions: Greedy Excitation in System Identification
In adaptive and system identification settings with time-varying parameters, GOCPD is paired with greedy excitation-set selection for robust recursive least squares (RLS) updates. The key innovation is to maintain an online "greedy excitation set": newly acquired regressors are admitted if and only if they do not worsen the Hessian condition number (Leung et al., 2024). The parameter update at each step then uses a two-tier weighting (retaining informative past data and exponentially forgetting the rest), leading to improved tracking and bias-variance control.
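The admission rule can be sketched for two-dimensional regressors, where the condition number of the Hessian (the sum of outer products of admitted regressors) has a closed form. Seeding the set with the first two regressors, the small ridge term, and all names below are assumptions for illustration; the source does not describe the initialization, and the two-tier weighting is omitted.

```python
import math

def cond_2x2(h):
    """Condition number of a symmetric positive-definite matrix [[a, b], [b, c]]."""
    a, b, c = h
    half_trace = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return (half_trace + radius) / (half_trace - radius)

def add_outer(h, phi):
    """Return h + phi * phi^T for a 2-D regressor phi."""
    a, b, c = h
    return (a + phi[0] * phi[0], b + phi[0] * phi[1], c + phi[1] * phi[1])

def greedy_excitation_set(regressors, ridge=1e-3, n_seed=2):
    """Admit a regressor only if it does not worsen the Hessian condition number."""
    h = (ridge, 0.0, ridge)  # small ridge keeps the matrix invertible (assumption)
    kept = []
    for i, phi in enumerate(regressors):
        candidate = add_outer(h, phi)
        # The first n_seed regressors are admitted unconditionally (assumption).
        if i < n_seed or cond_2x2(candidate) <= cond_2x2(h):
            h = candidate
            kept.append(phi)
    return kept, h

stream = [(1, 0), (1, 0.1), (0, 1), (1, 0), (0.1, 1)]
kept, hessian = greedy_excitation_set(stream)
print(kept, round(cond_2x2(hessian), 3))
```

In this toy stream the two nearly collinear seeds give a poorly conditioned Hessian, the regressor `(0, 1)` is admitted because it excites the missing direction, and the repeated `(1, 0)` is rejected because it would make conditioning worse again.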
An embedded GOCPD change point detector, based on EWMA-filtered model residuals and a likelihood-ratio test for jump-induced miss distributions, triggers a memory reset, discarding obsolete historical data and reinitializing the model post-jump for rapid reacquisition of new dynamics. This memory-resetting is provably optimal under likelihood tests and preserves adaptivity in ill-conditioned regimes.
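The reset mechanism can be illustrated with a deliberately simple stand-in: a running-mean model whose memory is cleared when an EWMA of squared, variance-normalized residuals exceeds a threshold. A plain threshold is used here in place of the likelihood-ratio test described above, and the class name, smoothing weight, threshold, and known-variance assumption are all hypothetical.

```python
class ResettingMeanTracker:
    """Running-mean model that resets when EWMA-filtered residuals grow large."""

    def __init__(self, lam=0.1, threshold=4.0, noise_var=1.0):
        self.lam = lam              # EWMA smoothing weight
        self.threshold = threshold  # alarm level for the filtered statistic
        self.noise_var = noise_var  # residual variance, assumed known
        self.reset()

    def reset(self):
        """Discard all history: sample count, mean estimate, and EWMA state."""
        self.n = 0
        self.mean = 0.0
        self.ewma = 0.0

    def update(self, x):
        """Ingest x; return True if it triggered a memory reset."""
        if self.n > 0:
            resid = x - self.mean
            self.ewma = (1 - self.lam) * self.ewma + self.lam * resid * resid / self.noise_var
            if self.ewma > self.threshold:
                self.reset()
                self.n, self.mean = 1, x  # restart the model from this observation
                return True
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return False

def reset_times(xs):
    tracker = ResettingMeanTracker()
    return [i for i, x in enumerate(xs) if tracker.update(x)]

print(reset_times([0.0] * 20 + [5.0] * 20))          # sustained jump
print(reset_times([0.0] * 20 + [5.0] + [0.0] * 20))  # isolated outlier
```

With these settings a sustained jump triggers a reset on the second post-jump sample, while a single outlier is smoothed away by the EWMA filter and leaves the model's memory intact.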
6. Multiscale and High-Dimensional Frameworks
GOCPD architectures are designed to perform efficient detection in both low- and high-dimensional regimes. For high-dimensional Gaussian streams, multiscale likelihood-ratio and CUSUM-like statistics are computed over dyadic grids for each coordinate, yielding aggregation strategies (e.g., hard-thresholded sums) that adapt to varying sparsity levels and unknown signal strengths (Chen et al., 2020, Moen, 13 Apr 2025). All core formulas (per-coordinate scan statistics, aggregation, and stopping rules) are updated greedily with minimal memory and are implemented in software such as the R package 'ocd'.
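As a small illustration of why hard thresholding helps under sparsity, the sketch below aggregates per-coordinate statistics by summing the squares of only those whose magnitude reaches a threshold. The function name and the threshold symbol `a` are hypothetical, and the sketch does not reproduce the exact aggregation used in the cited papers or in the 'ocd' package.

```python
def hard_threshold_sum(stats, a):
    """Sum of squared per-coordinate statistics whose magnitude is at least a."""
    return sum(s * s for s in stats if abs(s) >= a)

# One coordinate carries a strong signal; the others are noise-level.
sparse = [0.5, -0.8, 6.0, 0.3, -0.2, 0.9]
noise = [0.5, -0.8, 0.4, 0.3, -0.2, 0.9]

for a in (0.0, 2.0):
    print(a, hard_threshold_sum(sparse, a), hard_threshold_sum(noise, a))
```

With `a = 0.0` every coordinate contributes, which suits dense changes but accumulates noise from all coordinates. With `a = 2.0` the noise-only vector aggregates to zero while the sparse signal is retained, so the contrast between the two cases is sharper.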
The general methodology allows any offline scan statistic (CUSUM, LR, operator-norm, etc.) to be dynamically embedded into a streaming GOCPD architecture, transforming classical global scans into locally greedy, real-time detectors.
7. Practical Implementation and Applications
Published implementations emphasize the following features:
- Maintenance of summary statistics and candidate grids with $O(\log t)$ amortized time and space (Moen, 13 Apr 2025)
- Flexible modeling choices (Gaussian, GP, or regression models) according to domain requirements (Ho et al., 2023, Leung et al., 2024)
- Outlier-robust postprocessing and statistical threshold calibration via theoretical/empirical procedures
- Empirically validated performance in applications ranging from high-frequency market data, EEG seizure detection, and activity monitoring to online adaptive regression in time-varying systems (Ho et al., 2023, Moen, 13 Apr 2025, Chen et al., 2020, Leung et al., 2024)
A tabular synopsis of GOCPD methodologies drawn from key references follows:
| Methodology Source | Core Idea | Per-step Complexity |
|---|---|---|
| (Ho et al., 2023) | Unimodal log-likelihood maximization + ternary search | $O(\log t)$ |
| (Moen, 13 Apr 2025) | Dynamic geometric scan grid, any offline statistic | $O(\log t)$, up to dimension-dependent factors |
| (Chen et al., 2020) | Multiscale, coordinate-wise greedy CUSUM + aggregation | (see reference) |
| (Leung et al., 2024) | Greedy excitation-set RLS + LR change detection | (see reference) |
The empirical and theoretical analyses collectively establish GOCPD as an efficient, versatile, and statistically rigorous framework for rapid, robust change-point detection under streaming constraints and challenging data regimes.