Min-SCUSUM: Sequential Change Detection

Updated 13 November 2025

Min-SCUSUM is a sequential detection algorithm that uses differences in Hyvärinen scores to identify abrupt changes in multistream data without needing normalization constants.
The method generalizes classical CUSUM by replacing log-likelihood increments with score differences, ensuring controlled false alarm rates and asymptotically optimal detection delays via Fisher divergence.
It is applicable to high-dimensional and energy-based models, offering robust performance even when traditional likelihood-based methods are infeasible.

The min-SCUSUM method is a sequential detection and diagnosis algorithm designed for multistream quickest change detection in settings where explicit likelihood ratios are infeasible or undesirable. By relying on the Hyvärinen score and Fisher divergence, min-SCUSUM generalizes the classical CUSUM/Min-CuSum framework to unnormalized statistical models, retaining first-order asymptotic optimality in high-dimensional and energy-based contexts (Warner et al., 2023; Wu et al., 2023; Chen et al., 6 Nov 2025).

1. Theoretical Foundations

Min-SCUSUM extends the concept of online sequential analysis to the regime of multiple independent data streams, each of which may undergo an abrupt distributional change at an unknown time. The key innovation is the replacement of log-likelihood increments with differences of Hyvärinen scores:

Given $M$ parallel streams, the goal is to detect as quickly as possible when any stream transitions from a “pre-change” density $p$ to a stream-specific “post-change” density $q_i$, without requiring normalization constants.
The Hyvärinen score for a twice-differentiable density $r(x)$ on $\mathbb{R}^d$ is defined as $S_H(x; r) = \frac{1}{2}\|\nabla_x \log r(x)\|_2^2 + \Delta_x \log r(x)$, where $\Delta_x$ is the Laplacian.
The Fisher divergence, $D_F(p\|q) = \mathbb{E}_p[\frac{1}{2}\|\nabla \log p(X) - \nabla \log q(X)\|_2^2]$, quantifies the separation between distributions and governs the algorithm’s asymptotic detection delay.
For each stream $i$, the instantaneous increment at time $t$ is $d_t^{(i)} = S_H(X_{i,t}; p) - S_H(X_{i,t}; q_i)$. Under standard regularity conditions its mean is $-D_F(p\|q_i) < 0$ under the null ($p$) and $D_F(q_i\|p) > 0$ after the change to $q_i$, giving a negative drift before the change and a positive drift after it.

The Hyvärinen score is a strictly proper scoring rule and is unchanged when the density is multiplied by a positive constant, so the method circumvents normalization by relying only on gradients and Laplacians of the (potentially unnormalized) log-densities.
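As a minimal sketch of these definitions, assume purely for illustration a one-dimensional Gaussian pre-change density $p = N(0, 1)$ and post-change density $q = N(1, 1)$. The function names below are hypothetical. For $N(\mu, \sigma^2)$ the score has the closed form $S_H(x) = (x-\mu)^2/(2\sigma^4) - 1/\sigma^2$, and the sample mean of the increment should change sign across the change.

```python
import random

def hyvarinen_score_gaussian(x, mu, sigma):
    """S_H(x; r) for r = N(mu, sigma^2): 0.5 * (d/dx log r)^2 + d^2/dx^2 log r."""
    grad = -(x - mu) / sigma**2      # first derivative of the log-density
    laplacian = -1.0 / sigma**2      # second derivative of the log-density
    return 0.5 * grad**2 + laplacian

def increment(x, mu_p, mu_q, sigma=1.0):
    """d = S_H(x; p) - S_H(x; q) for equal-variance Gaussians p and q."""
    return (hyvarinen_score_gaussian(x, mu_p, sigma)
            - hyvarinen_score_gaussian(x, mu_q, sigma))

rng = random.Random(0)               # fixed seed for reproducibility
n = 20_000
# Average increment when the data come from p (pre-change) and from q (post-change).
pre_drift = sum(increment(rng.gauss(0.0, 1.0), 0.0, 1.0) for _ in range(n)) / n
post_drift = sum(increment(rng.gauss(1.0, 1.0), 0.0, 1.0) for _ in range(n)) / n
print(f"mean increment under p: {pre_drift:+.3f}")
print(f"mean increment under q: {post_drift:+.3f}")
```

In this illustrative setting both Fisher divergences equal $1/2$, so the two averages should lie close to $-0.5$ and $+0.5$ respectively.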

2. Algorithm Definition and Workflow

The min-SCUSUM algorithm operates with $M$ parallel detection statistics, each recursively updating a CUSUM-like statistic:

For each stream $i=1, \dots, M$, initialize $W_0^{(i)}=0$.
At each time $t$, for each stream $i$:
1. Compute $d_t^{(i)} = S_H(X_{i,t}; p) - S_H(X_{i,t}; q_i)$.
2. Update $W_t^{(i)} = \max \{ 0, W_{t-1}^{(i)} + d_t^{(i)} \}$.
Fix a threshold $b > 0$.
Define stopping times $T_i(b) = \inf\{ t \geq 1 : W_t^{(i)} \geq b \}$ for each stream and $T(b) = \min_{1 \leq i \leq M} T_i(b)$.
At time $T(b)$, declare a change and diagnose the altered stream via $D = \arg\max_{1 \le i \le M} W_{T(b)}^{(i)}$.

Min-SCUSUM Pseudocode

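from itertools import count

def min_scusum(observe, S_H_p, S_H_q, b):
    """observe(t) returns (X_{1,t}, ..., X_{M,t}); S_H_q[i] scores a sample under q_i.
    The argument names are placeholders, not part of any published interface."""
    M = len(S_H_q)
    W = [0.0] * M                                  # W_0^{(i)} = 0 for every stream
    for t in count(1):
        X_t = observe(t)
        for i in range(M):
            d = S_H_p(X_t[i]) - S_H_q[i](X_t[i])   # increment d_t^{(i)}
            W[i] = max(0.0, W[i] + d)              # CUSUM-style recursion
        if max(W) >= b:
            T = t                                  # stopping time T(b)
            D = max(range(M), key=W.__getitem__)   # diagnosed stream (0-based index)
            return T, D                            # declare change at time T in stream D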

The procedure requires only the gradients and Laplacians needed to evaluate $S_H$. Because the $M$ statistics are updated independently of one another, the per-stream updates can be vectorized or run in parallel.
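The workflow above can be exercised on synthetic data. The following is a sketch under stated assumptions rather than a reference implementation: every stream is taken to be $N(0, 1)$ before the change, the changed stream becomes $N(\delta, 1)$, and the helper names (`score_increment`, `min_scusum`, `simulate`) are hypothetical. For unit-variance Gaussians the score difference reduces to $\delta x - \delta^2/2$.

```python
import math
import random

def score_increment(x, delta):
    """S_H(x; p) - S_H(x; q) for p = N(0, 1) and q = N(delta, 1)."""
    return delta * x - 0.5 * delta**2

def min_scusum(observations, deltas, b):
    """Return (T, D): first time any statistic reaches b, and the argmax stream (0-based)."""
    W = [0.0] * len(deltas)
    for t, x_t in enumerate(observations, start=1):
        # One CUSUM-style update per stream.
        W = [max(0.0, w + score_increment(x, d)) for w, x, d in zip(W, x_t, deltas)]
        if max(W) >= b:
            return t, max(range(len(W)), key=W.__getitem__)
    return None, None  # no alarm within the available data

def simulate(M, changed, nu, delta, horizon, rng):
    """Yield M-stream observations; stream `changed` shifts its mean by delta after time nu."""
    for t in range(1, horizon + 1):
        yield [rng.gauss(delta if (i == changed and t > nu) else 0.0, 1.0)
               for i in range(M)]

M, alpha, delta = 3, 1e-4, 2.0
b = math.log(M / alpha)                 # threshold choice discussed in Section 3.1
rng = random.Random(1)                  # fixed seed for reproducibility
data = simulate(M, changed=1, nu=0, delta=delta, horizon=500, rng=rng)
T, D = min_scusum(data, deltas=[delta] * M, b=b)
print(f"threshold b = {b:.2f}, alarm time T = {T}, diagnosed stream D = {D}")
```

Here the change is present from the start (`nu=0`), so the alarm time `T` is itself the detection delay and `D` should typically point at the changed stream.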

3. Performance Guarantees

3.1 False Alarm Control

Under the no-change regime (all streams distributed as $p$), the mean time to the first false alarm satisfies $\mathbb{E}_\infty[T(b)] \geq \frac{e^b}{M}$. Thus, setting $b = \log(M/\alpha)$ ensures $\mathbb{E}_\infty[T] \geq 1/\alpha$ for any desired false-alarm rate $\alpha$ (Chen et al., 6 Nov 2025).
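The calibration rule is simple enough to state in a few lines. This is a small sketch with hypothetical function names; it only evaluates the bound quoted above and does not estimate the actual false-alarm behavior.

```python
import math

def threshold(M, alpha):
    """b = log(M / alpha), chosen so that the lower bound e^b / M equals 1 / alpha."""
    if M < 1 or not 0.0 < alpha < 1.0:
        raise ValueError("need M >= 1 and 0 < alpha < 1")
    return math.log(M / alpha)

def arl_lower_bound(b, M):
    """Lower bound e^b / M on the mean time to the first false alarm."""
    return math.exp(b) / M

for M in (1, 10, 100):
    b = threshold(M, alpha=0.01)
    print(f"M = {M:3d}: b = {b:.3f}, bound on E_inf[T] = {arl_lower_bound(b, M):.1f}")
```

The threshold grows only logarithmically in the number of streams, so monitoring ten times as many streams raises $b$ by $\log 10 \approx 2.3$.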

3.2 Asymptotic Detection Delay

When a change occurs in stream $i$ at time $0$, the expected detection delay satisfies

$\mathbb{E}_i[T(b)] \sim \frac{b}{D_F(q_i \| p)}, \quad \text{as } b \to \infty$

For $b = \log(M/\alpha)$, the worst-case delay (Lorden’s criterion) satisfies

$\sup_\nu \mathrm{ess\,sup}\, \mathbb{E}_{\nu, i}[ T - \nu \mid T > \nu ] \sim \frac{ \log(M/\alpha) }{ D_F(q_i \| p) }$

Similar results hold for the Kullback–Leibler-based Min-CuSum, replacing $D_F$ with $\mathrm{KL}(f_i\|f_0)$ (Warner et al., 2023, Wu et al., 2023).
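To make the first-order delay formula concrete, the sketch below assumes isotropic Gaussians with a shared variance, for which the score difference is the constant vector $(\mu_q - \mu_p)/\sigma^2$ and hence $D_F(q\|p) = \frac{1}{2}\|\mu_q - \mu_p\|^2/\sigma^4$. The function names and the numerical values are illustrative assumptions, and the result is an asymptotic approximation rather than an exact delay.

```python
import math

def fisher_divergence_gaussian(mu_q, mu_p, sigma):
    """D_F(q || p) for q = N(mu_q, sigma^2 I) and p = N(mu_p, sigma^2 I)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu_q, mu_p))
    return 0.5 * sq_dist / sigma**4

def first_order_delay(M, alpha, d_fisher):
    """Leading-order delay log(M / alpha) / D_F(q_i || p)."""
    return math.log(M / alpha) / d_fisher

# Unit mean shift in the first of three coordinates, unit variance.
d_f = fisher_divergence_gaussian(mu_q=[1.0, 0.0, 0.0], mu_p=[0.0, 0.0, 0.0], sigma=1.0)
delay = first_order_delay(M=10, alpha=0.01, d_fisher=d_f)
print(f"D_F = {d_f:.3f}, first-order delay = {delay:.2f} samples")
```

Doubling the mean shift quadruples $D_F$ in this model and therefore cuts the predicted delay by a factor of four.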

3.3 Misidentification Probability

The probability of misdiagnosis, that is, of declaring $i$ instead of the true changed stream $j$, is exponentially controlled: $P_{\nu,j}(D=i \mid T>\nu) \leq e^{-b}(1+b)\Bigl(1+\frac{1}{D_F(q_j\|p)}+\zeta_{ij}(b)\Bigr)$, where $\zeta_{ij}(b)\to 0$ as $b\to\infty$. With $b = \log(M/\alpha)$, the misidentification rate decays as $O((1+\log(M/\alpha))\alpha)$ (Chen et al., 6 Nov 2025).
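The stated rate follows from the bound by direct substitution. As a short worked step, using only $M \geq 1$:

```latex
b = \log(M/\alpha) \;\Longrightarrow\; e^{-b} = \frac{\alpha}{M},
\qquad
e^{-b}(1+b) = \frac{\alpha}{M}\bigl(1 + \log(M/\alpha)\bigr)
            \;\leq\; \alpha\bigl(1 + \log(M/\alpha)\bigr).
```

The remaining factor $1 + 1/D_F(q_j\|p) + \zeta_{ij}(b)$ stays bounded as $b \to \infty$, which gives the $O((1+\log(M/\alpha))\alpha)$ rate.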

Summary Table

Quantity	In terms of threshold $b$	With $b = \log(M/\alpha)$
Mean time to false alarm	$\mathbb{E}_\infty[T]\geq e^b/M$	$\mathbb{E}_\infty[T]\geq 1/\alpha$
Detection delay	$\sim b/D_F(q_i\|p)$	$\sim \log(M/\alpha)/D_F(q_i\|p)$
Misidentification probability	$O((1+b)e^{-b})$	$O((1+\log(M/\alpha))\alpha)$

4. Comparison to Likelihood-Based Approaches

Traditional Min-CuSum relies on explicit log-likelihood ratios between normalized pre-change and post-change densities $f_0$ and $f_i$, yielding the per-stream statistic

$S_i(n) = \max_{1 \leq k \leq n} \sum_{t=k}^{n} \ell_i(t),\quad \ell_i(t) = \log \frac{f_i(X_t)}{f_0(X_t)}$

However, such approaches are impractical for unnormalized densities or intractable partition functions. Min-SCUSUM instead requires only access to the unnormalized log-density and its derivatives, greatly expanding the class of models (e.g., energy-based models, RBMs, diffusion models) amenable to rigorous sequential change detection and diagnosis.
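The role of the normalization constant can be checked numerically. The sketch below is an illustration under assumptions: it works in one dimension only, uses central finite differences in place of exact derivatives, and takes a hypothetical double-well energy whose normalizing constant is treated as unknown. Adding an arbitrary constant to the log-density leaves the Hyvärinen score unchanged, whereas a log-likelihood would shift by that constant.

```python
def hyvarinen_score_1d(log_density, x, h=1e-3):
    """Finite-difference S_H(x) = 0.5 * f'(x)^2 + f''(x), with f the log-density (1-D only)."""
    grad = (log_density(x + h) - log_density(x - h)) / (2.0 * h)
    laplacian = (log_density(x + h) - 2.0 * log_density(x) + log_density(x - h)) / h**2
    return 0.5 * grad**2 + laplacian

def energy(x):
    """Hypothetical double-well energy; the model density is proportional to exp(-energy(x))."""
    return x**4 - x**2

def log_unnormalized(x):
    return -energy(x)

def log_shifted(x):
    return -energy(x) + 7.0   # same model, different arbitrary additive constant

x0 = 0.5
s_a = hyvarinen_score_1d(log_unnormalized, x0)
s_b = hyvarinen_score_1d(log_shifted, x0)
print(f"score without constant: {s_a:.6f}, with constant: {s_b:.6f}")
```

For this energy the exact derivatives at $x = 0.5$ are $f' = 0.5$ and $f'' = -1$, so both values should be close to $0.5 \cdot 0.25 - 1 = -0.875$.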

Theoretically, both approaches admit first-order asymptotic optimality under Lorden’s delay criterion when calibrated to meet the false-alarm and misidentification constraints (Warner et al., 2023; Chen et al., 6 Nov 2025).

5. Practical Estimation and Implementation Considerations

Estimating the required score functions $(\nabla \log p, \nabla \log q_i)$ can be accomplished via:

Score-matching (Hyvärinen, 2005): Empirically minimize Fisher divergence to fit parametric models for $\nabla \log q_\theta(x)$ using observed samples.
Deep score network approaches: Denoising-diffusion training (Song & Ermon, NeurIPS'19) enables accurate approximation of score functions in high dimensions.
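As a toy instance of score matching, assume a one-dimensional Gaussian model $q_\theta = N(\mu, 1/\lambda)$ with precision $\lambda$. The empirical objective is the sample mean of $S_H(x; q_\theta) = \frac{1}{2}\lambda^2 (x-\mu)^2 - \lambda$, and its minimizer has a closed form: the sample mean and the inverse of the biased sample variance. The function names and data below are hypothetical, and realistic models would need numerical optimization instead.

```python
import random

def score_matching_objective(samples, mu, lam):
    """Empirical mean of S_H(x; q_theta) for q_theta = N(mu, 1/lam)."""
    return sum(0.5 * lam**2 * (x - mu) ** 2 - lam for x in samples) / len(samples)

def fit_gaussian_score_matching(samples):
    """Closed-form minimizer: mu = sample mean, lam = 1 / (biased sample variance)."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, 1.0 / var

rng = random.Random(0)                              # fixed seed for reproducibility
data = [rng.gauss(3.0, 2.0) for _ in range(5000)]   # synthetic samples, mean 3 and std 2
mu_hat, lam_hat = fit_gaussian_score_matching(data)
j_hat = score_matching_objective(data, mu_hat, lam_hat)
print(f"mu_hat = {mu_hat:.3f}, variance_hat = {1.0 / lam_hat:.3f}, objective = {j_hat:.4f}")
```

No normalizing constant appears anywhere in the objective, which is what makes the same recipe usable for unnormalized models.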

Once estimators for the required gradients and Laplacians are obtained, the running cost per sample is $O(dM)$, where $d$ is the data dimensionality.

Calibration of the threshold $b$ is governed by the desired false-alarm rate and number of channels, and all performance guarantees hold uniformly in the change-point.

6. Empirical and Application Evidence

Simulation studies on multidimensional Gaussian models and energy-based models including Gauss–Bernoulli RBMs corroborate theory:

The detection delay tracks the $b/D_F$ law, and the misidentification rate stays below the theoretical upper bound.
Setting the threshold by the analytic formula $b = \log(M/\alpha)$ yields empirical false-alarm rates and delays consistent with the predictions.
In real-world applications such as video anomaly detection across multiple camera streams, min-SCUSUM identifies altered streams with high reliability even when neither pre- nor post-change densities are normalized (Chen et al., 6 Nov 2025).

A plausible implication is that min-SCUSUM enables rigorous sequential change diagnosis in modern high-dimensional settings where the likelihood-based CuSum approaches are computationally infeasible or ill-defined.

7. Connections, Limitations, and Extensions

Min-SCUSUM is directly linked to the general class of proper scoring rules, with the Hyvärinen score being a special choice that confers tractability for unnormalized models.

The method relies on the assumption that score functions are differentiable and Laplacians well-defined, which may restrict its application in discrete or degenerate models. For misspecified models, or when only approximate score estimators are available, the exponential tail bounds on misidentification error and the minimality of delay require empirical verification.

Future directions include adaptation to asynchronous changes, nonparametric score estimation, and rigorous finite-sample performance guarantees in highly misspecified settings. The framework remains extensible to more complex structural change regimes as long as the score differences retain negative and positive drift properties pre- and post-change (Chen et al., 6 Nov 2025).