An Evaluation of Real-time Adaptive Sampling Change Point Detection Algorithm using KCUSUM
Abstract: Detecting abrupt changes in real-time data streams from scientific simulations is a challenging task that demands accurate and efficient algorithms. Identifying change points in a live data stream requires continuously scrutinizing incoming observations for deviations in their statistical properties, which is especially demanding for high-volume data. Balancing prompt detection of sudden changes against the rate of false alarms is vital. Many existing algorithms for this purpose assume known probability distributions, which limits their applicability. In this study, we introduce the Kernel-based Cumulative Sum (KCUSUM) algorithm, a non-parametric extension of the traditional Cumulative Sum (CUSUM) method, which has gained prominence for its effectiveness in online change point detection under less restrictive conditions. KCUSUM distinguishes itself by comparing incoming samples directly with reference samples and computing a statistic grounded in the non-parametric Maximum Mean Discrepancy (MMD) framework. This extends KCUSUM's applicability to scenarios where only reference samples are available, such as atomic trajectories of proteins in vacuum, enabling the detection of deviations from the reference samples without prior knowledge of the data's underlying distribution. Furthermore, by exploiting the random-walk structure inherent in the MMD-based statistic, we can theoretically analyze KCUSUM's performance across various use cases, including metrics such as the expected detection delay and the mean run time to false alarm. Finally, we discuss real-world use cases from scientific simulations, such as NWChem CODAR and protein folding data, demonstrating KCUSUM's practical effectiveness for online change point detection.
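The statistic described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scalar observations, a Gaussian RBF kernel with bandwidth 1, the linear-time MMD estimator of Gretton et al. (2012) evaluated on consecutive pairs of stream and reference samples, and hypothetical parameters `delta` (drift) and `threshold` (alarm level). When there is no change, each increment has mean zero, so subtracting `delta` gives the statistic a negative drift; after a change, the increment's mean becomes the squared MMD between the post-change and reference distributions, and the walk drifts upward once that value exceeds `delta`.

```python
import math
import random
from typing import Callable, Iterable, Optional

Kernel = Callable[[float, float], float]


def rbf_kernel(a: float, b: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel on scalars; the bandwidth of 1.0 is an assumed default."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth**2))


def mmd_increment(x1: float, x2: float, y1: float, y2: float, kernel: Kernel) -> float:
    """Linear-time MMD^2 term for stream pair (x1, x2) and reference pair (y1, y2).

    Its expectation is 0 when stream and reference share a distribution,
    and equals the squared MMD between the two distributions otherwise.
    """
    return kernel(x1, x2) + kernel(y1, y2) - kernel(x1, y2) - kernel(x2, y1)


def kcusum(
    stream: Iterable[float],
    reference: Iterable[float],
    delta: float,
    threshold: float,
    kernel: Kernel = rbf_kernel,
) -> Optional[int]:
    """Return the 0-based stream index at which an alarm is raised, or None.

    Assumption of this sketch: reference samples come from the pre-change
    distribution and are consumed in lockstep with the stream.
    """
    z = 0.0
    pending: Optional[tuple] = None  # first (x, y) of the current pair
    for n, (x, y) in enumerate(zip(stream, reference)):
        if pending is None:
            pending = (x, y)  # wait for the second sample of the pair
            continue
        x_prev, y_prev = pending
        pending = None
        # CUSUM-style update: add the increment minus the drift, floored at 0.
        z = max(0.0, z + mmd_increment(x_prev, x, y_prev, y, kernel) - delta)
        if z > threshold:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical synthetic stream: N(0, 1) for 500 samples, then N(2, 1).
    rng = random.Random(0)
    change_at = 500
    stream = [rng.gauss(0.0, 1.0) for _ in range(change_at)]
    stream += [rng.gauss(2.0, 1.0) for _ in range(1500)]
    reference = [rng.gauss(0.0, 1.0) for _ in range(len(stream))]
    alarm = kcusum(stream, reference, delta=0.2, threshold=10.0)
    print(f"change at {change_at}, alarm at {alarm}")
```

Processing observations in disjoint pairs keeps each update constant-time, which suits high-volume streams. As with classical CUSUM, raising `threshold` or `delta` lengthens the mean run time to false alarm at the cost of a longer detection delay.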