
An Evaluation of Real-time Adaptive Sampling Change Point Detection Algorithm using KCUSUM

Published 15 Feb 2024 in cs.LG and stat.ML | (2402.10291v2)

Abstract: Detecting abrupt changes in real-time data streams from scientific simulations is a challenging task that demands accurate and efficient algorithms. Identifying change points in a live data stream involves continuously checking incoming observations for deviations in their statistical characteristics, particularly in high-volume data scenarios. Balancing rapid detection of sudden changes against a low false-alarm rate is vital. Many existing algorithms for this purpose rely on known probability distributions, which limits their applicability. In this study, we introduce the Kernel-based Cumulative Sum (KCUSUM) algorithm, a non-parametric extension of the traditional Cumulative Sum (CUSUM) method, which has gained prominence for its efficacy in online change point detection under less restrictive conditions. KCUSUM distinguishes itself by comparing incoming samples directly with reference samples and computing a statistic grounded in the non-parametric Maximum Mean Discrepancy (MMD) framework. This approach extends KCUSUM's applicability to scenarios where only reference samples are available, such as atomic trajectories of proteins in vacuum, and allows deviations from the reference sample to be detected without prior knowledge of the data's underlying distribution. Furthermore, by exploiting the inherent random-walk structure of the MMD-based statistic, we can theoretically analyze KCUSUM's performance across various use cases, including metrics such as expected detection delay and mean run time to false alarm. Finally, we discuss real-world use cases from scientific simulations, such as NWChem CODAR and protein folding data, demonstrating KCUSUM's practical effectiveness in online change point detection.
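
The abstract describes KCUSUM only at a high level, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes the pairwise update form given by Flynn and Yoo (arXiv:1903.01661): incoming observations are consumed two at a time, each pair is compared with two randomly drawn reference samples through an MMD-style kernel increment, and a CUSUM-style statistic accumulates that increment minus a drift term. The Gaussian kernel, the function names (`gaussian_kernel`, `mmd_increment`, `kcusum`), and the parameter values (`delta`, `threshold`, `bandwidth`) are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points; the kernel choice is an assumption."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * bandwidth ** 2)))


def mmd_increment(x1, y1, x2, y2, bandwidth=1.0):
    """One-sample-pair MMD^2 estimate: x's are reference points, y's are new observations."""
    def k(a, b):
        return gaussian_kernel(a, b, bandwidth)
    return k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)


def kcusum(stream, reference, delta=0.05, threshold=5.0, bandwidth=1.0, seed=0):
    """Return the index in `stream` at which an alarm is raised, or None.

    The returned index is the alarm time, not an estimate of where the
    change actually occurred. `delta` (drift) and `threshold` are
    hypothetical tuning values chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    reference = np.asarray(reference, dtype=float)
    stat = 0.0
    pending = None  # first observation of the current pair
    for t, y in enumerate(stream):
        if pending is None:
            pending = y
            continue
        # Draw two distinct reference samples to compare with the new pair.
        i, j = rng.choice(len(reference), size=2, replace=False)
        h = mmd_increment(reference[i], pending, reference[j], y, bandwidth)
        # CUSUM-style update: accumulate evidence, never drop below zero.
        stat = max(0.0, stat + h - delta)
        pending = None
        if stat >= threshold:
            return t
    return None


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = rng.normal(0.0, 1.0, size=(500, 2))
    # Synthetic stream: 200 in-control points, then a mean shift.
    stream = np.vstack([
        rng.normal(0.0, 1.0, size=(200, 2)),
        rng.normal(3.0, 1.0, size=(200, 2)),
    ])
    print("alarm raised at stream index:", kcusum(stream, reference))
```

In this sketch, a larger `delta` or `threshold` lowers the false-alarm rate at the cost of a longer detection delay, which is the trade-off the abstract refers to. Because the statistic only needs reference samples and a kernel, no density model of the pre-change or post-change distribution is required.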

