SDOoop: Capturing Periodical Patterns and Out-of-phase Anomalies in Streaming Data Analysis
Abstract: Streaming data analysis is increasingly required in applications such as IoT, cybersecurity, robotics, mechatronics, and cyber-physical systems. Despite its relevance, it is still an emerging field with open challenges. SDO is a recent anomaly detection method designed to meet requirements of speed, interpretability, and intuitive parameterization. In this work, we present SDOoop, which extends the streaming version of SDO to retain temporal information about data structures. SDOoop spots contextual anomalies that traditional algorithms cannot detect, while enabling the inspection of data geometries, clusters, and temporal patterns. We used SDOoop to model real network communications in critical infrastructures and to extract patterns that disclose their dynamics. Moreover, we evaluated SDOoop with data from the intrusion detection and natural science domains and obtained performance equivalent or superior to state-of-the-art approaches. Our results show the high potential of new model-based methods to analyze and explain streaming data. Since SDOoop operates with constant per-sample space and time complexity, it is well suited to big data and can process large volumes of information as they arrive. SDOoop conforms to next-generation machine learning, which, in addition to accuracy and speed, is expected to provide highly interpretable and informative models.