Is Your Anomaly Detector Ready for Change? Adapting AIOps Solutions to the Real World (2311.10421v2)

Published 17 Nov 2023 in cs.LG and cs.SE

Abstract: Anomaly detection techniques are essential in automating the monitoring of IT systems and operations. In these techniques, machine learning algorithms are trained on operational data from a specific period of time and then continuously evaluated on newly emerging data. Operational data changes constantly, which affects the performance of deployed anomaly detection models. Continuous model maintenance is therefore required to preserve the performance of anomaly detectors over time. In this work, we analyze two anomaly detection model maintenance techniques that differ in model update frequency, namely blind model retraining and informed model retraining. We further investigate the effects of updating the model by retraining it on all the available data (full-history approach) and on only the newest data (sliding window approach). Moreover, we investigate whether a data change monitoring tool is capable of determining when the anomaly detection model needs to be updated through retraining.
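
The maintenance options named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the batch layout, and in particular the mean-shift change monitor with its `threshold` parameter are hypothetical, since the abstract does not specify the monitoring tool or the detection models.

```python
from statistics import mean
from typing import List, Optional, Sequence


def training_slice(history: Sequence[float], window: Optional[int]) -> List[float]:
    """Select retraining data.

    window=None  -> full-history approach (all available data)
    window=k >= 1 -> sliding window approach (only the newest k points)
    """
    if window is None:
        return list(history)
    return list(history[-window:])


def blind_schedule(n_batches: int, period: int) -> List[int]:
    """Blind retraining: retrain every `period` batches, without looking at the data."""
    return [i for i in range(1, n_batches + 1) if i % period == 0]


def informed_schedule(
    batches: Sequence[Sequence[float]],
    reference: Sequence[float],
    threshold: float,
) -> List[int]:
    """Informed retraining: retrain only when a change monitor flags a batch.

    Assumption: the monitor is a simple shift in the batch mean relative to
    the data the model was last trained on. A real monitor would likely use
    a proper drift detector instead.
    """
    retrain_at = []
    ref_mean = mean(reference)
    for i, batch in enumerate(batches, start=1):
        if abs(mean(batch) - ref_mean) > threshold:
            retrain_at.append(i)
            ref_mean = mean(batch)  # the retrained model now reflects this batch
    return retrain_at


if __name__ == "__main__":
    history = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(training_slice(history, None))  # full history
    print(training_slice(history, 2))     # newest two points
    print(blind_schedule(6, 2))           # fixed-period retraining
    batches = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.1, 4.9]]
    print(informed_schedule(batches, reference=[1.0, 1.0], threshold=1.0))
```

The two choices are independent: either schedule can be combined with either data selection, which gives the four configurations the abstract compares.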

