On the Fly Detection of Root Causes from Observed Data with Application to IT Systems (2402.06500v2)

Published 9 Feb 2024 in cs.AI

Abstract: This paper introduces a new structural causal model tailored for representing threshold-based IT systems and presents a new algorithm designed to rapidly detect root causes of anomalies in such systems. The method is proven to be correct when root causes are not causally related, and an extension based on the intervention of an agent is proposed to relax this assumption. Both the algorithm and its agent-based extension rely on causal discovery from offline data and traverse a subgraph when new anomalies are encountered in online data. Extensive experiments demonstrate the superior performance of our methods, even when they are applied to data generated from alternative structural causal models or to real IT monitoring data.
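
To make the idea of traversing a subgraph of anomalous nodes concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: it assumes the causal graph is already known and acyclic (the paper learns it from offline data), that an anomaly is simply a value above a fixed threshold, and that a candidate root cause is an anomalous node with no anomalous parent. The function names, metric names, and values are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def detect_anomalies(values: Dict[str, float],
                     thresholds: Dict[str, float]) -> Set[str]:
    """Flag every metric whose current value exceeds its threshold (assumed rule)."""
    return {name for name, value in values.items() if value > thresholds[name]}


def candidate_root_causes(parents: Dict[str, Iterable[str]],
                          anomalous: Set[str]) -> List[str]:
    """Return anomalous nodes that have no anomalous parent.

    `parents` maps each node to its direct causes in an assumed-known,
    acyclic causal graph. Self-loops are ignored.
    """
    roots = []
    for node in sorted(anomalous):
        anomalous_parents = [
            p for p in parents.get(node, ()) if p in anomalous and p != node
        ]
        if not anomalous_parents:
            roots.append(node)
    return roots


# Hypothetical graph: db -> api -> frontend, cache -> api
parents = {"api": ["db", "cache"], "frontend": ["api"], "db": [], "cache": []}
values = {"db": 0.95, "api": 0.90, "frontend": 0.80, "cache": 0.10}
thresholds = {name: 0.5 for name in values}

anomalous = detect_anomalies(values, thresholds)
roots = candidate_root_causes(parents, anomalous)
print(roots)
```

In this toy example only `db` is returned, because `api` and `frontend` each have an anomalous parent. The sketch does not cover causally related root causes, which the abstract says require the agent-based extension.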

