Root Cause Explanation of Outliers under Noisy Mechanisms
Abstract: Identifying the root causes of anomalies in causal processes is vital across disciplines. Once the root causes are identified, they can be isolated and the necessary measures taken to restore normal operation. Causal processes are often modelled as graphs, with entities as nodes and their paths or interconnections as edges. Existing work considers only the contribution of nodes in the generative process, and therefore cannot attribute the outlier score to the edges of a mechanism when the anomaly occurs in the connections. In this paper, we consider both the individual edges and the nodes of each mechanism when identifying root causes. We introduce a noisy functional causal model for this purpose. We then employ Bayesian learning and inference methods to infer the noises of the nodes and edges, and express a target outlier leaf as a function of these node and edge noises. Finally, we propose an efficient gradient-based attribution method for computing the anomaly attribution scores; it scales linearly with the number of nodes and edges. Experiments on simulated datasets and two real-world scenario datasets show that the proposed method achieves better anomaly attribution performance than the baselines. Our method scales to larger graphs with more nodes and edges.
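The idea of attributing an outlier leaf to both node and edge noises can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's formulation: the three-node chain, the linear mechanism in which edge noise perturbs an edge weight and node noise is additive, the names `leaf_value` and `attribute`, and the scoring rule (finite-difference gradient times the deviation of each noise from a reference "normal" setting). The Bayesian inference of the noises is skipped; the noise values are simply given.

```python
import numpy as np

# Hypothetical 3-node chain X0 -> X1 -> X2, where X2 is the target leaf.
EDGES = [(0, 1), (1, 2)]        # (parent, child), listed in topological order
WEIGHTS = np.array([2.0, 0.5])  # assumed nominal edge weights


def leaf_value(node_noise, edge_noise):
    """Leaf X2 as a function of node and edge noises.

    Assumed mechanism (illustrative only):
        X_child = sum_parent (w + edge_noise) * X_parent + node_noise
    """
    x = np.array(node_noise, dtype=float)  # each node starts at its own noise
    for k, (parent, child) in enumerate(EDGES):
        x[child] += (WEIGHTS[k] + edge_noise[k]) * x[parent]
    return x[-1]


def attribute(node_noise, edge_noise, ref_node, ref_edge, eps=1e-6):
    """Score each noise term by gradient * (observed noise - reference noise).

    Gradients are taken by central finite differences at the observed noises.
    """
    z = np.concatenate([node_noise, edge_noise]).astype(float)
    z_ref = np.concatenate([ref_node, ref_edge]).astype(float)
    n_nodes = len(node_noise)

    def f(v):
        return leaf_value(v[:n_nodes], v[n_nodes:])

    grad = np.zeros_like(z)
    for i in range(len(z)):
        step = np.zeros_like(z)
        step[i] = eps
        grad[i] = (f(z + step) - f(z - step)) / (2 * eps)

    scores = grad * (z - z_ref)
    return scores[:n_nodes], scores[n_nodes:]


# An anomaly injected on edge (1, 2) only; node noises stay at their reference.
node_scores, edge_scores = attribute(
    node_noise=[1.0, 0.0, 0.0], edge_noise=[0.0, 3.0],
    ref_node=[1.0, 0.0, 0.0], ref_edge=[0.0, 0.0],
)
print("node scores:", node_scores)
print("edge scores:", edge_scores)
```

In this toy setting a node-only attribution would have nothing to blame, because every node noise equals its reference value, while the edge score singles out the perturbed connection. Finite differences are used here only to keep the sketch dependency-free; an automatic-differentiation library would be the more natural choice for larger graphs.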