
Mitigating Prior Errors in Causal Structure Learning: Towards LLM driven Prior Knowledge (2306.07032v1)

Published 12 Jun 2023 in cs.LG and cs.AI

Abstract: Causal structure learning is a prominent technique for encoding cause-and-effect relationships among variables through Bayesian Networks (BNs). Merely recovering causal structures from real-world observed data lacks precision, while the development of Large Language Models (LLMs) is opening a new frontier of causality. LLMs show a strong capability for discovering causal relationships between variables from "text" inputs that define the investigated variables, leading to a potential new hierarchy and new ladder of causality. We address a critical issue in the emerging topic of LLM-based causal structure learning: tackling erroneous prior causal statements from LLMs, which is seldom considered in the current context, where prior knowledge comes predominantly from experts. As a pioneering attempt, we propose a BN learning strategy that is resilient to prior errors without the need for human intervention. Focusing on the edge-level prior, we classify the possible prior errors into three types (order-consistent, order-reversed, and irrelevant) and provide their theoretical impact on the Structural Hamming Distance (SHD) under the presumption of sufficient data. Intriguingly, we discover and prove that only the order-reversed error contributes to an increase in a unique acyclic closed structure, defined as a "quasi-circle". Leveraging this insight, we employ a post-hoc strategy to identify order-reversed prior errors by their impact on the increment of "quasi-circles". Through empirical evaluation on both real and synthetic datasets, we demonstrate our strategy's robustness against prior errors. Specifically, we highlight its substantial ability to resist order-reversed errors while retaining the majority of correct prior knowledge.
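
The three edge-level error types and the SHD metric named in the abstract can be illustrated with a small sketch. The abstract does not spell out the definitions, so the following is an assumption rather than the authors' formulation: a prior edge is taken to be order-consistent when its cause is a non-parent ancestor of its effect in the true graph, order-reversed when the effect is an ancestor of the cause, and irrelevant otherwise. The function names (`descendants`, `classify_prior`, `shd`) and the toy graph are hypothetical, and the SHD variant shown counts a reversed edge once, which is one of several conventions in use.

```python
def descendants(edges, start):
    """Return every node reachable from `start` along directed edges."""
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def classify_prior(true_edges, prior):
    """Classify a prior edge (cause, effect) against the true DAG.

    Assumed definitions (not taken from the abstract):
    - correct: the edge exists in the true DAG
    - order-consistent: cause is an ancestor of effect, but not a parent
    - order-reversed: effect is an ancestor of cause
    - irrelevant: neither node is an ancestor of the other
    """
    cause, effect = prior
    if prior in true_edges:
        return "correct"
    if effect in descendants(true_edges, cause):
        return "order-consistent"
    if cause in descendants(true_edges, effect):
        return "order-reversed"
    return "irrelevant"


def shd(true_edges, learned_edges):
    """Structural Hamming Distance: missing + extra + reversed edges.

    A reversed edge is counted once here; some conventions count it twice.
    """
    true_skeleton = {frozenset(e) for e in true_edges}
    learned_skeleton = {frozenset(e) for e in learned_edges}
    missing_or_extra = len(true_skeleton ^ learned_skeleton)
    reversed_count = sum(
        1
        for (u, v) in learned_edges
        if (v, u) in true_edges and (u, v) not in true_edges
    )
    return missing_or_extra + reversed_count


# Toy ground truth: A -> B -> C <- D
true_dag = {("A", "B"), ("B", "C"), ("D", "C")}
for prior in [("A", "B"), ("A", "C"), ("C", "A"), ("A", "D")]:
    print(prior, classify_prior(true_dag, prior))

learned_dag = {("A", "B"), ("C", "B"), ("A", "D")}
print("SHD:", shd(true_dag, learned_dag))
```

In this toy graph, the prior `("C", "A")` contradicts the true causal order and is the kind of error the paper's post-hoc strategy aims to detect. The quasi-circle detection itself is not sketched here because the abstract does not define it in enough detail.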

