
Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey (2403.09606v1)

Published 14 Mar 2024 in cs.CL and cs.AI

Abstract: Causal inference has shown potential in enhancing the predictive accuracy, fairness, robustness, and explainability of NLP models by capturing causal relationships among variables. The emergence of generative LLMs has significantly impacted various NLP domains, particularly through their advanced reasoning capabilities. This survey focuses on evaluating and improving LLMs from a causal view in the following areas: understanding and improving the LLMs' reasoning capacity, addressing fairness and safety issues in LLMs, complementing LLMs with explanations, and handling multimodality. Meanwhile, LLMs' strong reasoning capacities can in turn contribute to the field of causal inference by aiding causal relationship discovery and causal effect estimations. This review explores the interplay between causal inference frameworks and LLMs from both perspectives, emphasizing their collective potential to further the development of more advanced and equitable artificial intelligence systems.

Authors (13)
  1. Xiaoyu Liu
  2. Paiheng Xu
  3. Junda Wu
  4. Jiaxin Yuan
  5. Yifan Yang
  6. Yuhang Zhou
  7. Fuxiao Liu
  8. Tianrui Guan
  9. Haoliang Wang
  10. Tong Yu
  11. Julian McAuley
  12. Wei Ai
  13. Furong Huang
Citations (29)

Summary

LLMs and Causal Inference: Bridging the Gap

Introduction to LLMs

Recent advancements in LLMs have significantly pushed the boundaries of what was once considered achievable in the field of NLP and beyond. With each iteration, these models have grown not only in size but also in their ability to understand, generate, and interact with human language in ways that are increasingly nuanced and intelligent. This capacity for nuanced understanding and generation underpins their versatility across a range of applications, from simple query responses to complex problem-solving tasks. The evolution of LLMs into multi-modal domains, integrating visual and textual information, further amplifies their applicability and potential impact across diverse sectors.

Causal Inference: A Primer

Causal inference provides a framework to understand the underlying mechanisms that drive observed patterns in data. Essential to this understanding are concepts such as treatment effects, causal graphs, and structural equations, which help in dissecting the complex interplay between variables. Through causal inference, researchers can estimate how changes in one variable lead to changes in another, offering insights crucial for decision-making in fields ranging from medicine to economics.
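To make the treatment-effect idea concrete, the following is a minimal sketch (not from the paper) of backdoor adjustment on simulated data: a confounder `Z` drives both treatment `T` and outcome `Y`, so the naive difference in means is biased, while stratifying on `Z` recovers the true effect. All variable names and numbers here are illustrative assumptions.

```python
import numpy as np

# Ground truth: treatment T raises outcome Y by 2.0; confounder Z
# drives both T and Y, so the naive contrast overstates the effect.
rng = np.random.default_rng(0)
n = 100_000
z = rng.binomial(1, 0.5, n)                  # confounder
t = rng.binomial(1, 0.2 + 0.6 * z)           # treatment depends on Z
y = 2.0 * t + 3.0 * z + rng.normal(0, 1, n)  # outcome depends on T and Z

# Naive estimate ignores Z and absorbs its effect into the contrast.
naive = y[t == 1].mean() - y[t == 0].mean()

# Backdoor adjustment: estimate the effect within each stratum of Z,
# then average the strata by the marginal distribution of Z.
ate = sum(
    (y[(t == 1) & (z == v)].mean() - y[(t == 0) & (z == v)].mean()) * (z == v).mean()
    for v in (0, 1)
)
print(f"naive: {naive:.2f}, adjusted ATE: {ate:.2f}")
```

With the parameters above, the naive contrast comes out near 3.8 while the adjusted estimate recovers the true effect of 2.0, illustrating why deconfounding matters before interpreting an association causally.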

The Synergy Between LLMs and Causal Inference

The intersection of LLMs with causal inference emerges as a fertile ground for addressing some of the intrinsic challenges faced by LLMs while also extending the methodologies of causal analysis. This symbiotic relationship is evident in several areas:

Enhancing LLM Reasoning Capabilities

Research indicates that integrating causal inference methodologies with LLMs can significantly improve their reasoning capabilities. This is particularly true for tasks that require an understanding of cause-and-effect relationships. Techniques like causal discovery and treatment effect estimation have been pivotal in enabling LLMs to navigate through complex reasoning tasks more effectively.
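Several of the surveyed works pose causal discovery to an LLM as a pairwise orientation question. The sketch below shows the general shape of that pipeline; `query_llm` is a hypothetical stub standing in for a real model call, and the prompt wording is an assumption, not a quote from any specific paper.

```python
from itertools import combinations

def build_prompt(a: str, b: str) -> str:
    """Phrase the pairwise causal-orientation question for the model."""
    return (
        f"Which cause-and-effect relationship is more plausible?\n"
        f"(A) {a} causes {b}\n(B) {b} causes {a}\nAnswer with A or B."
    )

def query_llm(prompt: str) -> str:
    # Stub standing in for an actual LLM API call; it encodes a fixed
    # toy belief purely so the example runs end to end.
    return "A" if "altitude causes temperature" in prompt else "B"

def discover_edges(variables):
    """Orient every variable pair via the (stubbed) LLM and collect edges."""
    edges = []
    for a, b in combinations(variables, 2):
        answer = query_llm(build_prompt(a, b))
        edges.append((a, b) if answer.strip().startswith("A") else (b, a))
    return edges

print(discover_edges(["altitude", "temperature"]))  # → [('altitude', 'temperature')]
```

In practice the oriented edges are typically reconciled against a data-driven discovery algorithm rather than trusted outright, since LLM answers can reflect memorized correlations rather than causal knowledge.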

Addressing Fairness and Bias in LLMs

Causal inference methods offer robust frameworks for identifying and mitigating biases inherent in LLMs. By understanding the causal pathways that lead to biased outcomes, researchers can apply interventions to ensure fairer and more equitable model performances across diverse demographic groups.
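One common intervention-style probe is counterfactual fairness testing: intervene on a demographic attribute in the input (here, swapping gendered tokens) and check whether the model's output changes. The sketch below is illustrative only; `toy_sentiment` is a hypothetical stand-in for a real classifier, and the token-swap table is deliberately minimal.

```python
# Swap table for the demographic intervention; a real probe would use a
# far richer lexicon and handle casing and morphology.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his"}

def counterfactual(text: str) -> str:
    """Flip gendered tokens to produce the counterfactual input."""
    return " ".join(SWAPS.get(tok, tok) for tok in text.split())

def toy_sentiment(text: str) -> float:
    # Hypothetical scorer; a fair model should score both versions equally.
    return 1.0 if "excellent" in text else 0.0

def bias_gap(text: str) -> float:
    """Absolute change in model output under the demographic intervention."""
    return abs(toy_sentiment(text) - toy_sentiment(counterfactual(text)))

print(bias_gap("she is an excellent engineer"))  # → 0.0
```

A nonzero gap flags a causal dependence of the output on the protected attribute, which is exactly the pathway that causal debiasing methods then try to sever.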

Improving Safety and Explainability

LLMs face safety challenges such as hallucination, where models generate inaccurate or even harmful content. Causal inference provides tools for making LLMs safer by identifying the root causes of such behaviors. Similarly, causal methods enhance the explainability of LLMs by delineating the causal chains that lead to a particular output, making the models' decisions more transparent and interpretable.

Extending Causal Inference Through LLMs

On the flip side, LLMs have the potential to push the boundaries of causal inference. By serving as vast repositories of human knowledge, LLMs can assist in relaxing some of the stringent assumptions typically required in causal analysis, such as the stable unit treatment value assumption or the ignorability assumption. Furthermore, LLMs can aid in the discovery of causal relationships and the generation of counterfactual data, thus addressing some of the data scarcity and quality issues inherent in causal studies.

Future Directions and Conclusion

The integration of LLMs with causal inference methods holds promise for both fields. For LLMs, causal reasoning capabilities can be further honed to enhance their applicability in complex, real-world scenarios. Concurrently, causal inference stands to benefit from the vast knowledge encoded within LLMs, potentially revolutionizing the way causal relationships are discovered and analyzed. As this interdisciplinary field continues to evolve, it may pave the way toward more intelligent, fair, and reliable AI systems, significantly impacting various domains of human endeavor.