Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey (2403.09606v1)
Abstract: Causal inference has shown potential in enhancing the predictive accuracy, fairness, robustness, and explainability of NLP models by capturing causal relationships among variables. The emergence of generative LLMs has significantly impacted various NLP domains, particularly through their advanced reasoning capabilities. This survey focuses on evaluating and improving LLMs from a causal view in the following areas: understanding and improving LLMs' reasoning capacity, addressing fairness and safety issues in LLMs, complementing LLMs with explanations, and handling multimodality. In turn, LLMs' strong reasoning capacities can contribute to the field of causal inference by aiding causal relationship discovery and causal effect estimation. This review explores the interplay between causal inference frameworks and LLMs from both perspectives, emphasizing their collective potential to further the development of more advanced and equitable artificial intelligence systems.
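One direction highlighted in the abstract, LLMs aiding causal relationship discovery, is commonly operationalized in the surveyed work (e.g., "Causal reasoning and large language models" and "Causal Inference Using LLM-Guided Discovery" below) by prompting the model for the plausible direction of each candidate edge and assembling the answers into a graph. The Python sketch below is a minimal illustration of that pairwise-query pattern, not any cited paper's method; `query_llm`, `VARIABLES`, and `orient_edge` are hypothetical names, and the model call is stubbed so the snippet runs offline.

```python
# Minimal sketch of LLM-guided pairwise causal discovery.
# All names here are illustrative placeholders, not an API from any cited paper.
from itertools import combinations
from typing import Optional, Tuple

VARIABLES = ["altitude", "temperature", "snowfall"]  # hypothetical variables


def query_llm(prompt: str) -> str:
    """Stand-in for a call to any instruction-tuned LLM.

    Replace the body with a real client call; the canned answer below only
    makes the sketch runnable without network access or API keys.
    """
    return "A causes B"


def orient_edge(a: str, b: str) -> Optional[Tuple[str, str]]:
    """Ask the LLM which causal direction between two variables is plausible."""
    prompt = (
        f"Which is more plausible?\n"
        f"(A) {a} causes {b}\n(B) {b} causes {a}\n(C) neither\n"
        "Answer exactly 'A causes B', 'B causes A', or 'neither'."
    )
    answer = query_llm(prompt).strip().lower()
    if answer.startswith("a causes b"):
        return (a, b)
    if answer.startswith("b causes a"):
        return (b, a)
    return None  # treat any other reply as "no edge"


# Query every unordered pair and keep the oriented edges the LLM proposes.
edges = [edge for a, b in combinations(VARIABLES, 2)
         if (edge := orient_edge(a, b)) is not None]
print(edges)  # with the canned reply: every pair oriented left-to-right
```

In the papers surveyed below, such LLM-proposed edges are typically combined with data-driven discovery algorithms or used as priors and orientation constraints, since pairwise prompts alone cannot guarantee acyclicity or account for unobserved confounders.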
- “Extracting Self-Consistent Causal Insights from Users Feedback with LLMs and In-context Learning” In arXiv preprint arXiv:2312.06820, 2023
- “GPT-4 Technical Report” In arXiv preprint arXiv:2303.08774, 2023
- “Deep Learning of a Pre-trained Language Model’s Joke Classifier Using GPT-2” In Journal of Hunan University Natural Sciences 48.8, 2021
- Alessandro Antonucci, Gregorio Piqué and Marco Zaffalon “Zero-shot Causal Graph Extrapolation from Text via LLMs”, 2023 URL: https://api.semanticscholar.org/CorpusID:266521610
- “Large Language Models for Biomedical Causal Graph Construction” In arXiv preprint arXiv:2301.12473, 2023
- “Qwen technical report” In arXiv preprint arXiv:2309.16609, 2023
- “Causal Structure Learning Supervised by Large Language Model” In arXiv preprint arXiv:2311.11689, 2023
- “From query tools to causal architects: Harnessing large language models for advanced causal discovery from data” In arXiv preprint arXiv:2306.16902, 2023
- Rongzhou Bao, Jiayi Wang and Hai Zhao “Defending pre-trained language models from adversarial word substitutions without performance sacrifice” In arXiv preprint arXiv:2105.14553, 2021
- “Eliciting latent predictions from transformers with the tuned lens” In arXiv preprint arXiv:2303.08112, 2023
- “Relevance-based Infilling for Natural Language Counterfactuals” In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 88–98
- “LLMs as Counterfactual Explanation Modules: Can ChatGPT Explain Black-box Text Classifiers?” In arXiv preprint arXiv:2309.13340, 2023
- “Language models are few-shot learners” In Advances in neural information processing systems 33, 2020, pp. 1877–1901
- “Sparks of artificial general intelligence: Early experiments with gpt-4” In arXiv preprint arXiv:2303.12712, 2023
- “Can prompt probe pretrained language models? understanding the invisible risks from a causal view” In arXiv preprint arXiv:2203.12258, 2022
- “End-to-end object detection with transformers” In European conference on computer vision, 2020, pp. 213–229 Springer
- “Learning a Structural Causal Model for Intuition Reasoning in Conversation” In arXiv preprint arXiv:2305.17727, 2023
- “Do models explain themselves? counterfactual simulatability of natural language explanations” In arXiv preprint arXiv:2307.08678, 2023
- “DISCO: distilling counterfactuals with large language models” In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 5514–5528
- Robert Dale “GPT-3: What’s it good for?” In Natural Language Engineering 27.1 Cambridge University Press, 2021, pp. 113–118
- “Commonsense reasoning and commonsense knowledge in artificial intelligence” In Communications of the ACM 58.9 ACM New York, NY, USA, 2015, pp. 92–103
- “Word embeddings via causal inference: Gender bias reducing and semantic information preserving” In Proceedings of the AAAI Conference on Artificial Intelligence 36.11, 2022, pp. 11864–11872
- “Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond” In Transactions of the Association for Computational Linguistics 10 Cambridge, MA: MIT Press, 2022, pp. 1138–1158 DOI: 10.1162/tacl_a_00511
- “Causal-structure Driven Augmentations for Text OOD Generalization” In arXiv preprint arXiv:2310.12803, 2023
- “FairPrism: Evaluating fairness-related harms in text generation” In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
- “GPT-3: Its nature, scope, limits, and consequences” In Minds and Machines 30 Springer, 2020, pp. 681–694
- “Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation” In arXiv preprint arXiv:2305.07375, 2023
- “Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals” In arXiv preprint arXiv:2310.00603, 2023
- “Finding alignments between interpretable causal variables and distributed neural representations” In arXiv preprint arXiv:2303.02536, 2023
- “LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation” In arXiv preprint, 2024
- “The false promise of imitating proprietary llms” In arXiv preprint arXiv:2305.15717, 2023
- “A survey of methods for explaining black box models” In ACM computing surveys (CSUR) 51.5 ACM New York, NY, USA, 2018, pp. 1–42
- “Finding Neurons in a Haystack: Case Studies with Sparse Probing” In arXiv preprint arXiv:2305.01610, 2023
- “Training compute-optimal large language models” In arXiv preprint arXiv:2203.15556, 2022
- “Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models” In arXiv preprint arXiv:2310.14491, 2023
- “CLOMO: Counterfactual Logical Modification with Large Language Models” In arXiv preprint arXiv:2311.17438, 2023
- “Navigating the Ocean of Biases: Political Bias Attribution in Language Models via Causal Structures” In arXiv preprint arXiv:2311.08605, 2023
- “Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach” In arXiv preprint arXiv:2310.06680, 2023
- “Certified robustness to adversarial word substitutions” In arXiv preprint arXiv:1909.00986, 2019
- “Can Large Language Models Infer Causation from Correlation?” In arXiv preprint arXiv:2306.05836, 2023
- “Cladder: A benchmark to assess causal reasoning capabilities of language models” In arXiv preprint arXiv:2312.04350, 2023
- “Advancing the state of the art in open domain dialog systems through the alexa prize” In arXiv preprint arXiv:1812.10757, 2018
- “Can ChatGPT Understand Causal Language in Science Claims?” In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, 2023, pp. 379–389
- “Causal reasoning and large language models: Opening a new frontier for causality” In arXiv preprint arXiv:2305.00050, 2023
- “Large Language Models are Temporal and Causal Reasoners for Video Question Answering” In arXiv preprint arXiv:2310.15747, 2023
- “Measurement bias and effect restoration in causal inference” In Biometrika 101.2 Oxford University Press, 2014, pp. 423–437
- “Relation-Oriented: Toward Knowledge-Aligned Causal AI” In arXiv preprint arXiv:2307.16387, 2023
- “Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models” In arXiv preprint arXiv:2301.12597, 2023
- “Image Content Generation with Causal Reasoning” In arXiv preprint arXiv:2312.07132, 2023
- “Large Language Models as Counterfactual Generator: Strengths and Weaknesses” In arXiv preprint arXiv:2305.14791, 2023
- “Monkey: Image resolution and text label are important things for large multi-modal models” In arXiv preprint arXiv:2311.06607, 2023
- “Towards understanding in-context learning with contrastive demonstrations and saliency maps” In arXiv preprint arXiv:2307.05052, 2023
- “Holistic evaluation of language models” In arXiv preprint arXiv:2211.09110, 2022
- Zachary C Lipton “The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery.” In Queue 16.3 ACM New York, NY, USA, 2018, pp. 31–57
- “Aligning Large Multi-Modal Model with Robust Instruction Tuning” In arXiv preprint arXiv:2306.14565, 2023
- “HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models” In arXiv preprint arXiv:2310.14566, 2023
- “MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning” In arXiv preprint arXiv:2311.10774, 2023
- “Visual news: Benchmark and challenges in news image captioning” In arXiv preprint arXiv:2010.03743, 2020
- “Visual instruction tuning” In arXiv preprint arXiv:2304.08485, 2023
- “The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code” In arXiv preprint arXiv:2305.19213, 2023
- “Can large language models build causal graphs?” In arXiv preprint arXiv:2303.05279, 2023
- “Neuro-Symbolic Procedural Planning with Commonsense Prompting” In arXiv preprint arXiv:2206.02928, 2022
- “Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models” In arXiv preprint arXiv:2306.05424, 2023
- Aman Madaan, Katherine Hermann and Amir Yazdanbakhsh “What Makes Chain-of-Thought Prompting Effective? A Counterfactual Study” In Findings of the Association for Computational Linguistics: EMNLP 2023, 2023, pp. 1448–1535
- “CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation” In arXiv preprint arXiv:2306.00374, 2023
- “An overview of bard: an early experiment with generative ai” In AI. Google Static Documents 2, 2023
- “GPTEval: A survey on assessments of ChatGPT and GPT-4” In arXiv preprint arXiv:2308.12488, 2023
- Nicholas Meade, Elinor Poole-Dayan and Siva Reddy “An empirical survey of the effectiveness of debiasing techniques for pre-trained language models” In arXiv preprint arXiv:2110.08527, 2021
- “Self-Supervised Contrastive Learning with Adversarial Perturbations for Defending Word Substitution-based Attacks” In arXiv preprint arXiv:2107.07610, 2021
- Xin Miao, Yongqi Li and Tieyun Qian “Generating Commonsense Counterfactuals for Stable Relation Extraction” In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 5654–5668
- “Applying Large Language Models for Causal Structure Learning in Non Small Cell Lung Cancer” In arXiv preprint arXiv:2311.07191, 2023
- “MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks” In arXiv preprint arXiv:2310.19677, 2023
- “Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning” In arXiv preprint arXiv:2305.12295, 2023
- “Answering Causal Questions with Augmented LLMs”, 2023
- Judea Pearl “Causality” Cambridge university press, 2009
- Judea Pearl “Graphical models for probabilistic and causal reasoning” In Quantified representation of uncertainty and imprecision Springer, 1998, pp. 367–389
- “Instruction tuning with gpt-4” In arXiv preprint arXiv:2304.03277, 2023
- “Improving language understanding by generative pre-training” OpenAI, 2018
- “CRAB: Assessing the Strength of Causal Relationships Between Real-world Events” In arXiv preprint arXiv:2311.04284, 2023
- Donald B Rubin “Estimating causal effects of treatments in randomized and nonrandomized studies.” In Journal of educational Psychology 66.5 American Psychological Association, 1974, pp. 688
- “Exploiting cloze questions for few shot text classification and natural language inference” In arXiv preprint arXiv:2001.07676, 2020
- Gabriel Stanovsky, Noah A. Smith and Luke Zettlemoyer “Evaluating Gender Bias in Machine Translation” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics Florence, Italy: Association for Computational Linguistics, 2019, pp. 1679–1684 DOI: 10.18653/v1/P19-1164
- Alessandro Stolfo, Yonatan Belinkov and Mrinmaya Sachan “A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis”, 2023 arXiv:2305.15054 [cs.CL]
- Shane Storks, Qiaozi Gao and Joyce Y Chai “Commonsense reasoning for natural language understanding: A survey of benchmarks, resources, and approaches” In arXiv preprint arXiv:1904.01172, 2019, pp. 1–60
- “Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 4950–4959
- “Link-context learning for multimodal llms” In arXiv preprint arXiv:2308.07891, 2023
- Juanhe TJ Tan “Causal Abstraction for Chain-of-Thought Reasoning in Arithmetic Word Problems” In Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, 2023, pp. 155–168
- “Towards causalgpt: A multi-agent approach for faithful knowledge reasoning via promoting causal consistency in llms” In arXiv preprint arXiv:2308.11914, 2023
- “Alpaca: A strong, replicable instruction-following model” In Stanford Center for Research on Foundation Models, 2023 URL: https://crfm.stanford.edu/2023/03/13/alpaca.html
- “Llama 2: Open foundation and fine-tuned chat models” In arXiv preprint arXiv:2307.09288, 2023
- Ruibo Tu, Chao Ma and Cheng Zhang “Causal-discovery performance of chatgpt in the context of neuropathic pain diagnosis” In arXiv preprint arXiv:2301.13819, 2023
- “Causal Inference Using LLM-Guided Discovery” In arXiv preprint arXiv:2310.15117, 2023
- “Attention is all you need” In Advances in neural information processing systems 30, 2017
- “Biasasker: Measuring the bias in conversational ai system” In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023, pp. 515–527
- “A Causal View of Entity Bias in (Large) Language Models” In arXiv preprint arXiv:2305.14695, 2023
- “Mementos: A comprehensive benchmark for multimodal large language model reasoning over image sequences” In arXiv preprint arXiv:2401.10529, 2024
- “Chain-of-thought prompting elicits reasoning in large language models” In Advances in Neural Information Processing Systems 35, 2022, pp. 24824–24837
- “Probing for correlations of causal facts: Large language models and causality”, 2022
- Sewall Wright “The method of path coefficients” In The annals of mathematical statistics 5.3 JSTOR, 1934, pp. 161–215
- “On the Safety Concerns of Deploying LLMs/VLMs in Robotics: Highlighting the Risks and Vulnerabilities” In arXiv preprint arXiv:2402.10340, 2024
- “Interpretability at scale: Identifying causal mechanisms in alpaca” In arXiv preprint arXiv:2305.08809, 2023
- “Large language models can be good privacy protection learners” In arXiv preprint arXiv:2310.02469, 2023
- Yuxi Xie, Guanzhen Li and Min-Yen Kan “ECHo: Event Causality Inference via Human-centric Reasoning” In arXiv preprint arXiv:2305.14740, 2023
- “A review of dataset and labeling methods for causality extraction” In Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 1519–1531
- Jie Yang, Soyeon Caren Han and Josiah Poon “A survey on extraction of causal relations from natural language text” In Knowledge and Information Systems 64.5 Springer, 2022, pp. 1161–1186
- “mplug-owl2: Revolutionizing multi-modal large language model with modality collaboration” In arXiv preprint arXiv:2311.04257, 2023
- “IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions” In arXiv preprint arXiv:2305.14010, 2023
- Alessio Zanga, Elif Ozkirimli and Fabio Stella “A survey on causal discovery: theory and practice” In International Journal of Approximate Reasoning 151 Elsevier, 2022, pp. 101–129
- “Causal parrots: Large language models may talk causality but are not causal” In arXiv preprint arXiv:2308.13067, 2023
- “A survey of causal inference frameworks” In arXiv preprint arXiv:2209.00869, 2022
- “PanGu-α: Large-scale autoregressive pretrained Chinese language models with auto-parallel computation” In arXiv preprint arXiv:2104.12369, 2021
- “Macer: Attack-free and scalable robust training via maximizing certified radius” In arXiv preprint arXiv:2001.02378, 2020
- “Towards Causal Foundation Model: on Duality between Causal Inference and Attention” In arXiv preprint arXiv:2310.00809, 2023
- “Rock: Causal inference principles for reasoning about commonsense causality” In International Conference on Machine Learning, 2022, pp. 26750–26771 PMLR
- “Causal Reasoning of Entities and Events in Procedural Texts” In arXiv preprint arXiv:2301.10896, 2023
- “Mitigating Language Model Hallucination with Interactive Question-Knowledge Alignment” In arXiv preprint arXiv:2305.13669, 2023
- “Llavar: Enhanced visual instruction tuning for text-rich image understanding” In arXiv preprint arXiv:2306.17107, 2023
- “Certified robustness against natural language attacks by causal intervention” In International Conference on Machine Learning, 2022, pp. 26958–26970 PMLR
- “Explainability for large language models: A survey” In ACM Transactions on Intelligent Systems and Technology ACM New York, NY, 2023
- “Competeai: Understanding the competition behaviors in large language model-based agents” In arXiv preprint arXiv:2310.17512, 2023
- “Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models” In arXiv preprint arXiv:2312.06685, 2023
- Wei Zhao, Zhe Li and Jun Sun “Causality Analysis for Evaluating the Security of Large Language Models” In arXiv preprint arXiv:2312.07876, 2023
- “Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference” In arXiv preprint arXiv:2306.10790, 2023
- “Causal-debias: Unifying debiasing in pretrained language models and fine-tuning via causal invariant learning” In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 4227–4241
- Yuhang Zhou, Suraj Maharjan and Beiye Liu “Scalable prompt generation for semi-supervised learning with language models” In arXiv preprint arXiv:2302.09236, 2023
- “Explore Spurious Correlations at the Concept Level in Language Models for Text Classification” In arXiv preprint arXiv:2311.08648, 2023
- “Minigpt-4: Enhancing vision-language understanding with advanced large language models” In arXiv preprint arXiv:2304.10592, 2023
Authors: Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu, Julian McAuley, Wei Ai, Furong Huang