
LLM4Causal: Democratized Causal Tools for Everyone via Large Language Model (2312.17122v4)

Published 28 Dec 2023 in cs.CL, cs.AI, and stat.ML

Abstract: LLMs have shown success in language understanding and reasoning on general topics. However, their ability to perform inference over user-specified structured data and over corpus-rare concepts, such as causal decision-making, remains limited. In this work, we explore fine-tuning an open-source LLM into LLM4Causal, which can identify the causal task, execute a corresponding function, and interpret its numerical results based on the user's query and the provided dataset. We also propose a data generation process for more controllable GPT prompting and present two instruction-tuning datasets: (1) Causal-Retrieval-Bench, for causal problem identification and extraction of the input parameters needed for causal function calling, and (2) Causal-Interpret-Bench, for in-context causal interpretation. Through end-to-end evaluations and two ablation studies, we show that LLM4Causal delivers end-to-end solutions to causal problems and provides easy-to-understand answers, significantly outperforming the baselines.
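The abstract describes a three-stage pipeline: identify the causal task from the user's query, call a corresponding causal function on the provided dataset, and interpret the numerical output in plain language. Below is a minimal Python sketch of that flow under stated assumptions: the task-identification and interpretation stages are stubbed out (in the paper both are handled by the fine-tuned LLM), the function names and example data are hypothetical, and the toy difference-in-means estimator stands in for whatever causal library (e.g. econml or causal-learn) a real deployment would call.

```python
# Hypothetical sketch of an LLM4Causal-style pipeline:
#   (1) map a natural-language query to a structured causal task,
#   (2) dispatch to a numerical causal routine,
#   (3) turn the number back into a plain-language answer.
# Stages (1) and (3) are stubs here; the paper uses a fine-tuned open-source LLM.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CausalTask:
    task_type: str      # e.g. "treatment_effect", "causal_graph", "mediation"
    treatment: str      # treatment column in the user-provided table
    outcome: str        # outcome column in the user-provided table


def identify_task(query: str) -> CausalTask:
    """Stub for stage 1: extract the task type and input parameters
    from the free-text query (hard-coded for this illustration)."""
    return CausalTask(task_type="treatment_effect",
                      treatment="discount", outcome="sales")


def estimate_ate(rows: List[dict], treatment: str, outcome: str) -> float:
    """Toy difference-in-means estimator standing in for a real causal
    function wrapping a package such as econml or causal-learn."""
    treated = [r[outcome] for r in rows if r[treatment] == 1]
    control = [r[outcome] for r in rows if r[treatment] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Stage 2: registry mapping identified task types to callable causal functions.
CAUSAL_FUNCTIONS: Dict[str, Callable] = {"treatment_effect": estimate_ate}


def interpret(task: CausalTask, value: float) -> str:
    """Stub for stage 3: template-based natural-language interpretation
    of the numerical result."""
    return (f"Applying '{task.treatment}' changes '{task.outcome}' by "
            f"{value:+.2f} on average, compared with not applying it.")


if __name__ == "__main__":
    data = [
        {"discount": 1, "sales": 12.0}, {"discount": 1, "sales": 11.0},
        {"discount": 0, "sales": 9.0},  {"discount": 0, "sales": 10.0},
    ]
    task = identify_task("Does offering a discount increase sales?")
    effect = CAUSAL_FUNCTIONS[task.task_type](data, task.treatment, task.outcome)
    print(interpret(task, effect))
```

The registry-style dispatch mirrors the function-calling framing in the abstract: the LLM only has to emit a structured task description, while the numerical work is delegated to dedicated causal routines.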

Authors (5)
  1. Haitao Jiang (26 papers)
  2. Lin Ge (19 papers)
  3. Yuhe Gao (8 papers)
  4. Jianian Wang (3 papers)
  5. Rui Song (130 papers)
Citations (7)

