
Is ChatGPT the Future of Causal Text Mining? A Comprehensive Evaluation and Analysis (2402.14484v2)

Published 22 Feb 2024 in cs.CL

Abstract: Causality is fundamental in human cognition and has drawn attention in diverse research fields. With growing volumes of textual data, discerning causalities within text data is crucial, and causal text mining plays a pivotal role in extracting meaningful patterns. This study conducts comprehensive evaluations of ChatGPT's causal text mining capabilities. First, we introduce a benchmark that extends beyond general English datasets to include domain-specific and non-English datasets. Second, we provide an evaluation framework to ensure fair comparisons between ChatGPT and previous approaches. Finally, our analysis outlines the limitations and future challenges in employing ChatGPT for causal text mining. Specifically, our analysis reveals that ChatGPT serves as a good starting point for various datasets. However, when equipped with a sufficient amount of training data, previous models still surpass ChatGPT's performance. Additionally, ChatGPT tends to falsely recognize non-causal sequences as causal sequences. These issues become even more pronounced with advanced versions of the model, such as GPT-4. In addition, we highlight the constraints of ChatGPT in handling complex causality types, including both intra/inter-sentential and implicit causality. The model also has difficulty leveraging in-context learning and domain adaptation effectively. We release our code to support further research and development in this field.

