Getting Sick After Seeing a Doctor? Diagnosing and Mitigating Knowledge Conflicts in Event Temporal Reasoning
Abstract: Event temporal reasoning aims to identify the temporal relations between two or more events in narratives. However, knowledge conflicts arise when the actual temporal relations of events in the context do not match the prior knowledge or biases learned by the model. In this paper, we propose to detect knowledge-conflict examples in event temporal reasoning using bias indicators, which include event relation prior bias, tense bias, narrative bias, and dependency bias. We define conflict examples as those where event relations are opposite to biased or prior relations. To mitigate event-related knowledge conflicts, we introduce a Counterfactual Data Augmentation (CDA) based method that can be applied to both Pre-trained Language Models (PLMs) and Large Language Models (LLMs), either as additional training data or as demonstrations for In-Context Learning. Experiments suggest that both PLMs and LLMs suffer from knowledge conflicts in event temporal reasoning, and that CDA has the potential to reduce hallucination and improve model performance.
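As a rough illustration of the conflict definition above, the sketch below flags an example as a conflict when its gold relation is the opposite of the relation most often seen for the same event pair. It covers only an event relation prior, and everything in it is an assumption made for illustration: the toy data, the majority-vote prior, the `OPPOSITE` mapping, and the function names `relation_prior` and `is_conflict`. The abstract does not specify how the bias indicators are actually computed.

```python
from collections import Counter, defaultdict

# Assumed mapping from a temporal relation to its opposite (illustrative only).
OPPOSITE = {"before": "after", "after": "before"}


def relation_prior(train_examples):
    """Return the most frequent relation observed for each (event1, event2) pair."""
    counts = defaultdict(Counter)
    for event1, event2, relation in train_examples:
        counts[(event1, event2)][relation] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def is_conflict(example, prior):
    """True if the example's relation is the opposite of the prior relation."""
    event1, event2, relation = example
    biased = prior.get((event1, event2))
    # Pairs never seen in training have no prior, so they cannot conflict.
    return biased is not None and OPPOSITE.get(biased) == relation


# Hypothetical toy data: "get sick" usually precedes "see a doctor".
train = [("get sick", "see a doctor", "before")] * 3
train.append(("get sick", "see a doctor", "after"))
prior = relation_prior(train)

conflict = is_conflict(("get sick", "see a doctor", "after"), prior)
typical = is_conflict(("get sick", "see a doctor", "before"), prior)
unseen = is_conflict(("eat", "sleep", "after"), prior)
```

Here the example "getting sick after seeing a doctor" is flagged because it runs against the majority relation in the toy training set, while the typical ordering and the unseen pair are not.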