Polish Natural Language Inference and Factivity -- an Expert-based Dataset and Benchmarks (2201.03521v1)
Abstract: Despite recent breakthroughs in Machine Learning for Natural Language Processing, Natural Language Inference (NLI) problems remain a challenge. To this end, we contribute a new dataset that focuses exclusively on the factivity phenomenon; our task nevertheless remains the same as in other NLI tasks, i.e. prediction of entailment, contradiction, or neutral (ECN). The dataset consists entirely of natural language utterances in Polish and comprises 2,432 verb-complement pairs and 309 unique verbs. It is based on the National Corpus of Polish (NKJP) and is a representative sample with regard to the frequency of main verbs and other linguistic features (e.g. occurrence of internal negation). We found that transformer BERT-based models working on sentences obtained relatively good results ($\approx89\%$ F1 score). Even though better results were achieved using linguistic features ($\approx91\%$ F1 score), that model requires more human labour (humans in the loop) because the features were prepared manually by expert linguists. BERT-based models consuming only the input sentences capture most of the complexity of NLI/factivity. Complex cases of the phenomenon, e.g. cases with entailment (E) and non-factive verbs, remain an open issue for further research.