To Drop or Not to Drop? Predicting Argument Ellipsis Judgments: A Case Study in Japanese
Abstract: Speakers sometimes omit certain arguments of a predicate in a sentence; such omission is especially frequent in pro-drop languages. This study addresses a question about ellipsis -- what can explain native speakers' ellipsis decisions? -- motivated both by interest in human discourse processing and by the prospect of writing assistance for this choice. To this end, we first collect large-scale human annotations of whether and why a particular argument should be omitted, covering over 2,000 data points in a balanced corpus of Japanese, a prototypical pro-drop language. The data indicate that native speakers largely share common criteria for such judgments, and they further clarify the judgments' quantitative characteristics, e.g., the distribution of related linguistic factors in the balanced corpus. Furthermore, we examine the performance of an LLM-based argument ellipsis judgment model and reveal gaps between the system's predictions and human judgments in specific linguistic aspects. We hope this fundamental resource encourages further studies on natural human ellipsis judgment.
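The abstract does not spell out how the LLM-based judgment model is framed. One plausible zero-shot formulation, sketched below, is to compare language-model probabilities of a sentence with the argument overt versus omitted, given the preceding discourse. This is a minimal sketch under stated assumptions: the model name (`rinna/japanese-gpt2-medium`), the scoring scheme, and the function names are illustrative choices, not the paper's actual method.

```python
# Hedged sketch: frame argument ellipsis judgment as a binary decision
# by comparing LM scores of the overt vs. elided variant in context.
# The model and scoring scheme are assumptions, not the paper's method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "rinna/japanese-gpt2-medium"  # illustrative Japanese causal LM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def total_log_prob(text: str) -> float:
    """Sum of token log-probabilities of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood over the
    # ids.size(1) - 1 predicted positions, so undo the averaging.
    return -out.loss.item() * (ids.size(1) - 1)

def predict_drop(context: str, overt: str, elided: str) -> bool:
    """Predict ellipsis when the argument-omitted variant scores higher."""
    return total_log_prob(context + elided) > total_log_prob(context + overt)

# Should the subject 私は ("I-TOP") be omitted after this context?
context = "昨日、友達に会った。"  # "Yesterday, (I) met a friend."
print(predict_drop(context, "私は映画を見た。", "映画を見た。"))
```

Note one design caveat: comparing raw sequence probabilities favors the shorter, elided variant, so length normalization, or a classifier fine-tuned on the paper's human annotations, would be a natural refinement.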