Hybrid Human-LLM Corpus Construction and LLM Evaluation for Rare Linguistic Phenomena (2403.06965v1)

Published 11 Mar 2024 in cs.CL

Abstract: Argument Structure Constructions (ASCs) are one of the most well-studied construction groups, providing a unique opportunity to demonstrate the usefulness of Construction Grammar (CxG). For example, the caused-motion construction (CMC, "She sneezed the foam off her cappuccino") demonstrates that constructions must carry meaning, otherwise the fact that "sneeze" in this context causes movement cannot be explained. We form the hypothesis that this remains challenging even for state-of-the-art LLMs, for which we devise a test based on substituting the verb with a prototypical motion verb. To be able to perform this test at statistically significant scale, in the absence of adequate CxG corpora, we develop a novel pipeline of NLP-assisted collection of linguistically annotated text. We show how dependency parsing and GPT-3.5 can be used to significantly reduce annotation cost and thus enable the annotation of rare phenomena at scale. We then evaluate GPT, Gemini, Llama2 and Mistral models for their understanding of the CMC using the newly collected corpus. We find that all models struggle with understanding the motion component that the CMC adds to a sentence.


Summary

  • The paper presents a novel hybrid human-LLM pipeline that combines dependency parsing, GPT-3.5 classification, and manual verification to efficiently annotate rare linguistic phenomena.
  • It evaluates multiple LLMs, including GPT, Gemini, Llama2, and Mistral, on their ability to interpret the Caused Motion Construction, revealing significant performance gaps.
  • The findings underscore an efficient corpus construction method and highlight the need for improved semantic understanding in LLMs to process complex linguistic structures.

Hybrid Human-LLM Corpus Construction and Evaluation for Understanding Rare Linguistic Phenomena

Introduction

Rare linguistic phenomena often elude the grasp of LLMs, posing a challenge both to computational linguistics and to the development of AI systems with deeper semantic understanding. This paper presents a methodological innovation for constructing a corpus centered on a rare linguistic structure, the Caused Motion Construction (CMC), and for evaluating various LLMs' capability to comprehend it. Through a hybrid pipeline combining human linguistic expertise, NLP tools, and GPT-3.5, the paper proposes a cost-efficient approach to annotating rare linguistic phenomena at scale and critically assesses current state-of-the-art LLMs on grammatical constructions that require deeper semantic understanding.

Data Collection Methodology

The paper introduces a novel pipeline for data collection that significantly reduces the annotation burden typically associated with rare linguistic phenomena. This is particularly relevant for the CMC, in which verbs that are conventionally intransitive take on a transitive role and imply motion or displacement of the object as a result of the action.

Key to this methodology is an initial filtering phase that uses dependency parsing to identify potential CMC instances, followed by a refinement phase in which GPT-3.5 classifies these candidates with an instructional prompt tailored to the CMC. The two-stage filtering substantially raises the density of CMC instances, and the final dataset comprises both manually verified instances and a larger, semi-automatically annotated corpus. The process relies on careful prompt design and engineering to balance the accuracy and the cost-efficiency of the LLM-assisted classification.
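
The following is a minimal sketch of such a two-stage filter, assuming spaCy for the dependency-parse stage and the OpenAI chat API with gpt-3.5-turbo for the classification stage. The dependency pattern, prompt wording, and model name are illustrative assumptions, not the authors' released pipeline.

```python
# Sketch of a two-stage CMC candidate filter (illustrative, not the paper's code).
import spacy
from openai import OpenAI

nlp = spacy.load("en_core_web_sm")
client = OpenAI()

def is_cmc_candidate(sentence: str) -> bool:
    """Stage 1: keep sentences where a verb has both a direct object and a directional modifier."""
    doc = nlp(sentence)
    for token in doc:
        if token.pos_ == "VERB":
            deps = {child.dep_ for child in token.children}
            if "dobj" in deps and ({"prep", "prt"} & deps):
                return True
    return False

def looks_like_cmc(sentence: str) -> bool:
    """Stage 2: ask an LLM whether a surviving candidate is a genuine CMC (assumed prompt)."""
    prompt = (
        "A caused-motion construction combines a verb, a direct object, and a "
        "directional phrase so that the object is understood to move, e.g. "
        '"She sneezed the foam off her cappuccino".\n'
        f"Sentence: {sentence}\n"
        "Is this sentence a caused-motion construction? Answer yes or no."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")

sentences = [
    "She sneezed the foam off her cappuccino.",
    "He slept through the storm.",
]
candidates = [s for s in sentences if is_cmc_candidate(s)]    # cheap parse-based filter
cmc_instances = [s for s in candidates if looks_like_cmc(s)]  # LLM refinement, before manual checks
```

The cheap parser-based stage discards most non-candidates before any LLM call is made, which is what keeps the overall annotation cost low.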

Evaluation of LLMs

The primary focus of the evaluation is to ascertain whether various LLMs, including GPT, Gemini, Llama2, and Mistral models, can accurately interpret the CMC, which lies at the intersection of syntax and semantics. A specialized evaluation setup presents CMC sentences to these models and asks whether the direct object in each sentence is actually moving, an inference required to interpret CMC instances correctly. The reported accuracy rates reveal substantial gaps in the models' understanding, highlighting a critical area for future work on model training and development.
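
A sketch of such a probe is shown below, assuming a yes/no question about the direct object and the OpenAI chat API. The exact prompt template, the evaluated model names, and the scoring are assumptions for illustration, not the paper's evaluation code.

```python
# Illustrative yes/no probe for motion understanding on CMC sentences.
from openai import OpenAI

client = OpenAI()

def object_moves(sentence: str, direct_object: str, model: str) -> bool:
    """Ask a model whether the direct object of a CMC sentence is set in motion."""
    prompt = (
        f"Sentence: {sentence}\n"
        f'In this sentence, does "{direct_object}" move as a result of the action? '
        "Answer yes or no."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")

# Toy evaluation: on gold CMC instances the expected answer is always "yes",
# so accuracy is simply the fraction of affirmative answers.
gold_cmc = [
    ("She sneezed the foam off her cappuccino.", "the foam"),
    ("They laughed him out of the room.", "him"),
]
hits = sum(object_moves(s, obj, model="gpt-4o") for s, obj in gold_cmc)
print(f"accuracy: {hits / len(gold_cmc):.2f}")
```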

Contributions and Future Work

This paper makes several significant contributions:

  • A hybrid human-LLM pipeline for the cost-efficient collection of rare linguistic phenomena.
  • The release of a uniquely compiled corpus, both manually and semi-automatically annotated, centered on the CMC.
  • An insightful evaluation of several state-of-the-art LLMs on their understanding of the CMC, providing a clear indicator of where these models stand in terms of interpreting complex linguistic constructions.

The methodology and findings underscore the inherent challenges and potential pathways for advancing the understanding capabilities of LLMs. Future research directions include extending this hybrid annotation and evaluation framework to other rare linguistic phenomena and exploring advancements in LLM architectures and training paradigms to enhance their grasp of complex linguistic constructions.

Concluding Remarks

This paper sheds light on the intricacies involved in handling and understanding rare linguistic phenomena by contemporary LLMs. Its methodology and findings contribute valuable insights to the computational linguistics community, providing a clear direction for future research aimed at enhancing the semantic comprehension of LLMs.
