Sign of the Times: Evaluating the use of Large Language Models for Idiomaticity Detection (2405.09279v1)

Published 15 May 2024 in cs.CL and cs.AI

Abstract: Despite the recent ubiquity of LLMs and their high zero-shot prompted performance across a wide range of tasks, it is still not known how well they perform on tasks which require processing of potentially idiomatic language. In particular, how well do such models perform in comparison to encoder-only models fine-tuned specifically for idiomaticity tasks? In this work, we attempt to answer this question by looking at the performance of a range of LLMs (both local and software-as-a-service models) on three idiomaticity datasets: SemEval 2022 Task 2a, FLUTE, and MAGPIE. Overall, we find that whilst these models do give competitive performance, they do not match the results of fine-tuned task-specific models, even at the largest scales (e.g. for GPT-4). Nevertheless, we do see consistent performance improvements across model scale. Additionally, we investigate prompting approaches to improve performance, and discuss the practicalities of using LLMs for these tasks.
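
The evaluation described in the abstract amounts to posing each example as a zero-shot classification prompt and parsing the model's short textual answer. Below is a minimal sketch of that setup using the HuggingFace Transformers pipeline API with a local instruction-tuned model; the model choice (Mistral-7B-Instruct), the prompt wording, the classify_idiomaticity helper, and the label-parsing rule are illustrative assumptions, not the exact prompts or models used in the paper.

```python
# Hedged sketch: zero-shot prompting a local instruction-tuned model for binary
# idiomaticity detection, loosely following the SemEval 2022 Task 2a framing
# (idiomatic vs. literal use of a multiword expression in context).
# Model name, prompt wording, and label parsing are assumptions for illustration.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # any local chat/instruct model could stand in here
    device_map="auto",
)

def classify_idiomaticity(sentence: str, expression: str) -> str:
    """Return 'idiomatic' or 'literal' for the expression as used in the sentence."""
    prompt = (
        f"Sentence: {sentence}\n"
        f"Expression: {expression}\n"
        "Is the expression used idiomatically or literally in this sentence? "
        "Answer with a single word: idiomatic or literal.\nAnswer:"
    )
    # Greedy decoding; only a few new tokens are needed for a one-word label.
    out = generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
    answer = out[len(prompt):].strip().lower()
    return "idiomatic" if "idiomatic" in answer else "literal"

print(classify_idiomaticity(
    "After the merger fell through, he decided to throw in the towel.",
    "throw in the towel",
))
```

The same prompt could equally be sent to a software-as-a-service model through its API; only the generation call changes, which matches the paper's comparison of local and hosted models.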

References (37)
  1. Phi-2: The surprising power of small language models.
  2. Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5. arXiv:2210.17301 [cs].
  3. Construction Artifacts in Metaphor Identification Datasets. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6581–6590, Singapore. Association for Computational Linguistics.
  4. HIT at SemEval-2022 Task 2: Pre-trained Language Model for Idioms Detection. arXiv:2204.06145 [cs].
  5. Scaling instruction-finetuned language models.
  6. A Study on the Effectiveness of Large Language Models for Translation with Markup. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, pages 148–159, Macau SAR, China. Asia-Pacific Association for Machine Translation.
  7. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale.
  8. GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.
  9. Probing for idiomaticity in vector space models. In Proceedings of the 16th conference of the European Chapter of the Association for Computational Linguistics, pages 3551–3564. Association for Computational Linguistics (ACL).
  10. Google Gemini Team. 2023. Gemini: A family of highly capable multimodal models.
  11. Georgi Gerganov. 2024. llama.cpp: Inference of Meta's LLaMA model (and others) in pure C/C++.
  12. Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE. arXiv:2210.16407 [cs].
  13. Mistral 7B.
  14. TransLLaMa: LLM-based Simultaneous Translation System.
  15. Textbooks Are All You Need II: phi-1.5 technical report.
  16. AStitchInLanguageModels: Dataset and methods for the exploration of idiomaticity in pre-trained language models. arXiv preprint arXiv:2109.04413.
  17. How well do embedding models capture non-compositionality? a view from multiword expressions. In Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP, pages 27–34.
  18. OpenAI. 2023. GPT-4 Technical Report.
  19. Training language models to follow instructions with human feedback.
  20. Direct preference optimization: Your language model is secretly a reward model.
  21. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551.
  22. A Report on the FigLang 2022 Shared Task on Understanding Figurative Language. In Proceedings of the 3rd Workshop on Figurative Language Processing (FLP), pages 178–183, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
  23. In-context impersonation reveals large language models’ strengths and biases. Advances in Neural Information Processing Systems, 36.
  24. Language models are multilingual chain-of-thought reasoners. arXiv preprint arXiv:2210.03057.
  25. SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 107–121, Seattle, United States. Association for Computational Linguistics.
  26. Llama 2: Open Foundation and Fine-Tuned Chat Models.
  27. HuggingFace's Transformers: State-of-the-art Natural Language Processing.
  28. A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models.
  29. Ziheng Zeng and Suma Bhat. 2021. Idiomatic Expression Identification using Semantic Compatibility. Transactions of the Association for Computational Linguistics, 9:1546–1562.
  30. Instruction tuning for large language models: A survey.
  31. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, Lisbon, Portugal. Association for Computational Linguistics.
  32. e-SNLI: Natural Language Inference with Natural Language Explanations. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc.
  33. FLUTE: Figurative Language Understanding through Textual Explanations. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7139–7159.
  34. The CommitmentBank: Investigating projection in naturally occurring discourse.
  35. MAGPIE: A large corpus of potentially idiomatic expressions. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 279–287. European Language Resources Association.
  36. IMPLI: Investigating NLI Models’ Performance on Figurative Language. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5375–5388, Dublin, Ireland. Association for Computational Linguistics.
  37. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122, New Orleans, Louisiana. Association for Computational Linguistics.