Linguistic Knowledge Can Enhance Encoder-Decoder Models (If You Let It) (2402.17608v1)

Published 27 Feb 2024 in cs.CL

Abstract: In this paper, we explore the impact of augmenting pre-trained Encoder-Decoder models, specifically T5, with linguistic knowledge for the prediction of a target task. In particular, we investigate whether fine-tuning a T5 model on an intermediate task that predicts structural linguistic properties of sentences modifies its performance in the target task of predicting sentence-level complexity. Our study encompasses diverse experiments conducted on Italian and English datasets, employing both monolingual and multilingual T5 models at various sizes. Results obtained for both languages and in cross-lingual configurations show that linguistically motivated intermediate fine-tuning has generally a positive impact on target task performance, especially when applied to smaller models and in scenarios with limited data availability.

Enhancing Encoder-Decoder Models with Linguistic Knowledge for Sentence Complexity Prediction

Introduction

Recent advances in Natural Language Processing (NLP) have leveraged the capabilities of pre-trained Neural Language Models (NLMs) across a wide range of tasks. An intriguing line of research concerns the interplay between these models and the incorporation of explicit linguistic knowledge to bolster their performance on specific tasks. This paper explores that question by fine-tuning T5, a widely used Encoder-Decoder model, on an intermediate task that predicts structural linguistic properties of sentences. The goal is to assess the impact of this linguistically enriched fine-tuning on the model's ability to predict sentence-level complexity. The experiments span Italian and English datasets and use monolingual and multilingual T5 models of varying sizes, offering insights into the scalability and adaptability of the approach across languages and data scenarios.

Experimental Framework

The experimental design hinges on a two-phase STILTs (Supplementary Training on Intermediate Labeled-data Tasks) approach. First, T5 models are fine-tuned on a set of intermediate support tasks that capture linguistic phenomena identified as potential correlates of sentence complexity. This procedure, grounded in multi-task learning, yields linguistically informed T5 models, denoted LiT5, with each snapshot representing a distinct stage of linguistic knowledge acquisition. The LiT5 models are then further fine-tuned on the target task of predicting sentence complexity, to assess the effectiveness of the linguistic knowledge acquired in the first phase.
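As a rough illustration of this two-phase recipe, the sketch below fine-tunes a T5 checkpoint first on linguistic-property prediction cast as text-to-text, then continues training the same weights on complexity prediction. This is a minimal sketch, not the authors' code: the task prefixes, toy examples, and hyperparameters are assumptions.

```python
# Minimal sketch of two-phase STILTs fine-tuning with Hugging Face
# transformers. Task prefixes, toy data, and hyperparameters are
# illustrative assumptions, not the paper's actual configuration.
from torch.optim import AdamW
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = AdamW(model.parameters(), lr=5e-5)

def fine_tune(pairs, epochs=1):
    """One fine-tuning phase over (input_text, target_text) pairs."""
    model.train()
    for _ in range(epochs):
        for src, tgt in pairs:
            batch = tokenizer(src, return_tensors="pt")
            labels = tokenizer(tgt, return_tensors="pt").input_ids
            loss = model(**batch, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

# Phase 1: intermediate support tasks, i.e. linguistic properties cast
# as text-to-text targets; mixing several properties in one phase gives
# the multi-task flavour described above.
intermediate_tasks = [
    ("parse depth: The cat sat on the mat .", "3"),
    ("sentence length: The cat sat on the mat .", "7"),
]
fine_tune(intermediate_tasks)

# Phase 2: the resulting linguistically informed ("LiT5") weights are
# fine-tuned on the target task, emitting a complexity score as text.
target_task = [
    ("complexity: The cat sat on the mat .", "1.2"),
]
fine_tune(target_task)
```

In the paper's setup, snapshots from the first phase are carried into the second, so the effect of progressively acquired linguistic knowledge on the target task can be measured.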

Data for the target and intermediate tasks were curated from Italian and English Universal Dependencies treebanks, and models were evaluated in both languages using monolingual and multilingual T5 configurations. The intermediate tasks focus on predicting the subset of linguistic features most strongly correlated with human complexity judgments, identified through an explicit feature-selection step.
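For concreteness, the sketch below derives a couple of such sentence-level features from CoNLL-U annotations using the `conllu` package. The two features shown (parse-tree depth and average dependency-link length) are a small illustrative subset of the paper's feature inventory, and the tiny treebank and judgment scores are invented.

```python
# Hypothetical sketch: computing sentence-level linguistic features
# from Universal Dependencies annotations with the `conllu` package.
# The tiny treebank and complexity judgments are invented examples.
from conllu import parse
from scipy.stats import spearmanr

conllu_text = (
    "# text = The cat sat .\n"
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tsat\tsit\tVERB\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n\n"
)

def tree_depth(node):
    """Depth of a dependency tree (the root counts as 1)."""
    return 1 + max((tree_depth(c) for c in node.children), default=0)

def features(sentence):
    """Two example features; the paper uses a much richer set."""
    words = [t for t in sentence if isinstance(t["id"], int)]
    links = [abs(t["id"] - t["head"]) for t in words if t["head"]]
    return {
        "n_tokens": len(words),
        "avg_link_len": sum(links) / max(len(links), 1),
        "parse_depth": tree_depth(sentence.to_tree()),
    }

for sent in parse(conllu_text):
    print(features(sent))  # {'n_tokens': 4, 'avg_link_len': 1.0, 'parse_depth': 3}

# Feature selection as described above: keep features whose values
# correlate with human complexity judgments across many sentences
# (toy numbers here; Spearman's rho is one reasonable choice).
rho, _ = spearmanr([2, 3, 5, 7], [1.1, 1.8, 3.2, 4.0])
print(rho)
```

Only features passing such a correlation filter would then serve as intermediate-task targets in the two-phase procedure sketched earlier.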

Findings

The analyses reveal several notable insights:

  • Linguistically informed fine-tuning generally enhances model performance on the target task of predicting sentence complexity, particularly in models of smaller sizes and scenarios characterized by limited data availability.
  • Model size strongly affects how well linguistic features are learned: larger models reach higher absolute performance, but smaller models show the largest relative improvements, indicating that they benefit most from the injection of explicit linguistic knowledge.
  • The benefits of linguistically informed models carry over across languages, as evidenced by performance gains in cross-lingual evaluation settings, suggesting the methodology is robust and adaptable across languages.
  • Preliminary investigations into the influence of individual linguistic features highlight the multifaceted nature of linguistic complexity and underscore the potential of a multi-task learning framework to exploit the interdependencies among linguistic phenomena more effectively than focusing on single features.

Implications and Future Directions

This paper underscores the value of integrating linguistic knowledge into the fine-tuning process of pre-trained Encoder-Decoder models, such as T5, to enhance their performance on tasks reliant on understanding complex linguistic properties. The findings advocate for a nuanced approach to model training, emphasizing the utility of leveraging linguistic insights to inform model adjustments, particularly in data-constrained scenarios and for languages with varying degrees of representation in pre-training corpora.

Looking forward, the methodology presents a promising avenue for further exploration, particularly its applicability to large generative language models in zero- and few-shot settings. Whether an instruction fine-tuning phase can impart linguistic knowledge that measurably improves performance on linguistically demanding tasks warrants in-depth investigation. Future work could also broaden the range of languages and tasks, explore other model architectures and sizes, and refine the selection and use of linguistic features to fully capture their benefits.

The integration of linguistic knowledge into NLMs, as demonstrated in this paper, not only enhances model performance on specific tasks but also contributes to the broader understanding of how these models capture and utilize linguistic phenomena, offering valuable insights for the development of more efficient, adaptable, and linguistically competent NLMs.

Authors

  1. Alessio Miaschi
  2. Felice Dell'Orletta
  3. Giulia Venturi