Linguistic Knowledge Can Enhance Encoder-Decoder Models (If You Let It) (2402.17608v1)
Abstract: In this paper, we explore the impact of augmenting pre-trained Encoder-Decoder models, specifically T5, with linguistic knowledge for the prediction of a target task. In particular, we investigate whether fine-tuning a T5 model on an intermediate task that predicts structural linguistic properties of sentences modifies its performance on the target task of predicting sentence-level complexity. Our study encompasses diverse experiments conducted on Italian and English datasets, employing both monolingual and multilingual T5 models of various sizes. Results obtained for both languages and in cross-lingual configurations show that linguistically motivated intermediate fine-tuning generally has a positive impact on target task performance, especially when applied to smaller models and in scenarios with limited data availability.
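At its core, the setup described in the abstract is two consecutive fine-tuning stages applied to the same T5 checkpoint, with both the intermediate task (predicting a structural linguistic property of a sentence) and the target task (predicting sentence-level complexity) cast as text-to-text problems. The sketch below illustrates this flow, assuming a Hugging Face Transformers pipeline; the dataset objects, task prefixes, and hyperparameters are illustrative placeholders and not the paper's actual configuration.

```python
# Minimal sketch of linguistically motivated intermediate fine-tuning of T5.
# Assumes Hugging Face `transformers` and `datasets`; dataset names, prefixes,
# and hyperparameters are hypothetical, not those used in the paper.
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5TokenizerFast,
)

model_name = "t5-small"  # the paper also considers larger and multilingual (mT5) variants
tokenizer = T5TokenizerFast.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)


def fine_tune(model, dataset, prefix, output_dir):
    """One fine-tuning stage: the task is framed as text-to-text, i.e. the
    model reads a prefixed sentence and generates the label as a string."""
    def preprocess(example):
        inputs = tokenizer(prefix + example["sentence"],
                           truncation=True, max_length=128)
        labels = tokenizer(text_target=str(example["label"]),
                           truncation=True, max_length=8)
        inputs["labels"] = labels["input_ids"]
        return inputs

    tokenized = dataset.map(preprocess, remove_columns=dataset.column_names)
    args = Seq2SeqTrainingArguments(output_dir=output_dir,
                                    num_train_epochs=3,
                                    per_device_train_batch_size=16)
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
        tokenizer=tokenizer,
    )
    trainer.train()
    return trainer.model


# `linguistic_dataset` and `complexity_dataset` stand in for Dataset objects
# with "sentence" and "label" columns (hypothetical names).
#
# Stage 1 (intermediate): predict a structural linguistic property,
# e.g. parse-tree depth, from the raw sentence.
# model = fine_tune(model, linguistic_dataset, "predict tree depth: ", "out/intermediate")
#
# Stage 2 (target): predict sentence-level complexity, starting from the
# intermediately fine-tuned checkpoint rather than the original pre-trained one.
# model = fine_tune(model, complexity_dataset, "predict complexity: ", "out/target")
```

The key design point is that stage 2 reuses the weights produced by stage 1, so any structural linguistic knowledge acquired on the intermediate task is available when the model is adapted to the complexity-prediction target.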
Authors: Alessio Miaschi, Felice Dell'Orletta, Giulia Venturi