Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: An Examination
The paper "Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?" by Yada Pruksachatkun et al. provides an extensive empirical analysis concerning the efficacy of intermediate-task transfer learning. Specifically, it examines when and why intermediate-task training is beneficial within the context of fine-tuning pretrained models on natural language understanding (NLU) tasks, using RoBERTa as the baseline model. The paper thoroughly investigates 110 combinations of intermediate-target tasks and employs 25 probing tasks to determine the linguistic capabilities cultivated during this process.
Introduction
Pretrained language models such as BERT and RoBERTa have yielded substantial improvements across a range of NLU tasks. A promising way to push their performance further is to insert an additional fine-tuning phase on an intermediate task before fine-tuning on the target task, a procedure known as supplementary training on intermediate labeled-data tasks (STILTs). However, the conditions under which intermediate-task transfer succeeds remain poorly understood. The paper addresses this gap by analyzing an extensive set of task pairings.
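To make the STILTs setup concrete, below is a minimal two-phase sketch assuming the Hugging Face transformers and datasets libraries rather than the paper's own experimental code; the MNLI-to-RTE pairing, the use of roberta-base, and the hyperparameters are illustrative choices, not the authors' exact configuration.

```python
# A minimal sketch of the STILTs recipe with the Hugging Face transformers and
# datasets libraries; this is NOT the paper's original experimental code, and the
# task pairing (MNLI -> RTE), model size, and hyperparameters are illustrative
# assumptions. RTE is loaded from GLUE for convenience; the paper's target tasks
# come mostly from SuperGLUE.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "roberta-base"  # a smaller RoBERTa keeps the sketch lightweight
tokenizer = AutoTokenizer.from_pretrained(MODEL)


def encode(dataset, col_a, col_b):
    # Tokenize a sentence-pair dataset into fixed-length inputs.
    dataset = dataset.map(
        lambda ex: tokenizer(ex[col_a], ex[col_b], truncation=True,
                             padding="max_length", max_length=128),
        batched=True)
    return dataset.rename_column("label", "labels")


def finetune(model, train_data, output_dir):
    # One fine-tuning phase; the same routine serves both stages.
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                             per_device_train_batch_size=16, save_strategy="no")
    Trainer(model=model, args=args, train_dataset=train_data).train()
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)


# Phase 1: intermediate-task training on MNLI (3 labels), starting from RoBERTa.
mnli = encode(load_dataset("glue", "mnli", split="train[:2000]"),
              "premise", "hypothesis")
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)
finetune(model, mnli, "roberta-mnli-intermediate")

# Phase 2: target-task fine-tuning on RTE (2 labels), initialized from the
# intermediate checkpoint; the classification head is re-initialized because
# the label space changes.
rte = encode(load_dataset("glue", "rte", split="train"), "sentence1", "sentence2")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-mnli-intermediate", num_labels=2, ignore_mismatched_sizes=True)
finetune(model, rte, "roberta-mnli-rte")
```

The key design point is that phase 2 starts from the weights produced by phase 1 rather than from the original pretrained checkpoint, which is the entire difference between STILTs and ordinary fine-tuning.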
Methodology
Task Overview
The paper covers 11 intermediate tasks, selected based on prior evidence of successful transfer or substantial annotation effort, spanning domains such as question answering and natural language inference. Target tasks are drawn largely from the SuperGLUE benchmark, chosen because they remain difficult for state-of-the-art models while being straightforward for humans. In addition, 25 probing tasks, each designed to isolate a particular linguistic skill or phenomenon, are used to reveal which linguistic competencies intermediate-task training develops.
Experimental Setup
Following a structured pipeline, the researchers first fine-tune RoBERTa on each intermediate task and then fine-tune the resulting checkpoint separately on every target and probing task. This design permits a systematic evaluation of transfer effects across all task combinations, as schematized below.
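The scale of the design is easiest to see as a loop over intermediate/downstream pairs. The sketch below is purely schematic: the task lists are small illustrative subsets of the paper's 11 intermediate, 10 target, and 25 probing tasks, and run_finetuning is a hypothetical placeholder standing in for a real fine-tune-and-evaluate routine such as the STILTs sketch shown earlier.

```python
# Schematic of the experimental grid: every intermediate checkpoint is reused
# as the starting point for every target and probing task. Task names are
# illustrative subsets; run_finetuning is a hypothetical placeholder.
import itertools

INTERMEDIATE_TASKS = ["mnli", "qqp", "socialiqa", "ccg"]     # paper: 11 tasks
TARGET_TASKS = ["rte", "boolq", "copa"]                      # paper: 10 SuperGLUE-style targets
PROBING_TASKS = ["bigram-shift", "coordination-inversion"]   # paper: 25 probing tasks


def run_finetuning(init_checkpoint, task):
    """Placeholder: fine-tune from init_checkpoint on task, return (ckpt, score)."""
    return f"{init_checkpoint}->{task}", 0.0


results = {}
for intermediate in INTERMEDIATE_TASKS:
    ckpt, _ = run_finetuning("roberta-large", intermediate)      # phase 1
    for downstream in itertools.chain(TARGET_TASKS, PROBING_TASKS):
        _, score = run_finetuning(ckpt, downstream)              # phase 2
        results[(intermediate, downstream)] = score

print(f"{len(results)} intermediate/downstream combinations evaluated")
```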
Results and Analysis
Findings on Target and Probing Tasks
The study finds that intermediate tasks requiring high-level reasoning and inference, such as MNLI and certain commonsense-oriented tasks, generally yield positive transfer. Conversely, intermediate tasks such as SocialIQA, CCG, and QQP lead to negative transfer, underscoring how variable the success of this approach can be.
The probing tasks show that low-level syntactic capabilities vary little across settings, whereas performance on higher-level semantic probing tasks correlates more strongly with target-task performance. Notably, input-noising probing tasks such as Bigram Shift and Coordination Inversion correlate strongly with target-task success, suggesting their relevance as indicators of robust intermediate-task transfer. The analysis also highlights catastrophic forgetting during intermediate training, which can erode knowledge retained from the pretraining phase.
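Concretely, the correlation analysis amounts to asking whether intermediate checkpoints that score well on a probing task also tend to score well on the target tasks. A minimal sketch, assuming a Spearman rank correlation over placeholder numbers (the scores are invented for illustration, not taken from the paper):

```python
# Minimal sketch of the probing-vs-target correlation analysis, assuming a
# Spearman rank correlation; all scores below are invented placeholders.
from scipy.stats import spearmanr

# Accuracy of each intermediate checkpoint on one probing task (e.g. Bigram Shift)...
probing_scores = {"mnli": 0.81, "qqp": 0.74, "socialiqa": 0.79, "ccg": 0.70}
# ...and the same checkpoints' average target-task performance after fine-tuning.
target_scores = {"mnli": 0.72, "qqp": 0.63, "socialiqa": 0.66, "ccg": 0.61}

tasks = sorted(probing_scores)
rho, pval = spearmanr([probing_scores[t] for t in tasks],
                      [target_scores[t] for t in tasks])
print(f"Spearman rho = {rho:.2f} (p = {pval:.2f})")
```

A high positive correlation for a given probing task suggests that the skill it isolates is one that useful intermediate tasks tend to instill.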
Implications and Future Research
The paper underscores the importance of deliberate intermediate-task selection, showing that the most beneficial tasks are those aligned with the target task's objectives and those that preserve the pretrained model's foundational capabilities. The observed correlation patterns further suggest integrating self-supervised objectives, or other mechanisms that mitigate catastrophic forgetting, into intermediate training.
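One way such a mechanism could look, purely as a toy sketch and not something the paper itself proposes or evaluates, is to keep a masked-language-modeling (MLM) loss active during intermediate training by sharing the RoBERTa encoder between a classification head and an LM head. The mixing weight lam, the tiny hard-coded batches, and the use of roberta-base are all assumptions for illustration.

```python
# Toy sketch (not from the paper) of mixing a self-supervised MLM objective into
# intermediate-task training to reduce catastrophic forgetting: share the RoBERTa
# encoder between a classification head and an LM head and optimize a weighted
# sum of the two losses for one step. Batches and hyperparameters are illustrative.
import torch
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          RobertaForMaskedLM, RobertaForSequenceClassification)

tok = AutoTokenizer.from_pretrained("roberta-base")
clf = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=3)
mlm = RobertaForMaskedLM.from_pretrained("roberta-base")
mlm.roberta = clf.roberta   # share the encoder so both losses update the same weights
mlm.tie_weights()           # re-tie the LM head to the (now shared) input embeddings

# Deduplicate shared parameters before building a single optimizer.
params = list({id(p): p for p in list(clf.parameters()) + list(mlm.parameters())}.values())
optimizer = torch.optim.AdamW(params, lr=1e-5)
lam = 0.5  # weight on the MLM term (assumed hyperparameter)

# Intermediate-task batch: an NLI-style sentence pair with a label.
task_batch = tok(["A man is sleeping."], ["A person is awake."],
                 return_tensors="pt", padding=True)
task_batch["labels"] = torch.tensor([2])  # e.g. "contradiction"

# Self-supervised batch: the collator applies random masking and builds MLM labels.
collator = DataCollatorForLanguageModeling(tok, mlm_probability=0.15)
mlm_batch = collator([tok("Unlabeled text drawn from a generic corpus keeps the masked "
                          "language modeling objective active while the encoder is being "
                          "fine-tuned on the labeled intermediate task.")])

# One combined step: supervised task loss plus a weighted MLM loss.
loss = clf(**task_batch).loss + lam * mlm(**mlm_batch).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```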
Future work could expand the set of probing tasks to cover a broader spectrum of linguistic abilities, sharpening our understanding of what models learn during intermediate training. Improving transfer-learning strategies has clear practical value for applying pretrained transformers to complex NLU tasks, making this an enduring area of research interest.
This paper's comprehensive experimental framework and nuanced analyses contribute significantly to understanding the mechanics of transfer learning within pretrained models, offering practical insights for the strategic deployment of intermediate tasks in NLU applications.