Robust Transfer Learning with Pretrained Language Models through Adapters (2108.02340v1)

Published 5 Aug 2021 in cs.CL

Abstract: Transfer learning with large pretrained transformer-based language models like BERT has become the dominant approach for most NLP tasks. Simply fine-tuning these language models on downstream tasks, or combining fine-tuning with task-specific pretraining, is often not robust. In particular, performance varies considerably as the random seed or the number of pretraining and/or fine-tuning iterations changes, and the fine-tuned model is vulnerable to adversarial attacks. We propose a simple yet effective adapter-based approach to mitigate these issues. Specifically, we insert small bottleneck layers (i.e., adapters) within each layer of a pretrained model, then fix the pretrained layers and train only the adapter layers on the downstream task data, with (1) task-specific unsupervised pretraining and then (2) task-specific supervised training (e.g., classification, sequence labeling). Our experiments demonstrate that this training scheme leads to improved stability and adversarial robustness in transfer learning to various downstream tasks.
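
The adapter recipe described in the abstract can be sketched roughly as follows. This is a minimal PyTorch illustration, not the authors' implementation: the hidden size (768), bottleneck width (64), GELU activation, zero initialization, and the `freeze_backbone_except_adapters` helper are assumptions made for the sketch.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """A small bottleneck module inserted inside a pretrained transformer layer.

    Down-projects the hidden states, applies a nonlinearity, up-projects back
    to the original width, and adds a residual connection. The up-projection is
    zero-initialized so the adapter starts out as an identity map.
    """

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))


def freeze_backbone_except_adapters(model: nn.Module) -> list:
    """Freeze all pretrained parameters; return only adapter parameters.

    The returned parameters would be handed to the optimizer for both training
    stages named in the abstract: (1) task-specific unsupervised pretraining
    and (2) task-specific supervised training.
    """
    trainable = []
    for name, param in model.named_parameters():
        if "adapter" in name:  # assumes adapter submodules are registered under the name "adapter"
            trainable.append(param)
        else:
            param.requires_grad = False
    return trainable
```

In practice the adapter would be registered as a submodule (e.g., named `adapter`) after the attention and/or feed-forward sub-layer of each transformer block, so that only these small modules receive gradient updates while the pretrained backbone stays fixed.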

Authors (3)
  1. Wenjuan Han (36 papers)
  2. Bo Pang (77 papers)
  3. Yingnian Wu (8 papers)
Citations (49)
