- The paper introduces AdapterFusion, a two-stage learning algorithm that composes task-specific adapters to avoid catastrophic forgetting.
- It details a novel architecture in which a dedicated fusion layer uses an attention-style mechanism to integrate the representations produced by multiple task adapters.
- Empirical results on 16 NLU tasks show consistent gains over full fine-tuning and multi-task learning, with the largest improvements on smaller datasets.
AdapterFusion: Non-Destructive Task Composition for Transfer Learning
The paper "AdapterFusion: Non-Destructive Task Composition for Transfer Learning" addresses critical challenges in transfer learning, specifically those related to sequential fine-tuning and multi-task learning. Current methodologies often struggle with issues such as catastrophic forgetting and dataset balancing. The authors propose a novel approach, AdapterFusion, a two-stage learning algorithm designed to leverage knowledge across multiple tasks while circumventing these prevalent issues.
Methodology
AdapterFusion operates in two distinct stages: knowledge extraction and knowledge composition. In the first stage, small task-specific modules called adapters are trained for each task; these adapters encapsulate task-related knowledge while the underlying pretrained model's parameters remain frozen. In the second stage, the trained adapters are combined using a newly introduced fusion layer, with the pretrained weights and the adapters themselves kept fixed.
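To make the first stage concrete, below is a minimal PyTorch sketch of a bottleneck adapter of the kind trained during knowledge extraction. The hidden size, bottleneck size, and choice of activation are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, plus a
    residual connection. Sizes and activation are illustrative assumptions."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 48):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # During knowledge extraction, only these adapter weights are updated;
        # the pretrained transformer parameters remain frozen.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

One adapter of this form is trained per task, so the pretrained model can be shared across tasks while each adapter stays small.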
The innovative aspect of AdapterFusion lies in its fusion layer, which is inserted at each transformer layer and attends over the outputs of the available task adapters: the transformer layer's output acts as the query, while the adapter outputs provide keys and values, allowing the model to weight and combine the task representations that are most useful for the target task.
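As a rough illustration of the composition stage, the following hedged sketch (continuing the PyTorch setup above) shows an attention-style fusion layer that mixes the outputs of several frozen task adapters; during composition, only these query, key, and value projections would be trained. Initialization and other details from the paper are omitted.

```python
class AdapterFusion(nn.Module):
    """Attention-style fusion over the outputs of N task adapters.
    The transformer layer output supplies the query; each adapter output
    supplies a key and a value. Projection sizes are assumptions."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def forward(self, layer_output: torch.Tensor,
                adapter_outputs: list[torch.Tensor]) -> torch.Tensor:
        # Stack adapter outputs: (batch, seq, n_adapters, hidden)
        stacked = torch.stack(adapter_outputs, dim=2)
        q = self.query(layer_output).unsqueeze(2)   # (batch, seq, 1, hidden)
        k = self.key(stacked)                       # (batch, seq, n, hidden)
        v = self.value(stacked)                     # (batch, seq, n, hidden)
        scores = (q * k).sum(dim=-1)                # (batch, seq, n)
        weights = torch.softmax(scores, dim=-1)     # per-token adapter weights
        # Weighted combination of the adapters' value projections.
        return (weights.unsqueeze(-1) * v).sum(dim=2)
```

A fusion layer of this kind sits after the adapters in every transformer layer, so the weighting over adapters can differ per layer and per token.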
Empirical Evaluation
The authors empirically evaluate AdapterFusion on 16 diverse natural language understanding (NLU) tasks, including sentiment analysis, commonsense reasoning, paraphrase detection, and recognizing textual entailment. The results show that AdapterFusion outperforms conventional strategies such as full fine-tuning and multi-task learning by combining knowledge from the source tasks non-destructively.
The performance gains are particularly significant for smaller datasets, where AdapterFusion shows pronounced improvements over traditional methods. The separation of knowledge extraction and composition stages is shown to mitigate issues like catastrophic forgetting and task interference.
Implications and Future Directions
The introduction of AdapterFusion has practical and theoretical implications. Practically, it offers a scalable and efficient method for managing multiple tasks in transfer learning without extensive retraining. Theoretically, it highlights the potential of parameter-efficient fine-tuning strategies for improved transfer learning outcomes.
Future research may extend AdapterFusion to other architectures and explore its application in more complex, real-world scenarios. Additionally, investigating its potential for zero-shot learning and cross-lingual transfer could further broaden its applicability.
In conclusion, AdapterFusion represents an advancement in transfer learning methodologies, providing a modular, efficient approach to leveraging multi-task knowledge. Its empirical success across diverse NLU tasks underscores its versatility and effectiveness, paving the way for further innovations in the field.