- The paper introduces Task Adaptive Parameter Sharing (TAPS), a method that adapts pre-trained models to multiple tasks by selectively tuning a small subset of task-specific layers, reducing cost and inter-task interference.
- Empirical results show TAPS achieves state-of-the-art performance with fewer task-specific parameters and automatically discovers architecture-specific sharing patterns.
- TAPS offers a scalable solution balancing computational cost and accuracy, enabling efficient incremental and joint multi-task learning.
Task Adaptive Parameter Sharing for Multi-Task Learning
The paper "Task Adaptive Parameter Sharing for Multi-Task Learning" introduces a novel approach named Task Adaptive Parameter Sharing (TAPS) aimed at the effective adaptation of pre-trained models for multiple downstream tasks while mitigating the memory and computational costs usually associated with fine-tuning. The key contribution of TAPS lies in its ability to select and tune only a minimal subset of task-specific layers in the base model, thereby facilitating multi-task learning while reducing competitive interference among tasks and conserving computational resources.
Overview of the Methodology
TAPS achieves adaptive layer selection via a continuous relaxation. The paper formulates a joint optimization that decides which layers are shared with the base model and which are specialized for each task. The optimization reduces to learning task-specific weights under a sparsity penalty on the number of active (specialized) layers, which encourages weight sharing. Although choosing a configuration is combinatorial, with 2^L possibilities for a network of L layers, the relaxation lets TAPS solve the problem with standard stochastic gradient descent. Its efficacy is confirmed across several model architectures, including ResNet, DenseNet, and ViT.
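To make the mechanism concrete, below is a minimal sketch of how such layer-wise specialization could be implemented. It is an illustration under assumed names (`TAPSLinear`, `score`, `threshold`, `lam` are my own choices, not the authors' reference code): each wrapped layer keeps the frozen shared weight and learns a task-specific residual, a scalar gate decides via a straight-through relaxation whether the layer is specialized, and a penalty on the gates encourages sharing.

```python
# Sketch of TAPS-style adaptive layer specialization (illustrative, not the
# authors' reference implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TAPSLinear(nn.Module):
    """Wraps a frozen base linear layer with a gated task-specific residual."""

    def __init__(self, base_layer: nn.Linear, threshold: float = 0.1):
        super().__init__()
        self.base = base_layer
        for p in self.base.parameters():              # shared weights stay frozen
            p.requires_grad_(False)
        self.delta = nn.Parameter(torch.zeros_like(base_layer.weight))
        self.score = nn.Parameter(torch.tensor(0.5))  # relaxed gate score
        self.threshold = threshold

    def gate(self) -> torch.Tensor:
        # Hard 0/1 decision in the forward pass, identity gradient in the
        # backward pass (a straight-through relaxation of the binary choice).
        s = self.score.clamp(0.0, 1.0)
        hard = (s > self.threshold).float()
        return hard + s - s.detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Shared weight plus (possibly gated-off) task-specific residual.
        w = self.base.weight + self.gate() * self.delta
        return F.linear(x, w, self.base.bias)

def sparsity_penalty(model: nn.Module) -> torch.Tensor:
    # Differentiable count of specialized layers; the training loss becomes
    # task_loss + lam * sparsity_penalty(model).
    gates = [m.gate() for m in model.modules() if isinstance(m, TAPSLinear)]
    return torch.stack(gates).sum()
```

In this sketch the coefficient `lam` plays the role of the sparsity weight in the paper's objective: increasing it pushes more gates below the threshold, so more layers remain shared with the base model at the cost of some task accuracy.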
Experimental Results
The empirical evaluation of TAPS shows state-of-the-art performance across a suite of fine-tuning tasks: it maintains high accuracy with fewer task-specific parameters than existing methods. A notable finding is TAPS's ability to automatically discover architecture-specific sharing patterns, such as adapting only the self-attention layers of Vision Transformers, demonstrating its versatility beyond conventional CNN architectures.
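Which layers end up specialized can be read directly off the learned gates after training, which is how architecture-specific patterns such as "ViT adapts mostly self-attention layers" would surface in practice. A hypothetical inspection helper, reusing the `TAPSLinear` sketch above (the name and threshold are illustrative assumptions):

```python
def active_layers(model: nn.Module, threshold: float = 0.1) -> list[str]:
    """Return the names of layers whose gate exceeds the threshold, i.e. the
    layers this task has specialized rather than shared."""
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, TAPSLinear) and module.score.item() > threshold
    ]
```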
Implications
Practically, TAPS offers a scalable solution for settings that need a tunable trade-off between computational cost and accuracy, which makes both incremental and joint multi-task learning efficient. Theoretically, TAPS's approach could inform future research on layer-selection strategies and cross-domain learning efficiency, potentially extending to sparse network architectures or zero-shot learning scenarios.
Future Directions
TAPS's methodology could motivate further inquiry into the limits of weight sharing among tasks, particularly when tasks are added incrementally, and into parameter-sharing paradigms for complex multi-task networks. Its impact could also extend to network pruning strategies and dynamic adaptation models in other artificial intelligence applications.
In conclusion, TAPS stands as a noteworthy contribution to the field of multi-task learning, presenting a novel perspective on resource-efficient model adaptation that balances performance with computational overhead. Its implications could transcend current practices, offering insights into adaptive architectures and scalable learning models across diverse computational tasks.