Explicit Inductive Bias for Transfer Learning with Convolutional Networks (1802.01483v2)

Published 5 Feb 2018 in cs.LG

Abstract: In inductive transfer learning, fine-tuning pre-trained convolutional networks substantially outperforms training from scratch. When using fine-tuning, the underlying assumption is that the pre-trained model extracts generic features, which are at least partially relevant for solving the target task, but would be difficult to extract from the limited amount of data available on the target task. However, besides the initialization with the pre-trained model and the early stopping, there is no mechanism in fine-tuning for retaining the features learned on the source task. In this paper, we investigate several regularization schemes that explicitly promote the similarity of the final solution with the initial model. We show the benefit of having an explicit inductive bias towards the initial model, and we eventually recommend a simple $L^2$ penalty with the pre-trained model being a reference as the baseline of penalty for transfer learning tasks.

Authors (3)
  1. Xuhong Li (40 papers)
  2. Yves Grandvalet (12 papers)
  3. Franck Davoine (14 papers)
Citations (323)

Summary

Inductive Bias in Transfer Learning for Convolutional Networks

The paper "Explicit Inductive Bias for Transfer Learning with Convolutional Networks" explores the efficacy of incorporating explicit inductive biases into the fine-tuning process of pre-trained convolutional networks. The motivation for this research stems from the observation that while fine-tuning pre-trained models has consistently outperformed training models from scratch in transfer learning contexts, the standard methods lack mechanisms to effectively retain knowledge from the source domain. This paper addresses this gap by investigating various regularization schemes aimed at preserving the initial features learned during the pre-training phase.

Summary of Key Concepts

The paper provides a rigorous analysis of inductive transfer learning where the target task differs from the source task, but both share the same domain. It also emphasizes the shortcomings of conventional fine-tuning, where parameters of the network may diverge significantly from their initial values, potentially resulting in the loss of useful pre-trained features. The primary contribution of this paper is the development and evaluation of regularization techniques that mitigate such divergence.

Regularization Techniques Explored

  1. L2-SP Regularization: The authors propose an L2 penalty in which the pre-trained parameters, rather than zero, serve as the reference point. This modification of standard L2 weight decay biases the solution towards the initial parameters of the pre-trained model, which is argued to be better suited to transfer learning; a minimal sketch of this penalty appears after this list.
  2. L2-SP-Fisher Regularization: Extending L2-SP, this method weights the penalty with an estimate of the Fisher information, so that parameters deemed important for the source task are kept closer to their initial values during fine-tuning. While it shows promise when retaining source-task performance also matters, it offers no significant advantage over L2-SP for the primary objective of improving target-domain performance.
  3. Group-Lasso-SP and L1-SP: These variants introduce sparsity constraints on the deviation from the pre-trained parameters. They generally underperform L2-SP, in part because their non-smooth penalties make optimization less stable.
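
To make the recommended baseline concrete, here is a minimal PyTorch-style sketch of the L2-SP penalty, $\frac{\alpha}{2}\lVert w_S - w_S^0\rVert_2^2 + \frac{\beta}{2}\lVert w_{\bar S}\rVert_2^2$, where $w_S$ are the parameters shared with the pre-trained network, $w_S^0$ their pre-trained values, and $w_{\bar S}$ the newly added parameters. The helper name `l2sp_penalty`, the choice of ResNet-101 with a 10-class head, and the values of $\alpha$ and $\beta$ are illustrative assumptions for this sketch, not prescriptions from the paper; a recent torchvision is assumed for the `weights` argument.

```python
import torch
import torchvision

def l2sp_penalty(model, reference_state, alpha=0.01, beta=0.01, new_layer_prefix="fc"):
    """L2-SP sketch: pull shared layers towards their pre-trained values (the
    "starting point"), and apply ordinary L2 decay to newly added layers."""
    shared, novel = 0.0, 0.0
    for name, param in model.named_parameters():
        if name.startswith(new_layer_prefix):
            # New layers (e.g. the target-task classifier) have no pre-trained
            # reference, so they get plain L2 weight decay towards zero.
            novel = novel + param.pow(2).sum()
        else:
            # Shared layers are penalized for drifting from the pre-trained weights.
            shared = shared + (param - reference_state[name]).pow(2).sum()
    return 0.5 * alpha * shared + 0.5 * beta * novel

# Illustrative fine-tuning setup on a hypothetical 10-class target task.
model = torchvision.models.resnet101(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # new target-task head
reference_state = {k: v.detach().clone() for k, v in model.state_dict().items()}

# weight_decay is 0 because regularization is handled entirely by l2sp_penalty.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=0.0)
criterion = torch.nn.CrossEntropyLoss()

def training_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels) + l2sp_penalty(model, reference_state)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The L2-SP-Fisher variant described above would additionally weight each squared difference in the shared term by an estimate of the Fisher information computed on the source task.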

Experimental Validation

The research includes comprehensive experiments with ResNet architectures on multiple image datasets to validate the proposed regularization techniques. These experiments show that L2-SP regularization consistently outperforms standard fine-tuning, with the clearest gains in scenarios where target-domain training data are limited and the pre-trained model's knowledge must therefore be used efficiently.

Implications and Future Directions

This research encourages a shift in transfer learning practice by introducing explicit inductive biases that preserve the benefits of pre-training throughout fine-tuning. The implications of these findings are multifaceted:

  • Practical Implications: Given its minimal computational overhead and straightforward implementation, L2-SP is recommended as a new baseline for transfer learning tasks.
  • Theoretical Insights: This work prompts further exploration into the theoretical underpinnings of transfer learning, such as studying the geometry of parameter spaces influenced by pre-training.
  • Future Directions: Further investigations could consider integrating functional-level knowledge retention techniques or extending these principles to other types of neural networks or tasks beyond image classification.

In conclusion, the paper presents compelling evidence for incorporating explicit inductive bias in the form of reference-based regularization in transfer learning contexts. It not only provides a pragmatic solution to improving performance on target tasks but also opens avenues for new research into the efficient retention and reuse of learned representations.