- The paper shows that CNN features transition from general in the early layers to task-specific in the deeper layers, and quantifies how well features at each depth transfer to a new task.
- It uses layer-by-layer transfer experiments on ImageNet subsets to separate two causes of performance loss: broken co-adaptation between neighboring layers and increasing feature specificity.
- Results indicate that initializing with transferred features and then fine-tuning boosts generalization on the target task, with the benefit of transfer shrinking as the base and target tasks become less similar.
Analyzing the Transferability of Features in Deep Neural Networks
The research paper "How transferable are features in deep neural networks?" by Yosinski et al. investigates the extent to which features learned by deep convolutional neural networks (CNNs) can be transferred between different tasks. The central aim of this work is to experimentally quantify the generality versus specificity of neurons at each layer of a deep neural network, with a focus on what this means for transfer learning.
Research Overview
Deep neural networks, particularly CNNs, exhibit a striking phenomenon when trained on natural images: the first layer almost always learns features resembling Gabor filters and color blobs. These first-layer features are considered general because they arise across many datasets and tasks. In contrast, features in the last layer are specific: they are tailored to the particular classes and objective of the task the network was trained on. This paper investigates how the transition from general to specific unfolds across the layers in between and what that implies for transfer learning.
The authors design their investigation around several key questions:
- To what extent can the generality or specificity of features at a given layer be quantified?
- Does the transition from general to specific occur suddenly or gradually across layers?
- What is the impact of transferring features between tasks that vary in similarity?
Methodology
The paper employs CNNs trained on the ImageNet dataset to evaluate layer-by-layer feature transferability. To assess the generality and specificity, the authors define transferability based on how well features from one task (base task) can be adapted to another task (target task). They create networks by transferring and fine-tuning layers from a network trained on the base task to the network for the target task.
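As a concrete illustration, here is a minimal PyTorch sketch of this procedure, not the authors' original implementation: it copies the first n convolutional layers of an AlexNet-style base network (similar in spirit to the architecture used in the paper) into a freshly initialized target network, and either freezes the copied layers or leaves them trainable for fine-tuning. The helper name `make_transfer_net` and the 500-class output size are illustrative assumptions.

```python
import torch.nn as nn
from torchvision.models import alexnet

def make_transfer_net(base_net: nn.Module, n_conv: int, freeze: bool,
                      num_classes: int = 500) -> nn.Module:
    """Copy the first `n_conv` conv layers of `base_net` into a freshly
    initialized AlexNet-style target network. `freeze=True` keeps the copied
    layers fixed (frozen transfer); `freeze=False` lets them be fine-tuned."""
    target = alexnet(num_classes=num_classes)  # upper layers start from random init
    copied = 0
    for base_layer, tgt_layer in zip(base_net.features, target.features):
        if isinstance(base_layer, nn.Conv2d):
            if copied == n_conv:
                break
            tgt_layer.load_state_dict(base_layer.state_dict())  # copy weights and biases
            if freeze:
                for p in tgt_layer.parameters():
                    p.requires_grad = False
            copied += 1
    return target

# Example: keep the first 3 conv layers of a (hypothetically trained) base net
# frozen, or allow them to be fine-tuned, and train the rest on the target task.
base_A = alexnet(num_classes=500)  # stands in for a network trained on the base task
frozen_transfer = make_transfer_net(base_A, n_conv=3, freeze=True)
finetune_transfer = make_transfer_net(base_A, n_conv=3, freeze=False)
```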
The experimental setup includes:
- Training pairs of CNNs on ImageNet subsets.
- Evaluating transfer performance from layers at varying depths.
- Comparing results for transferred versus randomly initialized weights.
Two distinct dataset splits were used to construct the base/target task pairs (a minimal split sketch follows the list):
- Random A/B splits of ImageNet classes for similar datasets.
- Man-made/natural split to create dissimilar datasets.
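The sketch below shows one way such class splits could be constructed from the 1000 ImageNet class indices. The `manmade_ids` argument is a placeholder: the paper derives the man-made/natural grouping from the WordNet hierarchy underlying the ImageNet labels rather than from a hand-written list.

```python
import random

ALL_CLASSES = list(range(1000))  # the 1000 ImageNet class indices

def random_ab_split(classes, seed=0):
    """Randomly split the classes into two halves A and B (similar tasks)."""
    rng = random.Random(seed)
    shuffled = list(classes)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def manmade_natural_split(classes, manmade_ids):
    """Split the classes into man-made vs. natural groups (dissimilar tasks).
    `manmade_ids` is a placeholder: the paper obtains this grouping from the
    WordNet hierarchy behind the ImageNet labels."""
    manmade = [c for c in classes if c in manmade_ids]
    natural = [c for c in classes if c not in manmade_ids]
    return manmade, natural

task_A, task_B = random_ab_split(ALL_CLASSES)
```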
Experimental Findings
Similar Datasets (Random A/B Splits)
- Baseline Performance: Networks trained on the random A/B halves (about 500 classes each) reached a top-1 accuracy of roughly 62.5%, higher than the roughly 57.5% top-1 accuracy reported for comparable networks on the full 1000-class ImageNet task, as expected, since classifying among fewer classes is an easier problem.
- Impact of Co-Adaptation: Performance degraded significantly when lower layers were frozen and the layers above them were reinitialized and retrained, particularly in the middle of the network (layers 3-6). The drop disappeared when the copied layers were fine-tuned along with the rest of the network, indicating an optimization difficulty caused by splitting co-adapted neighboring layers rather than a loss of information.
- Transfer Performance: Layers 1 and 2 transferred well between tasks, confirming their generality. When transferred frozen, deeper layers performed progressively worse, due to two compounding effects: broken co-adaptation and increasing feature specificity.
- Boost in Generalization: Initializing with transferred features and then fine-tuning on the target task gave a notable boost in generalization, and the benefit persisted even after substantial retraining on the target data.
Dissimilar Datasets (Man-made/Natural Split)
Transferring features between dissimilar tasks (man-made to natural objects and vice versa) showed a more pronounced decline in performance, particularly in higher layers. This aligns with the hypothesis that the transferability of features diminishes with increasing task dissimilarity.
Comparison with Random Weights
Using random, untrained filters in the lower layers led to a steep decline in performance, confirming that even features transferred from a distant task are more useful than random initialization. This result contrasts with earlier findings such as Jarrett et al. (2009), who reported that random filters can perform surprisingly well, albeit on much smaller datasets and shallower architectures.
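For concreteness, here is a minimal sketch of the random-filter control, assuming the same AlexNet-style setup as above: the first few convolutional layers are frozen at their random initialization and only the remaining layers would be trained on the target task, so any accuracy gap relative to the frozen-transfer networks reflects the value of the learned features.

```python
import torch.nn as nn
from torchvision.models import alexnet

def freeze_first_convs(net: nn.Module, n_conv: int) -> nn.Module:
    """Freeze the first `n_conv` conv layers, leaving their weights untouched."""
    frozen = 0
    for layer in net.features:
        if isinstance(layer, nn.Conv2d):
            if frozen == n_conv:
                break
            for p in layer.parameters():
                p.requires_grad = False
            frozen += 1
    return net

# Control condition: the first 3 conv layers keep their random initialization
# and are never updated; only the remaining layers are trained on the target
# task. Comparing this against the frozen-transfer networks isolates how much
# the learned (even distant-task) features actually contribute.
control = freeze_first_convs(alexnet(num_classes=500), n_conv=3)
trainable = sum(p.numel() for p in control.parameters() if p.requires_grad)
print(f"Trainable parameters in the random-filter control: {trainable:,}")
```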
Implications and Future Work
This paper emphasizes the importance of understanding the transferability of neural network features for effective transfer learning. The results suggest:
- Fine-tuning transferred features can significantly enhance performance on new tasks (a minimal fine-tuning sketch follows this list).
- The structure and extent of co-adaptation across network layers play a crucial role in transfer learning effectiveness.
- Feature generality is more prevalent in early layers, while higher layers exhibit increased specificity.
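Below is a minimal sketch of the fine-tuning step referenced in the first point above, assuming a network whose lower layers were already initialized from a base-task model. The `fine_tune` helper and its hyperparameters are illustrative assumptions, not the paper's exact training schedule.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision.models import alexnet

def fine_tune(net: nn.Module, loader: DataLoader,
              epochs: int = 1, lr: float = 1e-3) -> nn.Module:
    """Continue training every unfrozen layer of a transferred network on the
    target task; transferred layers are simply updated along with the rest."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(
        (p for p in net.parameters() if p.requires_grad), lr=lr, momentum=0.9)
    net.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(net(images), labels)
            loss.backward()
            optimizer.step()
    return net

# `target_loader` would iterate over the target task's training images; the
# network passed in stands in for one whose lower layers were copied from a
# base-task model:
#   fine_tuned = fine_tune(transferred_net, target_loader)
```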
Future research could improve transfer learning algorithms by mitigating co-adaptation issues and by examining how different network architectures and training strategies affect feature transferability. Investigating transferability in domains beyond image classification could also broaden the application of these findings.
In summary, Yosinski et al.'s work provides valuable insight into the transferability of deep neural network features, elucidating both the challenges and the opportunities for advancing transfer learning methodologies in AI.