- The paper demonstrates that transfer learning from ImageNet provides minimal performance gains on medical imaging tasks, with lightweight CNNs trained from scratch performing comparably to standard ImageNet models.
- The paper employs SVCCA to reveal that representational differences between transferred and randomly initialized models are significant in larger networks but negligible in smaller ones.
- The paper highlights that reusing pretrained lower-layer weights, and even just matching their weight scaling, accelerates convergence and enables more efficient training.
Transfusion: Understanding Transfer Learning for Medical Imaging
In the paper "Transfusion: Understanding Transfer Learning for Medical Imaging," the authors Maithra Raghu, Chiyuan Zhang, Jon Kleinberg, and Samy Bengio present an in-depth analysis of transfer learning's impact on medical imaging tasks, critically evaluating the assumed benefits of this commonly employed methodology. This paper attempts to bridge the gap in understanding the transfer learning process from standard image datasets like ImageNet to specific medical imaging applications, which share significant differences in data characteristics and task requisites.
Key Findings
- Performance Evaluation:
- The paper rigorously evaluates multiple architectures, both standard ImageNet models and lightweight ones, on two large-scale medical imaging tasks: diabetic retinopathy diagnosis from retinal fundus photographs and diagnosis of thoracic pathologies from chest X-rays.
- Notably, transfer learning offers no substantial performance improvement: small, simple convolutional networks built from CBR (Convolution-Batchnorm-ReLU) blocks perform comparably to standard ImageNet models such as ResNet50 and Inception-v3 (a CBR sketch appears after this list).
- The results reveal no marked advantage of transfer learning even in the very small data regime, undermining the expected benefits of pretraining when training data is scarce.
- Representation Analysis:
- The paper leverages Singular Vector Canonical Correlation Analysis (SVCCA) to measure the representational similarity between models trained with and without transfer learning (a sketch of the computation appears after this list).
- The findings show substantial representational differences between pretrained and randomly initialized models mainly in the larger architectures; in the smaller models these differences are minimal.
- Pretrained models also change less during fine-tuning, especially in their lower layers, suggesting that the large models are overparametrized for these tasks.
- Feature Reuse and Weight Transfusion:
- The analysis identifies the lowest layers as where feature reuse actually happens: the meaningful benefits of pretrained initialization are concentrated there.
- Weight transfusion experiments, in which only a subset of the pretrained weights is reused, confirm that the largest convergence gains come from reusing the lower-layer weights (see the transfusion sketch after this list).
- Hybrid approaches, such as redesigning the upper layers of the network while keeping the pretrained lower-layer weights, or initializing the lowest layers with synthetic Gabor filters, achieve comparable performance with added design flexibility.
- Feature-Independent Benefits:
- A distinct contribution of this research is the identification of feature-independent benefits of pretraining, which stem from the weight scaling of pretrained models.
- Employing a Mean Var initialization, which samples weights to match the mean and variance of the pretrained weights, yields a considerable acceleration in convergence (see the Mean Var sketch after this list).
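To make the CBR family concrete, below is a minimal PyTorch sketch of a Convolution-Batchnorm-ReLU classifier. The depths, widths, pooling choices, and the five-class output (matching diabetic retinopathy severity grades) are illustrative assumptions, not the paper's exact configurations.

```python
import torch
import torch.nn as nn

def cbr_block(in_ch, out_ch):
    """One Convolution-Batchnorm-ReLU block followed by 2x spatial downsampling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class SmallCBR(nn.Module):
    """A lightweight CBR-style classifier; widths and depth are illustrative."""
    def __init__(self, num_classes=5, widths=(32, 64, 128, 256)):
        super().__init__()
        chans = (3,) + tuple(widths)
        self.features = nn.Sequential(
            *[cbr_block(chans[i], chans[i + 1]) for i in range(len(widths))]
        )
        self.head = nn.Linear(widths[-1], num_classes)

    def forward(self, x):
        x = self.features(x)          # (N, widths[-1], H', W')
        x = x.mean(dim=(2, 3))        # global average pooling
        return self.head(x)

# Example: logits for a batch of 224x224 retinal fundus images.
logits = SmallCBR()(torch.randn(2, 3, 224, 224))
```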
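The SVCCA comparison can be sketched in a few lines of NumPy: reduce each set of activations with an SVD that retains most of the variance, then compute canonical correlations between the reduced representations. The 99% variance threshold and the QR-based CCA below are assumptions of this simplified sketch, not the authors' released implementation.

```python
import numpy as np

def svcca_similarity(acts1, acts2, var_threshold=0.99):
    """Mean SVCCA correlation between two activation matrices of shape
    (num_datapoints, num_neurons)."""
    def svd_reduce(acts):
        acts = acts - acts.mean(axis=0, keepdims=True)        # centre each neuron
        u, s, _ = np.linalg.svd(acts, full_matrices=False)
        explained = np.cumsum(s ** 2) / np.sum(s ** 2)
        k = int(np.searchsorted(explained, var_threshold)) + 1
        return u[:, :k] * s[:k]                               # top singular directions

    x, y = svd_reduce(acts1), svd_reduce(acts2)
    # CCA step: canonical correlations are the singular values of Qx^T Qy,
    # where Qx, Qy are orthonormal bases of the centred, reduced activations.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    return np.linalg.svd(qx.T @ qy, compute_uv=False).mean()
```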
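A minimal sketch of weight transfusion, assuming torchvision's ImageNet-pretrained ResNet50 as the source: only the lowest modules receive pretrained weights, while everything above them keeps its random initialization. Which modules count as "lowest" (here conv1, bn1, and the first residual stage) is a choice made for illustration.

```python
from torchvision import models

def transfuse_lowest_layers(num_stages=1):
    """Copy ImageNet weights into only the lowest layers of a ResNet50,
    leaving the remaining layers randomly initialized."""
    pretrained = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model = models.resnet50(weights=None)                      # random init

    # Modules treated as the "lowest" part of the network (illustrative choice).
    lowest = ["conv1", "bn1"] + [f"layer{i + 1}" for i in range(num_stages)]

    transfused = {name: tensor for name, tensor in pretrained.state_dict().items()
                  if name.split(".")[0] in lowest}
    model.load_state_dict(transfused, strict=False)            # rest stays random
    return model
```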
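Finally, the feature-independent benefit can be probed with a Mean Var style initialization: each weight tensor is redrawn i.i.d. from a normal distribution whose mean and variance match the corresponding pretrained tensor, discarding the learned features while preserving their scaling. The per-tensor sampling below follows the paper's description but is not its exact code.

```python
import torch
from torchvision import models

@torch.no_grad()
def mean_var_init(model, pretrained):
    """Re-sample every parameter of `model` i.i.d. from a normal whose mean and
    variance match the corresponding pretrained parameter."""
    for (_, p), (_, q) in zip(model.named_parameters(), pretrained.named_parameters()):
        p.copy_(torch.randn_like(p) * q.std() + q.mean())
    return model

# Usage: a ResNet50 whose weights match the pretrained scales but carry no features.
pretrained = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model = mean_var_init(models.resnet50(weights=None), pretrained)
```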
Implications and Future Directions
Practical Implications
The practical implications of this research are significant for deploying and optimizing deep learning models in medical imaging. The finding that standard ImageNet models are overparametrized for medical tasks points towards smaller, more efficient models that do not compromise diagnostic accuracy. This could yield more computationally affordable solutions and facilitate on-device and mobile applications where resources are limited.
Moreover, the empirical finding that transfer learning offers limited benefits even in the very small data regime challenges the default reliance on large pretrained models and argues for architectures designed around the characteristics of specific medical imaging tasks.
Theoretical Implications
Theoretically, the paper advances our understanding of why transfer learning is not universally beneficial. It highlights the role of model architecture and the layer-dependent effects of initialization. The insights into representational dynamics and the concentration of pretrained-weight benefits in the lower layers contribute foundational knowledge for future research on transfer learning mechanisms.
Future work could explore alternative weight-initialization methods, particularly for tasks whose data characteristics differ substantially from ImageNet's. Continued research could also refine hybrid transfer learning methods, optimizing both the architecture and the initialization so that pretrained information is used selectively and efficiently.
This paper calls for a reassessment of standard practices in transfer learning for medical imaging, promoting an evidence-based approach to adopting and adapting pretrained models. Future work could build upon these findings to develop more sophisticated models explicitly designed for specialized tasks in medical imaging, ultimately enhancing diagnostic capabilities and clinical outcomes.
In conclusion, while transfer learning remains a valuable tool in many domains, its advantages in medical imaging require nuanced understanding and judicious application, as "Transfusion: Understanding Transfer Learning for Medical Imaging" demonstrates.