Log Expected Empirical Prediction: Evaluating Transferability in Learned Representations
The paper "LEEP: A New Measure to Evaluate Transferability of Learned Representations" presents a novel method for assessing how well learned representations transfer between classification tasks in deep learning. The method, coined Log Expected Empirical Prediction (LEEP), is simpler and more effective than existing approaches. By evaluating transferability with LEEP, researchers can predict the performance of both transfer learning and meta-transfer learning models before applying them to target datasets.
Key Insights
Definition and Computation of LEEP: LEEP quantifies transferability by computing an expected empirical prediction (EEP) score. The computation requires only a single forward pass of the pre-trained source model over the target dataset: the model's output distributions over source labels serve as "dummy" label distributions for the target data, and these are combined with the empirical conditional distribution of target labels given source labels to score the average log-likelihood of the true target labels.
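This computation can be sketched in a few lines of NumPy; the function name and array layout below are illustrative, not taken from the paper's code:

```python
import numpy as np

def leep(source_probs: np.ndarray, target_labels: np.ndarray) -> float:
    """LEEP score from a source model's predictions on target data.

    source_probs:  (n, |Z|) array; row i is the source model's predicted
                   ("dummy") label distribution for target example i.
    target_labels: (n,) integer array of target labels in {0, ..., |Y|-1}.
    """
    n = source_probs.shape[0]
    num_target_classes = int(target_labels.max()) + 1

    # Empirical joint P(y, z): average dummy distribution per target class.
    joint = np.zeros((num_target_classes, source_probs.shape[1]))
    for y in range(num_target_classes):
        joint[y] = source_probs[target_labels == y].sum(axis=0) / n

    # Empirical conditional P(y | z) = P(y, z) / P(z); the floor guards
    # against source classes that received (near-)zero total mass.
    marginal_z = np.maximum(joint.sum(axis=0), 1e-12)
    cond = joint / marginal_z

    # Expected empirical prediction for each example:
    #   p(y_i | x_i) = sum_z P(y_i | z) * theta(x_i)_z
    eep = (source_probs * cond[target_labels]).sum(axis=1)

    # LEEP is the average log of these predictions (always <= 0).
    return float(np.mean(np.log(eep)))
```

Higher scores (closer to zero) indicate better expected transferability; the score is computed once per source model, with no re-training.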
Theoretical Properties: LEEP is shown to be upper-bounded by the maximal average log-likelihood achievable by re-training the classifier head while freezing the feature extractor, and lower-bounded by a quantity involving the negative conditional entropy (NCE) measure. These bounds provide a rigorous mathematical framework for understanding its behavior and predictive power.
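The upper bound can be stated compactly; writing $T(\theta, D)$ for the LEEP score of source model $\theta$ on target data $D = \{(x_i, y_i)\}_{i=1}^n$ (the notation here sketches the paper's and is not a verbatim restatement):

```latex
% The EEP predictor p(y \mid x) = \sum_z \hat{P}(y \mid z)\, \theta(x)_z is
% itself a classifier built on the frozen source features, so the optimally
% re-trained head w achieves at least its average log-likelihood:
T(\theta, D) \;\le\; \max_{w}\ \frac{1}{n} \sum_{i=1}^{n} \log p\big(y_i \mid x_i;\, w\big)
```

Intuitively, LEEP is the log-likelihood of one particular, cheaply constructed classifier, so it cannot exceed what the best re-trained head achieves.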
Empirical Performance: Empirical analysis shows that LEEP correlates well with the transfer accuracy of trained models, outperforming recently proposed measures such as NCE and the H-score across a range of task conditions. For instance, LEEP showed up to a 30% improvement in correlation coefficients when predicting accuracy for ImageNet-to-CIFAR100 transfers.
Application in Diverse Scenarios: The measure is effective across different settings, including large-scale, small-scale, and imbalanced datasets. It also predicts convergence speeds of transfer learning methods, offering valuable insights into task difficulties that help optimize model training processes.
Use in Model and Task Selection: LEEP provides a basis for choosing the best source model from a pool of trained candidates, and can substantially streamline the selection process in real-world applications where computational resources and time are limited.
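Because each score needs only one forward pass per candidate, source-model selection reduces to an argmax over LEEP scores. A self-contained sketch, where the candidate names and toy predictions are purely illustrative:

```python
import numpy as np

def leep_score(source_probs: np.ndarray, target_labels: np.ndarray) -> float:
    """LEEP score from pre-computed source-model output distributions."""
    n = len(target_labels)
    num_y = int(target_labels.max()) + 1
    # Empirical joint P(y, z), conditional P(y | z), then average log EEP.
    joint = np.stack([source_probs[target_labels == y].sum(axis=0) / n
                      for y in range(num_y)])
    cond = joint / np.maximum(joint.sum(axis=0), 1e-12)
    return float(np.log((source_probs * cond[target_labels]).sum(axis=1)).mean())

# Toy target set with two classes; in practice these arrays would come from
# one forward pass of each candidate model over the target data.
labels = np.array([0, 1, 0, 1])
candidates = {
    "aligned_model": np.eye(2)[labels],     # predictions track the labels
    "uninformative": np.full((4, 2), 0.5),  # uniform predictions
}
scores = {name: leep_score(p, labels) for name, p in candidates.items()}
best = max(scores, key=scores.get)          # -> "aligned_model"
```

No candidate is ever fine-tuned during selection, which is where the savings over accuracy-based model selection come from.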
Implications for Research and Development
The implications of LEEP extend to several areas:
Transfer Learning: By accurately assessing the transferability between tasks without the need for re-training models, LEEP aids in efficient resource management and better decision-making regarding task selection for joint training or transfer learning applications.
Meta-Transfer Learning: As the first measure applicable to assessing transferability in meta-transfer learning scenarios, LEEP enables advances in frameworks where task adaptation is learned through meta-training across various domains.
Task Space Representation: LEEP could further influence research in task space modeling, offering a metric to explore and quantify the relationships and similarities between diverse tasks in a non-symmetric, computationally tractable manner.
Feature Selection and Model Architecture Design: Given its predictive nature, LEEP can assist in tasks like feature selection and model architecture design by illuminating the compatibility of learned representations across distinct tasks.
Future Directions
The paper highlights essential directions for future research on extending LEEP beyond its current limitations. Possible enhancements include handling heterogeneous source and target task inputs, as well as adaptations that account for network architecture. Integrating LEEP into automated machine learning pipelines for hyperparameter selection and into continual learning frameworks is particularly promising, offering avenues to transfer learned knowledge efficiently.
Overall, the introduction of LEEP as a transferability measure enables robust, scalable evaluations with the potential to influence both theoretical analysis and practical implementations across machine learning and AI research.