Understanding Transfer Learning and Gradient-Based Meta-Learning Techniques (2310.06148v1)
Abstract: Deep neural networks can yield good performance on various tasks but often require large amounts of data to train them. Meta-learning has received considerable attention as one approach to improve the generalization of these networks from a limited amount of data. Whilst meta-learning techniques have been observed to succeed at this in various scenarios, recent results suggest that, when evaluated on tasks from a different data distribution than the one used for training, a baseline that simply finetunes a pre-trained network may be more effective than more complicated meta-learning techniques such as MAML, one of the most popular of these methods. This is surprising, as the learning behaviour of MAML mimics that of finetuning: both rely on re-using learned features. We investigate the observed performance differences between finetuning, MAML, and another meta-learning technique called Reptile, and show that MAML and Reptile specialize for fast adaptation in low-data regimes on data distributions similar to the one used for training. Our findings show that both the output layer and the noisy training conditions induced by data scarcity play important roles in facilitating this specialization for MAML. Lastly, we show that the pre-trained features obtained by the finetuning baseline are more diverse and discriminative than those learned by MAML and Reptile. Due to this lack of diversity and their specialization to the training distribution, MAML and Reptile may fail to generalize to out-of-distribution tasks, whereas finetuning can fall back on the diversity of the learned features.
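To make the contrast between the three approaches concrete, below is a minimal, self-contained sketch of the update rules being compared: MAML's meta-update that backpropagates through the inner adaptation loop, Reptile's first-order interpolation toward task-adapted weights, and an ordinary finetuning step on a small support set. This is an illustrative sketch, not the paper's code: the toy sine-regression task family, the hyperparameter values (`inner_lr`, `meta_lr`, `inner_steps`), and helper names such as `sample_task` and `make_model` are assumptions, and the MAML variant assumes PyTorch >= 2.0 for `torch.func.functional_call`.

```python
# Illustrative sketch of the three update rules discussed in the abstract.
# Task family, hyperparameters, and helper names are assumptions for this example.
import torch
import torch.nn as nn
from torch.func import functional_call

def make_model():
    return nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

def sample_task():
    # Toy task family: regress y = a * sin(x + b) with task-specific a, b.
    a, b = 1 + 4 * torch.rand(1), 3 * torch.rand(1)
    def sample(n=10):
        x = 10 * torch.rand(n, 1) - 5
        return x, a * torch.sin(x + b)
    return sample

loss_fn = nn.MSELoss()
inner_lr, meta_lr, inner_steps = 1e-2, 1e-3, 5

def maml_meta_update(model):
    """MAML: adapt a functional copy of the parameters on support data,
    then backpropagate the query loss through the adaptation steps."""
    task = sample_task()
    x_s, y_s = task()                   # support (adaptation) set
    x_q, y_q = task()                   # query (evaluation) set, same task
    params = dict(model.named_parameters())
    for _ in range(inner_steps):
        loss = loss_fn(functional_call(model, params, (x_s,)), y_s)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        params = {k: v - inner_lr * g for (k, v), g in zip(params.items(), grads)}
    meta_loss = loss_fn(functional_call(model, params, (x_q,)), y_q)
    meta_grads = torch.autograd.grad(meta_loss, list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= meta_lr * g            # outer (meta) gradient step

def reptile_meta_update(model):
    """Reptile: run plain SGD on one task, then move the initialization
    toward the adapted weights (no second-order gradients needed)."""
    x_s, y_s = sample_task()()
    adapted = make_model()
    adapted.load_state_dict(model.state_dict())
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        opt.zero_grad()
        loss_fn(adapted(x_s), y_s).backward()
        opt.step()
    with torch.no_grad():
        for p, q in zip(model.parameters(), adapted.parameters()):
            p += meta_lr * (q - p)      # interpolate toward the task solution

def finetune(pretrained, steps=50):
    """Finetuning baseline: take a pre-trained network and run ordinary
    gradient descent on the target task's small support set."""
    x_s, y_s = sample_task()()
    opt = torch.optim.SGD(pretrained.parameters(), lr=inner_lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(pretrained(x_s), y_s).backward()
        opt.step()
    return pretrained

if __name__ == "__main__":
    model = make_model()
    for _ in range(100):                # meta-training iterations
        maml_meta_update(model)         # or: reptile_meta_update(model)
    finetune(make_model())              # the baseline skips meta-training entirely
```

The sketch highlights the structural point made above: MAML and Reptile explicitly optimize the initialization for rapid adaptation on tasks drawn from the training distribution, whereas the finetuning baseline never sees the adaptation procedure during pre-training and simply reuses whatever features it has learned.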