
Is Pre-training Truly Better Than Meta-Learning? (2306.13841v1)

Published 24 Jun 2023 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract: In the context of few-shot learning, it is currently believed that a fixed pre-trained (PT) model, with only the final layer fine-tuned during evaluation, outperforms standard meta-learning algorithms. We re-evaluate these claims with an in-depth empirical examination of an extensive set of formally diverse datasets, comparing PT to Model-Agnostic Meta-Learning (MAML). Unlike previous work, we emphasize a fair comparison by using the same architecture, the same optimizer, and all models trained to convergence. Crucially, we use a more rigorous statistical tool -- the effect size (Cohen's d) -- to determine the practical significance of the difference between a model trained with PT and one trained with MAML. We then use a previously proposed metric -- the diversity coefficient -- to compute the average formal diversity of a dataset. Using this analysis, we demonstrate the following: 1. when the formal diversity of a dataset is low, PT beats MAML on average, and 2. when the formal diversity is high, MAML beats PT on average. The caveat is that the magnitude of the average difference between PT and MAML, measured by the effect size, is low (according to classical statistical thresholds) -- less than 0.2. Nevertheless, this observation contradicts the currently held belief that a pre-trained model is always better than a meta-learned model. Our extensive experiments consider 21 few-shot learning benchmarks, including the large-scale few-shot learning benchmark Meta-Dataset. We also show no significant difference between a MAML model and a PT model with GPT-2 on OpenWebText. We therefore conclude that a pre-trained model does not always beat a meta-learned model, and that the formal diversity of a dataset is a driving factor.
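The abstract's central statistical tool is Cohen's d: the standardized mean difference between two samples, used here to judge whether the PT vs. MAML gap is practically significant (below 0.2 is conventionally a "small" effect). A minimal sketch of that computation follows; the function name and the two per-task accuracy samples are illustrative assumptions, not values from the paper:

```python
import math

def cohens_d(xs, ys):
    """Cohen's d: difference of sample means divided by the pooled
    standard deviation (computed from Bessel-corrected variances)."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled_sd

# Hypothetical per-task accuracies for a PT model and a MAML model.
pt_acc = [0.71, 0.68, 0.73, 0.70, 0.69]
maml_acc = [0.70, 0.69, 0.72, 0.71, 0.68]
d = cohens_d(pt_acc, maml_acc)
print(abs(d) < 0.2)  # True: below Cohen's "small effect" threshold
```

The point of reporting d rather than a p-value alone is that, with the large numbers of evaluation tasks used in few-shot benchmarks, even negligible differences become statistically significant; the effect size keeps the comparison on a practical scale.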
