Understanding the Overfitting of the Episodic Meta-training
Abstract: Despite the success of two-stage few-shot classification methods, the model suffers severe overfitting during the episodic meta-training stage. We hypothesize that this is caused by over-discrimination: the model learns to over-rely on superficial features that suit base-class discrimination while suppressing novel-class generalization. To penalize over-discrimination, we introduce knowledge distillation techniques that preserve the teacher model's novel-class generalization knowledge during training. Specifically, we select as the teacher the model with the best validation accuracy during meta-training, and we restrict the symmetric Kullback-Leibler (SKL) divergence between the output distributions of the teacher's and the student's linear classifiers. This simple approach outperforms the standard meta-training process. To push the limits of knowledge distillation, we further propose the Nearest Neighbor Symmetric Kullback-Leibler (NNSKL) divergence for meta-training. NNSKL takes few-shot tasks as input and penalizes the output of the nearest neighbor classifier, which affects the relationships between query embeddings and support centers. Combining SKL and NNSKL in meta-training yields even better performance and surpasses state-of-the-art results on several benchmarks.
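The two distillation terms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: SKL is taken as the sum KL(p||q) + KL(q||p), the nearest neighbor classifier uses a softmax over negative squared Euclidean distances to class-mean support centers, and all function names (`skl`, `support_centers`, `nn_probs`, `nnskl`) are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def skl(p, q):
    """Symmetric KL, assumed here to be KL(p||q) + KL(q||p)."""
    return kl(p, q) + kl(q, p)

def support_centers(embeddings, labels, n_way):
    """Mean support embedding per class (assumed definition of a center)."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(n_way)]
    counts = [0] * n_way
    for emb, lab in zip(embeddings, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += emb[d]
    return [[s / counts[c] for s in sums[c]] for c in range(n_way)]

def nn_probs(query, centers):
    """Nearest neighbor output: softmax over negative squared distances
    (the metric is an assumption; cosine similarity is another option)."""
    logits = [-sum((a - b) ** 2 for a, b in zip(query, c)) for c in centers]
    return softmax(logits)

def nnskl(teacher_task, student_task, labels, n_way):
    """Mean SKL between teacher and student nearest neighbor outputs on one task.
    Each task is (support_embeddings, query_embeddings) from that model's encoder."""
    t_sup, t_qry = teacher_task
    s_sup, s_qry = student_task
    t_centers = support_centers(t_sup, labels, n_way)
    s_centers = support_centers(s_sup, labels, n_way)
    losses = [skl(nn_probs(tq, t_centers), nn_probs(sq, s_centers))
              for tq, sq in zip(t_qry, s_qry)]
    return sum(losses) / len(losses)

# Toy 2-way 1-shot task with 2-D embeddings (made-up numbers).
labels = [0, 1]
teacher = ([[0.0, 0.0], [1.0, 1.0]], [[0.1, 0.0], [0.9, 1.0]])
student = ([[0.0, 0.0], [2.0, 2.0]], [[0.5, 0.5], [1.5, 2.0]])

# SKL term on linear-classifier logits, NNSKL term on the few-shot task.
linear_term = skl(softmax([2.0, 0.5, -1.0]), softmax([1.5, 0.7, -0.5]))
task_term = nnskl(teacher, student, labels, n_way=2)
```

In a training loop, terms like `linear_term` and `task_term` would presumably be added to the usual meta-training loss with weighting coefficients; the abstract does not state how they are weighted, so that choice is left open here.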