SHOT: Suppressing the Hessian along the Optimization Trajectory for Gradient-Based Meta-Learning (2310.02751v1)

Published 4 Oct 2023 in cs.LG and cs.CV

Abstract: In this paper, we hypothesize that gradient-based meta-learning (GBML) implicitly suppresses the Hessian along the optimization trajectory in the inner loop. Based on this hypothesis, we introduce an algorithm called SHOT (Suppressing the Hessian along the Optimization Trajectory) that minimizes the distance between the parameters of the target and reference models to suppress the Hessian in the inner loop. Despite dealing with high-order terms, SHOT adds little computational overhead over the baseline. It is agnostic to both the algorithm and the architecture used in GBML, making it highly versatile and applicable to any GBML baseline. To validate the effectiveness of SHOT, we conduct empirical tests on standard few-shot learning tasks and qualitatively analyze its dynamics. We confirm our hypothesis empirically and demonstrate that SHOT outperforms the corresponding baselines. Code is available at: https://github.com/JunHoo-Lee/SHOT
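
The abstract does not spell out how the reference model or the distance term is constructed, so the PyTorch sketch below is only one plausible reading of the idea: it assumes a MAML-style inner loop, a reference model obtained by running extra inner-loop steps from the same initialization, and a squared L2 distance between target and reference parameters weighted by a coefficient `lam`. All names here (`inner_adapt`, `shot_outer_loss`, `ref_extra_steps`, `lam`) are illustrative assumptions, not taken from the paper or its repository.

```python
# Hypothetical sketch of a SHOT-style objective on top of a MAML inner loop.
# Requires PyTorch >= 2.0 for torch.func.functional_call.
import torch
import torch.nn.functional as F

def inner_adapt(model, params, x_sup, y_sup, steps, lr):
    """Run `steps` of gradient descent on the support set; return new params."""
    for _ in range(steps):
        logits = torch.func.functional_call(model, params, (x_sup,))
        loss = F.cross_entropy(logits, y_sup)
        grads = torch.autograd.grad(loss, tuple(params.values()), create_graph=True)
        params = {k: p - lr * g for (k, p), g in zip(params.items(), grads)}
    return params

def shot_outer_loss(model, x_sup, y_sup, x_qry, y_qry,
                    inner_steps=5, ref_extra_steps=5, inner_lr=0.01, lam=0.1):
    params = dict(model.named_parameters())

    # Target model: standard MAML-style adaptation on the support set.
    target = inner_adapt(model, params, x_sup, y_sup, inner_steps, inner_lr)

    # Reference model: adapted for additional steps, then detached so the
    # penalty only pulls the target trajectory toward it. (Reusing the same
    # helper builds an unnecessary graph; acceptable for a short sketch.)
    ref = inner_adapt(model, params, x_sup, y_sup,
                      inner_steps + ref_extra_steps, inner_lr)
    ref = {k: v.detach() for k, v in ref.items()}

    # Outer-loop (query) loss evaluated with the adapted target parameters.
    qry_logits = torch.func.functional_call(model, target, (x_qry,))
    qry_loss = F.cross_entropy(qry_logits, y_qry)

    # SHOT-style term: minimize the distance between target and reference
    # parameters, intended to suppress Hessian effects along the trajectory.
    dist = sum(((target[k] - ref[k]) ** 2).sum() for k in target)
    return qry_loss + lam * dist
```

Detaching the reference parameters is a design choice made in this sketch so that the distance term only shapes the target's trajectory; for the authors' actual construction, see the linked repository.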
