Accelerating Meta-Learning by Sharing Gradients (2312.08398v1)

Published 13 Dec 2023 in cs.LG

Abstract: The success of gradient-based meta-learning is primarily attributed to its ability to leverage related tasks to learn task-invariant information. However, the absence of interactions between different tasks in the inner loop leads to task-specific over-fitting in the initial phase of meta-training. While this is eventually corrected by the presence of these interactions in the outer loop, it comes at a significant cost of slower meta-learning. To address this limitation, we explicitly encode task relatedness via an inner loop regularization mechanism inspired by multi-task learning. Our algorithm shares gradient information from previously encountered tasks as well as concurrent tasks in the same task batch, and scales their contribution with meta-learned parameters. We show using two popular few-shot classification datasets that gradient sharing enables meta-learning under bigger inner loop learning rates and can accelerate the meta-training process by up to 134%.
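The abstract describes the gradient-sharing mechanism only at a high level. The sketch below is a minimal, illustrative PyTorch rendering of one way such an inner-loop regularizer could look: each task's support-set gradient is mixed with the mean gradient of the concurrent tasks in the batch and with a buffer of gradients from previously encountered tasks, scaled by meta-learned coefficients. The function name, the `alpha`/`beta` parameters, and the past-gradient buffer are assumptions for illustration, not the paper's actual interface or released code.

```python
# Illustrative sketch only (assumed interface, not the authors' implementation).
# MAML-style inner-loop step with gradient sharing across concurrent and past tasks.
import torch


def inner_step_with_gradient_sharing(model, task_batch, loss_fn, inner_lr,
                                     alpha, beta, past_grads=None):
    """One shared-gradient inner-loop step.

    task_batch : list of (x_support, y_support) tensor pairs, one per task
    alpha, beta: meta-learned scalars (e.g. torch.nn.Parameter) scaling the
                 concurrent-task and past-task gradient contributions
    past_grads : optional list of detached tensors (one per model parameter)
                 accumulated from earlier task batches
    """
    params = list(model.parameters())

    # Per-task support-set gradients at the shared initialization.
    task_grads = []
    for x_s, y_s in task_batch:
        loss = loss_fn(model(x_s), y_s)
        task_grads.append(torch.autograd.grad(loss, params, create_graph=True))

    # Mean gradient over the concurrent tasks in the batch.
    batch_mean = [torch.stack(g_per_task).mean(dim=0)
                  for g_per_task in zip(*task_grads)]

    # Each task descends on its own gradient plus the meta-scaled shared terms.
    adapted_params = []
    for grads in task_grads:
        mixed = []
        for i, g in enumerate(grads):
            shared = alpha * batch_mean[i]
            if past_grads is not None:
                shared = shared + beta * past_grads[i]
            mixed.append(g + shared)
        adapted_params.append([p - inner_lr * m for p, m in zip(params, mixed)])

    # Carry detached batch gradients forward as the "previously encountered
    # tasks" signal for the next iteration.
    new_past_grads = [g.detach() for g in batch_mean]
    return adapted_params, new_past_grads
```

In a full training loop, the adapted parameters would be evaluated on each task's query set in the outer loop (for example via `torch.func.functional_call`), so that `alpha`, `beta`, and optionally `inner_lr` receive meta-gradients alongside the model initialization.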

Authors (2)
  1. Oscar Chang (20 papers)
  2. Hod Lipson (57 papers)
