Meta-Learning with a Geometry-Adaptive Preconditioner (2304.01552v2)

Published 4 Apr 2023 in cs.CV, cs.AI, and cs.LG

Abstract: Model-agnostic meta-learning (MAML) is one of the most successful meta-learning algorithms. It has a bi-level optimization structure in which the outer loop learns a shared initialization and the inner loop optimizes task-specific weights. Although MAML relies on standard gradient descent in the inner loop, recent studies have shown that controlling the inner loop's gradient descent with a meta-learned preconditioner can be beneficial. Existing preconditioners, however, cannot adapt in a way that is simultaneously task-specific and path-dependent, and they do not satisfy the Riemannian metric condition, which would enable steepest-descent learning with the preconditioned gradient. In this study, we propose Geometry-Adaptive Preconditioned gradient descent (GAP), which overcomes these limitations: GAP efficiently meta-learns a preconditioner that depends on task-specific parameters, and this preconditioner is shown to be a Riemannian metric. Thanks to these two properties, the geometry-adaptive preconditioner is effective at improving inner-loop optimization. Experimental results show that GAP outperforms state-of-the-art members of the MAML family and of the preconditioned gradient descent-MAML (PGD-MAML) family on a variety of few-shot learning tasks. Code is available at: https://github.com/Suhyun777/CVPR23-GAP.

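To make the bi-level structure concrete, below is a minimal PyTorch sketch of an inner loop driven by a meta-learned preconditioner, in the spirit of the PGD-MAML family the abstract refers to. It is an illustration under stated assumptions, not the GAP implementation from the linked repository: the preconditioner here is a plain diagonal matrix kept positive definite via exp (hence a valid Riemannian metric in the diagonal case), whereas GAP's preconditioner additionally depends on the task-specific parameters encountered along the inner-loop path. The toy task family and the names sample_task and log_p are hypothetical.

```python
# Hypothetical sketch: MAML-style bi-level optimization with a meta-learned
# diagonal preconditioner (PGD-MAML spirit), not the exact GAP algorithm.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Shared initialization (outer-loop meta-parameters) for a 1-layer linear model.
w0 = torch.randn(1, 1, requires_grad=True)
b0 = torch.zeros(1, requires_grad=True)

# Meta-learned preconditioner, stored as log-values so that exp(.) is strictly
# positive: a positive-definite diagonal matrix, i.e. a valid (diagonal)
# Riemannian metric on parameter space.
log_p = [torch.zeros_like(w0, requires_grad=True),
         torch.zeros_like(b0, requires_grad=True)]

meta_opt = torch.optim.Adam([w0, b0] + log_p, lr=1e-3)

def sample_task():
    """Toy task family y = a*x with a random slope a per task."""
    a = torch.randn(1)
    x_s, x_q = torch.randn(10, 1), torch.randn(10, 1)  # support / query inputs
    return (x_s, a * x_s), (x_q, a * x_q)

for step in range(1000):
    (x_s, y_s), (x_q, y_q) = sample_task()
    params = [w0, b0]
    # Inner loop: task-specific adaptation with preconditioned gradient descent.
    for _ in range(5):
        loss = F.mse_loss(F.linear(x_s, params[0], params[1]), y_s)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        # Update: theta <- theta - lr * P * grad, with P = diag(exp(log_p)).
        params = [p - 0.01 * torch.exp(lp) * g
                  for p, lp, g in zip(params, log_p, grads)]
    # Outer loop: the query loss trains both the shared initialization and the
    # preconditioner by backpropagating through the unrolled inner-loop graph.
    meta_loss = F.mse_loss(F.linear(x_q, params[0], params[1]), y_q)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```

A positive-definite preconditioner keeps every preconditioned step a descent direction; the abstract's claim is that GAP preserves this metric property while also making the preconditioner task-specific and path-dependent, which the fixed diagonal matrix above does not attempt.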
Authors (5)
  1. Suhyun Kang (7 papers)
  2. Duhun Hwang (6 papers)
  3. Moonjung Eo (8 papers)
  4. Taesup Kim (35 papers)
  5. Wonjong Rhee (34 papers)
Citations (11)