Meta-Learning with a Geometry-Adaptive Preconditioner (2304.01552v2)
Abstract: Model-agnostic meta-learning (MAML) is one of the most successful meta-learning algorithms. It has a bi-level optimization structure where the outer-loop process learns a shared initialization and the inner-loop process optimizes task-specific weights. Although MAML relies on standard gradient descent in the inner loop, recent studies have shown that controlling the inner loop's gradient descent with a meta-learned preconditioner can be beneficial. Existing preconditioners, however, cannot adapt in a way that is simultaneously task-specific and path-dependent, and they do not satisfy the Riemannian metric condition that would enable steepest-descent learning with the preconditioned gradient. In this study, we propose Geometry-Adaptive Preconditioned gradient descent (GAP), which overcomes these limitations in MAML: GAP efficiently meta-learns a preconditioner that depends on the task-specific parameters, and this preconditioner can be shown to be a Riemannian metric. Thanks to these two properties, the geometry-adaptive preconditioner is effective at improving inner-loop optimization. Experimental results show that GAP outperforms the state-of-the-art MAML family and the preconditioned gradient descent-MAML (PGD-MAML) family on a variety of few-shot learning tasks. Code is available at: https://github.com/Suhyun777/CVPR23-GAP.
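To make the two properties concrete, below is a minimal PyTorch sketch of a preconditioned inner-loop step, written against the abstract only and not the paper's actual parameterization (see the linked repository for that). The preconditioner is rebuilt from the SVD of the current task-specific weights at every step, so it is task-specific and path-dependent, and it is symmetric positive definite by construction, which is the Riemannian-metric condition. The function names, the softplus mapping of the meta-parameters, and the `eps` ridge are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def spd_preconditioner(weight, meta_logits, eps=1e-4):
    """Illustrative SPD preconditioner that depends on the task weights."""
    # SVD of the current task-specific weight matrix; V changes as the inner
    # loop moves, which makes the preconditioner task- and path-dependent.
    _, _, Vh = torch.linalg.svd(weight, full_matrices=False)
    # softplus keeps the meta-learned scales strictly positive; the eps * I
    # ridge keeps P strictly positive definite even when V is rank-deficient.
    scales = F.softplus(meta_logits)
    P = Vh.mT @ torch.diag(scales) @ Vh
    return P + eps * torch.eye(P.shape[-1], dtype=P.dtype)

def inner_step(weight, grad, meta_logits, lr=1e-2):
    """One preconditioned inner-loop update: W <- W - lr * grad @ P."""
    return weight - lr * grad @ spd_preconditioner(weight, meta_logits)

# Toy usage: a 5-way linear head over 64-dimensional features.
w = torch.randn(5, 64)         # task-specific weights
g = torch.randn(5, 64)         # stand-in for a task-loss gradient
m = torch.zeros(min(w.shape))  # one meta-learned scale per singular direction
w_new = inner_step(w, g, m)
print(w_new.shape)             # torch.Size([5, 64])
```

Because `P` is symmetric positive definite, `-grad @ P` is always a descent direction, so the update remains steepest descent under the geometry that `P` induces; in actual meta-training, the outer-loop loss would be backpropagated through this inner step to update `meta_logits`.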
Published in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16080–16090, 2023.
Authors: Suhyun Kang, Duhun Hwang, Moonjung Eo, Taesup Kim, Wonjong Rhee