Improving Generalization in Meta-Learning via Meta-Gradient Augmentation (2306.08460v1)
Abstract: Meta-learning methods typically follow a two-loop framework in which each loop is prone to overfitting, hindering rapid adaptation and generalization to new tasks. Existing schemes address this by enhancing the mutual exclusivity or diversity of training samples, but such data-manipulation strategies are data-dependent and insufficiently flexible. This work alleviates overfitting in meta-learning from the perspective of gradient regularization and proposes a data-independent \textbf{M}eta-\textbf{G}radient \textbf{Aug}mentation (\textbf{MGAug}) method. The key idea is to first break rote memorization by network pruning, addressing memorization overfitting in the inner loop; the gradients of the pruned sub-networks then naturally form a high-quality augmentation of the meta-gradient, alleviating learner overfitting in the outer loop. Specifically, we explore three pruning strategies: \textit{random width pruning}, \textit{random parameter pruning}, and a newly proposed \textit{catfish pruning} that measures a Meta-Memorization Carrying Amount (MMCA) score for each parameter and prunes high-score parameters to break rote memories as much as possible. MGAug is theoretically supported by a generalization bound derived in the PAC-Bayes framework. In addition, we introduce a lightweight variant, MGAug-MaxUp, as a trade-off between performance gains and resource overhead. Extensive experiments on multiple few-shot learning benchmarks validate MGAug's effectiveness and its significant improvements over various meta-baselines. The code is publicly available at \url{https://github.com/xxLifeLover/Meta-Gradient-Augmentation}.
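To make the two-loop recipe above concrete, the following is a minimal sketch of the idea, not the authors' implementation (see the official code at the repository linked above). It assumes a first-order MAML-style meta-learner, uses random parameter pruning (one of the three strategies) rather than catfish pruning, and trains on a synthetic task sampler; helper names such as `prune_mask`, `inner_adapt`, and `sample_task`, as well as all hyperparameters, are illustrative only.

```python
# Illustrative sketch of meta-gradient augmentation via pruned sub-networks.
# Assumptions: first-order meta-gradients, random parameter pruning, toy tasks.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def prune_mask(model, keep_ratio=0.7):
    """Random parameter pruning: a binary mask per weight tensor (illustrative)."""
    return {n: (torch.rand_like(p) < keep_ratio).float()
            for n, p in model.named_parameters()}


def inner_adapt(model, mask, x_s, y_s, steps=5, lr=0.01):
    """Inner loop: adapt a pruned copy of the base learner on the support set."""
    learner = copy.deepcopy(model)
    with torch.no_grad():
        for n, p in learner.named_parameters():
            p.mul_(mask[n])                      # pruning breaks rote memorization
    opt = torch.optim.SGD(learner.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(learner(x_s), y_s).backward()
        for n, p in learner.named_parameters():
            p.grad.mul_(mask[n])                 # keep pruned weights at zero
        opt.step()
    return learner


def meta_gradient(learner, x_q, y_q):
    """First-order meta-gradient: query-set gradient of the adapted sub-network."""
    loss = F.cross_entropy(learner(x_q), y_q)
    return torch.autograd.grad(loss, list(learner.parameters()))


def sample_task():
    """Stand-in for a few-shot task sampler (synthetic data for illustration)."""
    x, y = torch.randn(25, 16), torch.randint(0, 5, (25,))
    return x[:10], y[:10], x[10:], y[10:]        # support / query split


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 5))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
num_subnets = 3                                  # pruned sub-networks per task

for step in range(10):
    x_s, y_s, x_q, y_q = sample_task()

    # The full network plus several pruned sub-networks each yield a
    # meta-gradient; the extra ones act as augmentations (averaged here).
    masks = [{n: torch.ones_like(p) for n, p in model.named_parameters()}]
    masks += [prune_mask(model) for _ in range(num_subnets)]

    grads = [torch.zeros_like(p) for p in model.parameters()]
    for mask in masks:
        learner = inner_adapt(model, mask, x_s, y_s)
        for g_acc, g in zip(grads, meta_gradient(learner, x_q, y_q)):
            g_acc.add_(g / len(masks))

    meta_opt.zero_grad()
    for p, g in zip(model.parameters(), grads):
        p.grad = g                               # outer loop: averaged meta-gradient
    meta_opt.step()
```

Averaging the sub-network meta-gradients is only one aggregation choice; a MaxUp-style variant in the spirit of the paper's MGAug-MaxUp could instead backpropagate only the largest-loss sub-network per task, trading some of the gain for lower overhead.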
Authors: Ren Wang, Haoliang Sun, Qi Wei, Xiushan Nie, Yuling Ma, Yilong Yin