
Improving Generalization in Meta-Learning via Meta-Gradient Augmentation (2306.08460v1)

Published 14 Jun 2023 in cs.LG and cs.AI

Abstract: Meta-learning methods typically follow a two-loop framework, where each loop potentially suffers from notorious overfitting, hindering rapid adaptation and generalization to new tasks. Existing schemes address this by enhancing the mutual-exclusivity or diversity of training samples, but these data manipulation strategies are data-dependent and insufficiently flexible. This work alleviates overfitting in meta-learning from the perspective of gradient regularization and proposes a data-independent Meta-Gradient Augmentation (MGAug) method. The key idea is to first break rote memories by network pruning to address memorization overfitting in the inner loop; the gradients of the pruned sub-networks then naturally form a high-quality augmentation of the meta-gradient to alleviate learner overfitting in the outer loop. Specifically, we explore three pruning strategies: random width pruning, random parameter pruning, and a newly proposed catfish pruning that measures a Meta-Memorization Carrying Amount (MMCA) score for each parameter and prunes high-score ones to break rote memories as much as possible. The proposed MGAug is theoretically guaranteed by a generalization bound from the PAC-Bayes framework. In addition, we extend a lightweight version, called MGAug-MaxUp, as a trade-off between performance gains and resource overhead. Extensive experiments on multiple few-shot learning benchmarks validate MGAug's effectiveness and significant improvement over various meta-baselines. The code is publicly available at https://github.com/xxLifeLover/Meta-Gradient-Augmentation.
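To make the two-loop idea concrete, below is a minimal, first-order sketch of the mechanism the abstract describes: the inner loop adapts randomly pruned sub-networks on each task's support set, and the resulting query-set gradients are averaged as an augmented meta-gradient. This is not the authors' implementation (see their repository for that); the toy model, task sampler, random-parameter masks, and all hyperparameters are illustrative assumptions, and catfish pruning (MMCA scoring) is omitted.

```python
# Hedged sketch of MGAug-style meta-gradient augmentation (first-order),
# assuming a toy regression setup; not the paper's official code.
import copy
import torch
import torch.nn as nn

def random_parameter_mask(model, keep_prob=0.8):
    """One binary mask per parameter tensor (random parameter pruning)."""
    return {name: (torch.rand_like(p) < keep_prob).float()
            for name, p in model.named_parameters()}

def inner_adapt(model, mask, support_x, support_y, lr=0.01, steps=5):
    """Adapt a pruned copy of the model on the support set (inner loop)."""
    learner = copy.deepcopy(model)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        with torch.no_grad():  # re-apply the pruning mask before each step
            for name, p in learner.named_parameters():
                p.mul_(mask[name])
        loss = loss_fn(learner(support_x), support_y)
        grads = torch.autograd.grad(loss, list(learner.parameters()))
        with torch.no_grad():
            for p, g in zip(learner.parameters(), grads):
                p -= lr * g
    return learner

def mgaug_meta_step(meta_model, tasks, n_subnets=3, meta_lr=1e-3):
    """Outer loop: average meta-gradients from several pruned sub-networks."""
    loss_fn = nn.MSELoss()
    meta_grads = [torch.zeros_like(p) for p in meta_model.parameters()]
    count = 0
    for support_x, support_y, query_x, query_y in tasks:
        for _ in range(n_subnets):
            mask = random_parameter_mask(meta_model)
            learner = inner_adapt(meta_model, mask, support_x, support_y)
            query_loss = loss_fn(learner(query_x), query_y)
            grads = torch.autograd.grad(query_loss, list(learner.parameters()))
            for mg, g in zip(meta_grads, grads):
                mg += g  # first-order approximation of the meta-gradient
            count += 1
    with torch.no_grad():
        for p, mg in zip(meta_model.parameters(), meta_grads):
            p -= meta_lr * mg / count
    return meta_model

# Toy usage: sine-regression tasks with a small MLP.
meta_model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
def make_task():
    a = torch.rand(1) * 2
    xs, xq = torch.randn(10, 1), torch.randn(10, 1)
    return xs, a * torch.sin(xs), xq, a * torch.sin(xq)
for _ in range(10):
    mgaug_meta_step(meta_model, [make_task() for _ in range(4)])
```

In this sketch each task contributes several gradients (one per pruned sub-network) instead of one, which is the "augmentation" of the meta-gradient; the paper's MGAug-MaxUp variant would instead keep only the worst-case (maximum-loss) sub-network per task to reduce overhead.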

Authors (6)
  1. Ren Wang (72 papers)
  2. Haoliang Sun (14 papers)
  3. Qi Wei (52 papers)
  4. Xiushan Nie (13 papers)
  5. Yuling Ma (5 papers)
  6. Yilong Yin (47 papers)
