Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models (2305.02279v3)

Published 3 May 2023 in cs.LG, cs.AI, and cs.CV

Abstract: During the continuous evolution of an organism's ancestry, its genes accumulate extensive experience and knowledge, enabling newborn descendants to adapt rapidly to their specific environments. Motivated by this observation, we propose a novel machine learning paradigm, Learngene, which enables learning models to incorporate three key characteristics of genes. (i) Accumulating: knowledge is accumulated during the continuous learning of an ancestry model. (ii) Condensing: the extensive accumulated knowledge is condensed into a much more compact piece of information, i.e., the learngene. (iii) Inheriting: the condensed learngene is inherited by descendant models, making it easier for them to adapt to new environments. Since accumulating has been studied in well-established paradigms such as large-scale pre-training and lifelong learning, we focus on condensing and inheriting, which raise three key issues for which we provide preliminary solutions in this paper: (i) Learngene Form: the learngene takes the form of a few integral layers that preserve significant knowledge. (ii) Learngene Condensing: we identify which layers of the ancestry model are most similar to those of a pseudo descendant model. (iii) Learngene Inheriting: to construct distinct descendant models for specific downstream tasks, we stack randomly initialized layers onto the learngene layers. Extensive experiments across various settings, using different network architectures such as Vision Transformers (ViT) and Convolutional Neural Networks (CNNs) on different datasets, confirm four advantages of Learngene: it makes descendant models 1) converge more quickly, 2) exhibit less sensitivity to hyperparameters, 3) perform better, and 4) require fewer training samples to converge.
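
To make the condensing and inheriting steps concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: the ancestry blocks whose features agree most with a pseudo descendant model's are copied out as the learngene, and a descendant model is assembled by stacking freshly initialized blocks beneath them plus a task head. The helper names (extract_learngene, build_descendant), the cosine-similarity criterion, and the toy MLP blocks are illustrative assumptions, not the paper's exact procedure or API.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def layer_features(model: nn.Sequential, x: torch.Tensor) -> list:
    # Run a probe batch through a stack of blocks, collecting each block's output.
    feats = []
    with torch.no_grad():
        for block in model:
            x = block(x)
            feats.append(x)
    return feats

def extract_learngene(ancestry, pseudo_descendant, probe, k=3):
    # Condensing: keep the k ancestry blocks whose features agree most closely
    # with the pseudo descendant's features on the probe batch (an assumed
    # similarity criterion; compared layer-by-layer up to the shallower depth).
    fa = layer_features(ancestry, probe)
    fd = layer_features(pseudo_descendant, probe)
    sims = [F.cosine_similarity(a.flatten(1), d.flatten(1), dim=1).mean().item()
            for a, d in zip(fa, fd)]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    # Copy the selected blocks out, preserving their original order.
    return nn.Sequential(*[copy.deepcopy(ancestry[i]) for i in sorted(top)])

def build_descendant(learngene, n_random=2, width=64, n_classes=10):
    # Inheriting: stack freshly initialized blocks beneath the inherited
    # learngene layers, then attach a task-specific classification head.
    random_blocks = [nn.Sequential(nn.Linear(width, width), nn.GELU())
                     for _ in range(n_random)]
    return nn.Sequential(*random_blocks, *learngene, nn.Linear(width, n_classes))

# Toy usage: both models are stacks of width-preserving MLP blocks.
width = 64
def make_block():
    return nn.Sequential(nn.Linear(width, width), nn.GELU())

ancestry = nn.Sequential(*[make_block() for _ in range(8)])
pseudo = nn.Sequential(*[make_block() for _ in range(4)])
probe = torch.randn(16, width)
gene = extract_learngene(ancestry, pseudo, probe, k=3)
descendant = build_descendant(gene, n_random=2, width=width, n_classes=10)
logits = descendant(probe)  # shape: (16, 10)

After inheriting, the whole descendant would be trained on the downstream task; the abstract reports that descendant models built this way converge faster, are less sensitive to hyperparameters, perform better, and need fewer training samples.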

