
Diffusion-Based Neural Network Weights Generation (2402.18153v2)

Published 28 Feb 2024 in cs.LG and cs.AI

Abstract: Transfer learning has gained significant attention in recent deep learning research due to its ability to accelerate convergence and enhance performance on new tasks. However, its success is often contingent on the similarity between source and target data, and training on numerous datasets can be costly, leading to blind selection of pretrained models with limited insight into their effectiveness. To address these challenges, we introduce D2NWG, a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning, conditioned on the target dataset. Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation, learning the weight distributions of models pretrained on various datasets. This allows for automatic generation of weights that generalize well across both seen and unseen tasks, outperforming state-of-the-art meta-learning methods and pretrained models. Moreover, our approach is scalable to large architectures such as LLMs, overcoming the limitations of current parameter generation techniques that rely on task-specific model collections or access to original training data. By modeling the parameter distribution of LLMs, D2NWG enables task-specific parameter generation without requiring additional fine-tuning or large collections of model variants. Extensive experiments show that our method consistently enhances the performance of diverse base models, regardless of their size or complexity, positioning it as a robust solution for scalable transfer learning.
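The abstract describes the pipeline at a high level: pretrained weights are compressed into a latent space, a latent diffusion model learns their distribution conditioned on a representation of the dataset, and new weights are sampled for a target dataset and decoded. The paper's own implementation is not shown on this page; the sketch below is a minimal, assumed PyTorch illustration of that general idea. All names (WeightAutoencoder, LatentDenoiser, sample_weights), dimensions, and the noise schedule are hypothetical placeholders, not the authors' code.

```python
# Minimal sketch (assumed names, not the authors' implementation) of
# diffusion-based weight generation: (1) an autoencoder maps flattened
# pretrained weights to a latent vector, (2) a DDPM-style denoiser is trained
# over those latents conditioned on a dataset embedding, (3) new latents are
# sampled for a target dataset and decoded back into weights.
import torch
import torch.nn as nn

LATENT_DIM, WEIGHT_DIM, COND_DIM, T = 64, 4096, 32, 1000  # illustrative sizes

class WeightAutoencoder(nn.Module):
    """Compresses a flattened weight vector into a latent and reconstructs it."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(WEIGHT_DIM, 512), nn.SiLU(), nn.Linear(512, LATENT_DIM))
        self.dec = nn.Sequential(nn.Linear(LATENT_DIM, 512), nn.SiLU(), nn.Linear(512, WEIGHT_DIM))

    def forward(self, w):
        z = self.enc(w)
        return self.dec(z), z

class LatentDenoiser(nn.Module):
    """Predicts the noise added to a weight latent, given timestep and dataset condition."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + 1 + COND_DIM, 256), nn.SiLU(),
            nn.Linear(256, 256), nn.SiLU(),
            nn.Linear(256, LATENT_DIM),
        )

    def forward(self, z_t, t, cond):
        t_emb = t.float().unsqueeze(-1) / T  # simple scalar timestep embedding
        return self.net(torch.cat([z_t, t_emb, cond], dim=-1))

# Standard DDPM linear noise schedule.
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def train_step(ae, denoiser, weights, cond, opt):
    """One denoising-diffusion step: encode weights, noise the latent at a random t, predict the noise."""
    with torch.no_grad():
        z0 = ae.enc(weights)                      # assume the autoencoder is already trained
    t = torch.randint(0, T, (z0.shape[0],))
    noise = torch.randn_like(z0)
    ab = alphas_bar[t].unsqueeze(-1)
    z_t = ab.sqrt() * z0 + (1 - ab).sqrt() * noise
    loss = nn.functional.mse_loss(denoiser(z_t, t, cond), noise)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def sample_weights(ae, denoiser, cond):
    """Reverse diffusion in latent space conditioned on the target dataset, then decode to weights."""
    z = torch.randn(cond.shape[0], LATENT_DIM)
    for t in reversed(range(T)):
        t_b = torch.full((z.shape[0],), t)
        eps = denoiser(z, t_b, cond)
        a, ab = 1.0 - betas[t], alphas_bar[t]
        z = (z - (betas[t] / (1 - ab).sqrt()) * eps) / a.sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)
    return ae.dec(z)
```

In practice, the conditioning vector would presumably come from a dataset encoder (for example, a set-based or CLIP-style embedding of target-dataset samples), and the weight autoencoder would first be fit on a model zoo of checkpoints so that the diffusion model operates over a compact, well-structured latent space rather than raw parameters.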
