
Diffusion-Based Neural Network Weights Generation (2402.18153v2)

Published 28 Feb 2024 in cs.LG and cs.AI

Abstract: Transfer learning has gained significant attention in recent deep learning research due to its ability to accelerate convergence and enhance performance on new tasks. However, its success is often contingent on the similarity between source and target data, and training on numerous datasets can be costly, leading to blind selection of pretrained models with limited insight into their effectiveness. To address these challenges, we introduce D2NWG, a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning, conditioned on the target dataset. Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation, learning the weight distributions of models pretrained on various datasets. This allows for automatic generation of weights that generalize well across both seen and unseen tasks, outperforming state-of-the-art meta-learning methods and pretrained models. Moreover, our approach is scalable to large architectures such as LLMs, overcoming the limitations of current parameter generation techniques that rely on task-specific model collections or access to original training data. By modeling the parameter distribution of LLMs, D2NWG enables task-specific parameter generation without requiring additional fine-tuning or large collections of model variants. Extensive experiments show that our method consistently enhances the performance of diverse base models, regardless of their size or complexity, positioning it as a robust solution for scalable transfer learning.
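The abstract describes the pipeline at a high level: pretrained weights are compressed into a latent space, a latent diffusion model learns their distribution conditioned on a representation of the dataset, and new weights are sampled for a target dataset and decoded. The paper's own implementation is not shown on this page; the sketch below is a minimal, assumed PyTorch illustration of that general idea. All names (WeightAutoencoder, LatentDenoiser, sample_weights), dimensions, and the noise schedule are hypothetical placeholders, not the authors' code.

```python
# Minimal sketch (assumed names, not the authors' implementation) of
# diffusion-based weight generation: (1) an autoencoder maps flattened
# pretrained weights to a latent vector, (2) a DDPM-style denoiser is trained
# over those latents conditioned on a dataset embedding, (3) new latents are
# sampled for a target dataset and decoded back into weights.
import torch
import torch.nn as nn

LATENT_DIM, WEIGHT_DIM, COND_DIM, T = 64, 4096, 32, 1000  # illustrative sizes

class WeightAutoencoder(nn.Module):
    """Compresses a flattened weight vector into a latent and reconstructs it."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(WEIGHT_DIM, 512), nn.SiLU(), nn.Linear(512, LATENT_DIM))
        self.dec = nn.Sequential(nn.Linear(LATENT_DIM, 512), nn.SiLU(), nn.Linear(512, WEIGHT_DIM))

    def forward(self, w):
        z = self.enc(w)
        return self.dec(z), z

class LatentDenoiser(nn.Module):
    """Predicts the noise added to a weight latent, given timestep and dataset condition."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + 1 + COND_DIM, 256), nn.SiLU(),
            nn.Linear(256, 256), nn.SiLU(),
            nn.Linear(256, LATENT_DIM),
        )

    def forward(self, z_t, t, cond):
        t_emb = t.float().unsqueeze(-1) / T  # simple scalar timestep embedding
        return self.net(torch.cat([z_t, t_emb, cond], dim=-1))

# Standard DDPM linear noise schedule.
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def train_step(ae, denoiser, weights, cond, opt):
    """One denoising-diffusion step: encode weights, noise the latent at a random t, predict the noise."""
    with torch.no_grad():
        z0 = ae.enc(weights)                      # assume the autoencoder is already trained
    t = torch.randint(0, T, (z0.shape[0],))
    noise = torch.randn_like(z0)
    ab = alphas_bar[t].unsqueeze(-1)
    z_t = ab.sqrt() * z0 + (1 - ab).sqrt() * noise
    loss = nn.functional.mse_loss(denoiser(z_t, t, cond), noise)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def sample_weights(ae, denoiser, cond):
    """Reverse diffusion in latent space conditioned on the target dataset, then decode to weights."""
    z = torch.randn(cond.shape[0], LATENT_DIM)
    for t in reversed(range(T)):
        t_b = torch.full((z.shape[0],), t)
        eps = denoiser(z, t_b, cond)
        a, ab = 1.0 - betas[t], alphas_bar[t]
        z = (z - (betas[t] / (1 - ab).sqrt()) * eps) / a.sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)
    return ae.dec(z)
```

In practice, the conditioning vector would presumably come from a dataset encoder (for example, a set-based or CLIP-style embedding of target-dataset samples), and the weight autoencoder would first be fit on a model zoo of checkpoints so that the diffusion model operates over a compact, well-structured latent space rather than raw parameters.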
