Principled Weight Initialization for Hypernetworks (2312.08399v1)
Published 13 Dec 2023 in cs.LG
Abstract: Hypernetworks are meta neural networks that generate weights for a main neural network in an end-to-end differentiable manner. Despite extensive applications ranging from multi-task learning to Bayesian deep learning, the problem of optimizing hypernetworks has not been studied to date. We observe that classical weight initialization methods like Glorot & Bengio (2010) and He et al. (2015), when applied directly on a hypernet, fail to produce weights for the mainnet in the correct scale. We develop principled techniques for weight initialization in hypernets, and show that they lead to more stable mainnet weights, lower training loss, and faster convergence.
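To make the scale problem concrete, below is a minimal NumPy sketch (not code from the paper; dimensions, variable names, and the `generated_weight_std` helper are illustrative assumptions). It treats the hypernet as a single linear layer mapping an embedding to one mainnet weight matrix, shows how a standard He/fan-in initialization of the hypernet's output layer inflates the variance of the generated mainnet weights, and how a variance-corrected, "hyperfan-in"-style choice restores the intended scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not values from the paper):
d_embed = 64    # size of the embedding fed into the hypernet
fan_in  = 256   # fan-in of the mainnet layer whose weights are generated
fan_out = 256   # fan-out of that mainnet layer

# Embedding consumed by the hypernet; entries roughly unit variance.
e = rng.standard_normal(d_embed)

def generated_weight_std(hyper_std):
    """Empirical std of mainnet weights produced by a linear hypernet whose
    output layer is initialized i.i.d. N(0, hyper_std**2)."""
    H = rng.normal(0.0, hyper_std, size=(fan_out * fan_in, d_embed))
    W = (H @ e).reshape(fan_out, fan_in)  # generated mainnet weight matrix
    return W.std()

# Naive: apply He init directly to the hypernet itself (its fan-in is d_embed).
naive_std = np.sqrt(2.0 / d_embed)

# Variance-corrected sketch: each mainnet weight is a sum of d_embed terms
# H_k * e_k, so Var(W) ~ d_embed * Var(H) * E[e^2].  To hit the He target
# Var(W) = 2 / fan_in for the mainnet, solve for Var(H):
corrected_std = np.sqrt(2.0 / (d_embed * fan_in * (e**2).mean()))

target = np.sqrt(2.0 / fan_in)  # He init target scale for the mainnet layer
print(f"target mainnet std : {target:.4f}")
print(f"naive hypernet init: {generated_weight_std(naive_std):.4f}")      # far too large
print(f"corrected init     : {generated_weight_std(corrected_std):.4f}")  # close to target
```

Running the sketch, the naive initialization produces mainnet weights with a standard deviation more than an order of magnitude above the He target, while the corrected variance lands near it; this is the scale mismatch the paper's initialization techniques are designed to remove.

References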
- Hypernetwork knowledge graph embeddings. arXiv preprint arXiv:1808.07018, 2018.
- Greedy layer-wise training of deep networks. In Advances in neural information processing systems, pp. 153–160, 2007.
- SMASH: one-shot model architecture search through hypernetworks. arXiv preprint arXiv:1708.05344, 2017.
- A generative model for sampling high-performance and diverse weights for neural networks. arXiv preprint arXiv:1905.02898, 2019.
- Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp. 249–256, 2010.
- Hypernetworks. arXiv preprint arXiv:1609.09106, 2016.
- Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034, 2015.
- Approximating the predictive distribution via adversarially-trained hypernetworks. In Bayesian Deep Learning Workshop, NeurIPS (Spotlight), volume 2018, 2018.
- A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527–1554, 2006.
- MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Hypernetwork functional image representation. arXiv preprint arXiv:1902.10404, 2019.
- Evolving neural networks in compressed weight space. In Proceedings of the 12th annual conference on Genetic and evolutionary computation, pp. 619–626. ACM, 2010.
- Predictive uncertainty quantification with compound density networks. arXiv preprint arXiv:1902.01080, 2019.
- Bayesian hypernetworks. arXiv preprint arXiv:1710.04759, 2017.
- Computing higher order derivatives of matrix and tensor expressions. In Advances in Neural Information Processing Systems, pp. 2755–2764, 2018.
- MetaPruning: Meta learning for automatic neural network channel pruning. arXiv preprint arXiv:1903.10258, 2019.
- Stochastic hyperparameter optimization through hypernetworks. arXiv preprint arXiv:1802.09419, 2018.
- Modular universal reparameterization: Deep multi-task learning across diverse domains. arXiv preprint arXiv:1906.00097, 2019.
- HyperST-Net: Hypernetworks for spatio-temporal forecasting. arXiv preprint arXiv:1809.10889, 2018.
- Implicit weight uncertainty in neural networks. arXiv preprint arXiv:1711.01297, 2017.
- Neale Ratzlaff and Li Fuxin. HyperGAN: A generative model for diverse, performant neural networks. arXiv preprint arXiv:1901.11058, 2019.
- On the convergence of Adam and beyond. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=ryQu7f-RZ.
- Learning representations by back-propagating errors. Nature, 323:9, 1986.
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120, 2013.
- Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion. arXiv preprint arXiv:1906.00794, 2019.
- Meta networks for neural style transfer. arXiv preprint arXiv:1709.04111, 2017.
- Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- A hypercube-based encoding for evolving large-scale neural networks. Artificial life, 15(2):185–212, 2009.
- Joseph Suarez. Language modeling with recurrent highway hypernetworks. In Advances in neural information processing systems, pp. 3267–3276, 2017.
- Hypernetworks with statistical filtering for defending adversarial examples. arXiv preprint arXiv:1711.01791, 2017.
- Hypernetwork-based implicit posterior estimation and model averaging of CNN. In Asian Conference on Machine Learning, pp. 176–191, 2018.
- Continual learning with hypernetworks. arXiv preprint arXiv:1906.00695, 2019.
- The marginal value of adaptive gradient methods in machine learning. In Advances in Neural Information Processing Systems, pp. 4148–4158, 2017.
- Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19, 2018.
- Graph hypernetworks for neural architecture search. arXiv preprint arXiv:1810.05749, 2018.
- Fixup initialization: Residual learning without normalization. arXiv preprint arXiv:1901.09321, 2019.