Principled Weight Initialization for Hypernetworks (2312.08399v1)

Published 13 Dec 2023 in cs.LG

Abstract: Hypernetworks are meta neural networks that generate weights for a main neural network in an end-to-end differentiable manner. Despite extensive applications ranging from multi-task learning to Bayesian deep learning, the problem of optimizing hypernetworks has not been studied to date. We observe that classical weight initialization methods like Glorot & Bengio (2010) and He et al. (2015), when applied directly on a hypernet, fail to produce weights for the mainnet in the correct scale. We develop principled techniques for weight initialization in hypernets, and show that they lead to more stable mainnet weights, lower training loss, and faster convergence.
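
The abstract's key observation, that a classical fan-in initializer applied directly to the hypernet leaves the generated mainnet weights at the wrong scale, can be illustrated with a small numerical experiment. The NumPy sketch below is only a toy under stated assumptions: the layer sizes, the `he_init` helper, and the variance-rescaling rule for the hypernet's output layer are illustrative choices in the spirit of the paper's idea, not its exact derivation or notation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mainnet layer whose weights the hypernet must generate: fan_in -> fan_out.
fan_in_main, fan_out_main = 128, 128
n_weights = fan_in_main * fan_out_main

# Hypernet: embedding e -> ReLU hidden layer h -> flattened mainnet weights W.
d_embed, d_hidden = 64, 128

def he_init(fan_in, shape):
    # Classical He et al. (2015) fan-in initialization: Var = 2 / fan_in.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)

# Naive scheme: He-initialize every hypernet layer as if it were an ordinary
# feedforward layer, ignoring what its output will be used for.
H1 = he_init(d_embed, (d_hidden, d_embed))
H2_naive = he_init(d_hidden, (n_weights, d_hidden))

e = rng.normal(0.0, 1.0, size=d_embed)   # layer/task embedding fed to the hypernet
h = np.maximum(H1 @ e, 0.0)              # hidden activations
W_naive = (H2_naive @ h).reshape(fan_out_main, fan_in_main)

# Target: the generated mainnet weights should themselves look He-initialized
# for the *mainnet* layer, i.e. Var(W) close to 2 / fan_in_main.
target_var = 2.0 / fan_in_main
print(f"target mainnet variance: {target_var:.2e}")
print(f"naive hypernet init:     Var(W) = {W_naive.var():.2e}")  # ~fan_in_main times too large

# Rescaled output layer: choose Var(H2) so that, given the hidden activations,
# Var(W_ij) = Var(H2) * sum_k h_k^2 hits the mainnet target.
H2_scaled = rng.normal(0.0, np.sqrt(target_var / np.sum(h**2)),
                       size=(n_weights, d_hidden))
W_scaled = (H2_scaled @ h).reshape(fan_out_main, fan_in_main)
print(f"rescaled hypernet init:  Var(W) = {W_scaled.var():.2e}")  # close to the target
```

In this toy setup the naive scheme overshoots the target variance by roughly a factor of the mainnet fan-in; that scale mismatch is the kind of problem the paper's initialization techniques are designed to remove.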

References (37)
  1. Hypernetwork knowledge graph embeddings. arXiv preprint arXiv:1808.07018, 2018.
  2. Greedy layer-wise training of deep networks. In Advances in neural information processing systems, pp. 153–160, 2007.
  3. Smash: one-shot model architecture search through hypernetworks. arXiv preprint arXiv:1708.05344, 2017.
  4. A generative model for sampling high-performance and diverse weights for neural networks. arXiv preprint arXiv:1905.02898, 2019.
  5. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp. 249–256, 2010.
  6. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016.
  7. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034, 2015.
  8. Approximating the predictive distribution via adversarially-trained hypernetworks. In Bayesian Deep Learning Workshop, NeurIPS (Spotlight), volume 2018, 2018.
  9. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527–1554, 2006.
  10. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
  11. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  12. Hypernetwork functional image representation. arXiv preprint arXiv:1902.10404, 2019.
  13. Evolving neural networks in compressed weight space. In Proceedings of the 12th annual conference on Genetic and evolutionary computation, pp. 619–626. ACM, 2010.
  14. Predictive uncertainty quantification with compound density networks. arXiv preprint arXiv:1902.01080, 2019.
  15. Bayesian hypernetworks. arXiv preprint arXiv:1710.04759, 2017.
  16. Computing higher order derivatives of matrix and tensor expressions. In Advances in Neural Information Processing Systems, pp. 2755–2764, 2018.
  17. Metapruning: Meta learning for automatic neural network channel pruning. arXiv preprint arXiv:1903.10258, 2019.
  18. Stochastic hyperparameter optimization through hypernetworks. arXiv preprint arXiv:1802.09419, 2018.
  19. Modular universal reparameterization: Deep multi-task learning across diverse domains. arXiv preprint arXiv:1906.00097, 2019.
  20. Hyperst-net: Hypernetworks for spatio-temporal forecasting. arXiv preprint arXiv:1809.10889, 2018.
  21. Implicit weight uncertainty in neural networks. arXiv preprint arXiv:1711.01297, 2017.
  22. Neale Ratzlaff and Li Fuxin. Hypergan: A generative model for diverse, performant neural networks. arXiv preprint arXiv:1901.11058, 2019.
  23. On the convergence of adam and beyond. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=ryQu7f-RZ.
  24. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.
  25. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120, 2013.
  26. Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion. arXiv preprint arXiv:1906.00794, 2019.
  27. Meta networks for neural style transfer. arXiv preprint arXiv:1709.04111, 2017.
  28. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
  29. A hypercube-based encoding for evolving large-scale neural networks. Artificial life, 15(2):185–212, 2009.
  30. Joseph Suarez. Language modeling with recurrent highway hypernetworks. In Advances in neural information processing systems, pp. 3267–3276, 2017.
  31. Hypernetworks with statistical filtering for defending adversarial examples. arXiv preprint arXiv:1711.01791, 2017.
  32. Hypernetwork-based implicit posterior estimation and model averaging of cnn. In Asian Conference on Machine Learning, pp. 176–191, 2018.
  33. Continual learning with hypernetworks. arXiv preprint arXiv:1906.00695, 2019.
  34. The marginal value of adaptive gradient methods in machine learning. In Advances in Neural Information Processing Systems, pp. 4148–4158, 2017.
  35. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19, 2018.
  36. Graph hypernetworks for neural architecture search. arXiv preprint arXiv:1810.05749, 2018.
  37. Fixup initialization: Residual learning without normalization. arXiv preprint arXiv:1901.09321, 2019.