Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
126 tokens/sec
GPT-4o
47 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Revisiting Initialization of Neural Networks (2004.09506v3)

Published 20 Apr 2020 in cs.LG, math.OC, and stat.ML

Abstract: The proper initialization of weights is crucial for the effective training and fast convergence of deep neural networks (DNNs). Prior work in this area has mostly focused on balancing the variance among weights per layer to maintain stability of (i) the input data propagated forwards through the network and (ii) the loss gradients propagated backwards, respectively. This prevalent heuristic is however agnostic of dependencies among gradients across the various layers and captures only firstorder effects. In this paper, we propose and discuss an initialization principle that is based on a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix. The proposed approach is more systematic and recovers previous results for DNN activations such as smooth functions, dropouts, and ReLU. Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool which helps to more rigorously initialize weights

Citations (2)

Summary

We haven't generated a summary for this paper yet.