Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
129 tokens/sec
GPT-4o
28 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Mutual Information of Neural Network Initialisations: Mean Field Approximations (2102.04374v1)

Published 8 Feb 2021 in cs.IT and math.IT

Abstract: The ability to train randomly initialised deep neural networks is known to depend strongly on the variance of the weight matrices and biases as well as the choice of nonlinear activation. Here we complement the existing geometric analysis of this phenomenon with an information theoretic alternative. Lower bounds are derived for the mutual information between an input and hidden layer outputs. Using a mean field analysis we are able to provide analytic lower bounds as functions of network weight and bias variances as well as the choice of nonlinear activation. These results show that initialisations known to be optimal from a training point of view are also superior from a mutual information perspective.

Summary

We haven't generated a summary for this paper yet.