Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
184 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Fixed width treelike neural networks capacity analysis -- generic activations (2402.05696v1)

Published 8 Feb 2024 in stat.ML, cond-mat.dis-nn, cs.IT, cs.LG, math.IT, and math.PR

Abstract: We consider the capacity of \emph{treelike committee machines} (TCM) neural networks. Relying on Random Duality Theory (RDT), \cite{Stojnictcmspnncaprdt23} recently introduced a generic framework for their capacity analysis. An upgrade based on the so-called \emph{partially lifted} RDT (pl RDT) was then presented in \cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on the networks with the most typical, \emph{sign}, activations. Here, on the other hand, we focus on networks with other, more general, types of activations and show that the frameworks of \cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently powerful to enable handling of such scenarios as well. In addition to the standard \emph{linear} activations, we uncover that particularly convenient results can be obtained for two very commonly used activations, namely, the \emph{quadratic} and \emph{rectified linear unit (ReLU)} ones. In more concrete terms, for each of these activations, we obtain both the RDT and pl RDT based memory capacities upper bound characterization for \emph{any} given (even) number of the hidden layer neurons, $d$. In the process, we also uncover the following two, rather remarkable, facts: 1) contrary to the common wisdom, both sets of results show that the bounding capacity decreases for large $d$ (the width of the hidden layer) while converging to a constant value; and 2) the maximum bounding capacity is achieved for the networks with precisely \textbf{\emph{two}} hidden layer neurons! Moreover, the large $d$ converging values are observed to be in excellent agrement with the statistical physics replica theory based predictions.

Citations (3)

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com