Neural Network Approximation: Three Hidden Layers Are Enough (2010.14075v4)
Abstract: A three-hidden-layer neural network with super approximation power is introduced. This network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their compositions as the activation function in each neuron; hence we call such networks Floor-Exponential-Step (FLES) networks. For any width hyper-parameter $N\in\mathbb{N}^+$, it is shown that FLES networks with width $\max\{d,N\}$ and three hidden layers can uniformly approximate a H\"older continuous function $f$ on $[0,1]^d$ with an exponential approximation rate $3\lambda (2\sqrt{d})^{\alpha} 2^{-\alpha N}$, where $\alpha \in(0,1]$ and $\lambda>0$ are the H\"older order and constant, respectively. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $2\omega_f(2\sqrt{d})2^{-N}+\omega_f(2\sqrt{d}\,2^{-N})$. Moreover, we extend such a result to general bounded continuous functions on a bounded set $E\subseteq\mathbb{R}^d$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r\rightarrow 0$ is moderate (e.g., $\omega_f(r)\lesssim r^{\alpha}$ for H\"older continuous functions), since the major term of concern in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ independent of $d$ within the modulus of continuity. Finally, we extend our analysis to derive similar approximation results in the $L^p$-norm for $p\in[1,\infty)$ by replacing the Floor-Exponential-Step activation functions with continuous activation functions.
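To make the stated rate concrete, the following short Python sketch (not part of the paper; the function name and default parameters are illustrative) evaluates the H\"older-case bound $3\lambda (2\sqrt{d})^{\alpha} 2^{-\alpha N}$ for a few values of $d$ and $N$, showing the exponential decay in $N$ and the comparatively mild $\sqrt{d}$ dependence that the abstract attributes to FLES networks.

    import math

    def fles_uniform_error_bound(d, N, alpha=1.0, lam=1.0):
        """Evaluate the Holder-case bound 3 * lam * (2*sqrt(d))**alpha * 2**(-alpha*N)
        stated in the abstract (illustrative only; constants exactly as written there)."""
        return 3.0 * lam * (2.0 * math.sqrt(d)) ** alpha * 2.0 ** (-alpha * N)

    # The bound decays exponentially in the width hyper-parameter N,
    # while the dimension d enters only through the factor (2*sqrt(d))**alpha.
    for d in (10, 100, 1000):
        for N in (10, 20, 40):
            print(f"d={d:5d}  N={N:3d}  bound={fles_uniform_error_bound(d, N):.3e}")

For instance, with $\alpha=\lambda=1$, doubling $N$ from 20 to 40 shrinks the bound by a factor of about $2^{20}$, whereas increasing $d$ from 10 to 1000 inflates it only by a factor of 10, which is the sense in which the curse of dimensionality is avoided.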