Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth (2006.12231v6)

Published 22 Jun 2020 in cs.LG, cs.NA, math.NA, and stat.ML

Abstract: A new network with super approximation power is introduced. This network is built with Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation function in each neuron and hence we call such networks Floor-ReLU networks. For any hyper-parameters $N\in\mathbb{N}_+$ and $L\in\mathbb{N}_+$, it is shown that Floor-ReLU networks with width $\max\{d,\, 5N+13\}$ and depth $64dL+3$ can uniformly approximate a Hölder function $f$ on $[0,1]^d$ with an approximation error $3\lambda d^{\alpha/2}N^{-\alpha\sqrt{L}}$, where $\alpha \in(0,1]$ and $\lambda$ are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $\omega_f(\sqrt{d}\,N^{-\sqrt{L}})+2\omega_f(\sqrt{d})\,N^{-\sqrt{L}}$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r\to 0$ is moderate (e.g., $\omega_f(r) \lesssim r^\alpha$ for Hölder continuous functions), since the major term to be considered in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ and $L$ independent of $d$ within the modulus of continuity.
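
To make the quantities in the abstract concrete, below is a minimal Python sketch. It is not the paper's construction: the per-neuron floor/ReLU forward pass and the random weights are purely illustrative assumptions, while the width $\max\{d, 5N+13\}$, depth $64dL+3$, and the Hölder error bound $3\lambda d^{\alpha/2}N^{-\alpha\sqrt{L}}$ are taken directly from the abstract.

```python
# Illustrative sketch only (assumed architecture details, not the paper's construction).
import numpy as np

def floor_relu_forward(x, weights, biases, use_floor):
    """Forward pass where each neuron applies floor(.) or max(0, .), per the Floor-ReLU idea."""
    h = x
    for W, b, floor_mask in zip(weights, biases, use_floor):
        z = W @ h + b
        h = np.where(floor_mask, np.floor(z), np.maximum(z, 0.0))
    return h

def holder_error_bound(d, N, L, alpha=1.0, lam=1.0):
    """Error bound 3 * lam * d^(alpha/2) * N^(-alpha*sqrt(L)) stated in the abstract."""
    return 3.0 * lam * d ** (alpha / 2.0) * N ** (-alpha * np.sqrt(L))

if __name__ == "__main__":
    d, N, L = 8, 4, 9
    width = max(d, 5 * N + 13)   # width max{d, 5N+13} from the abstract
    depth = 64 * d * L + 3       # depth 64dL+3 from the abstract
    print(f"width={width}, depth={depth}, error bound={holder_error_bound(d, N, L):.3e}")

    # Tiny two-layer network with hypothetical random weights, just to exercise the forward pass.
    rng = np.random.default_rng(0)
    sizes = [d, width, 1]
    weights = [rng.normal(size=(sizes[i + 1], sizes[i])) for i in range(2)]
    biases = [rng.normal(size=sizes[i + 1]) for i in range(2)]
    use_floor = [rng.random(sizes[i + 1]) < 0.5 for i in range(2)]
    print("output:", floor_relu_forward(rng.random(d), weights, biases, use_floor))
```

Note how the bound decays in $N^{-\alpha\sqrt{L}}$: increasing depth $L$ improves the exponent itself, which is the source of the "super approximation power" claimed in the abstract, while the dimension $d$ enters only through the mild factor $d^{\alpha/2}$.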

Citations (7)
