Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
158 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Minimax Lower Bounds for Ridge Combinations Including Neural Nets (1702.02828v1)

Published 9 Feb 2017 in stat.ML and cs.LG

Abstract: Estimation of functions of $ d $ variables is considered using ridge combinations of the form $ \textstyle\sum_{k=1}m c_{1,k} \phi(\textstyle\sum_{j=1}d c_{0,j,k}x_j-b_k) $ where the activation function $ \phi $ is a function with bounded value and derivative. These include single-hidden layer neural networks, polynomials, and sinusoidal models. From a sample of size $ n $ of possibly noisy values at random sites $ X \in B = [-1,1]d $, the minimax mean square error is examined for functions in the closure of the $ \ell_1 $ hull of ridge functions with activation $ \phi $. It is shown to be of order $ d/n $ to a fractional power (when $ d $ is of smaller order than $ n $), and to be of order $ (\log d)/n $ to a fractional power (when $ d $ is of larger order than $ n $). Dependence on constraints $ v_0 $ and $ v_1 $ on the $ \ell_1 $ norms of inner parameter $ c_0 $ and outer parameter $ c_1 $, respectively, is also examined. Also, lower and upper bounds on the fractional power are given. The heart of the analysis is development of information-theoretic packing numbers for these classes of functions.

Citations (21)

Summary

We haven't generated a summary for this paper yet.