Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit (2406.01581v1)

Published 3 Jun 2024 in cs.LG and stat.ML

Abstract: We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is an unknown degree-$q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n\gtrsim d^{\Theta(p)}$ samples, and such statistical complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary polynomial link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\,\mathrm{polylog}\, d$, where the constant $C(q)$ depends only on the degree of $\sigma_*$, regardless of the information exponent; this dimension dependence matches the information-theoretic limit up to polylogarithmic factors. Core to our analysis is the reuse of the minibatch in the gradient computation, which gives rise to higher-order information beyond correlational queries.
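
The abstract's central algorithmic idea is taking multiple gradient steps on the same minibatch, which gives SGD access to higher-order (beyond correlational) information about the target. The following is a minimal illustrative sketch of that idea, not the authors' exact algorithm: a two-layer ReLU network trained with plain SGD on a degree-3 single-index target (the Hermite polynomial He_3, so information exponent p = 3) under isotropic Gaussian data, with each minibatch reused for several consecutive updates. All concrete choices (dimension, width, learning rate, number of reuse steps, ReLU activation, squared loss) are assumptions made for illustration only.

```python
# Minimal sketch of minibatch-reuse SGD on a single-index polynomial target.
# Not the authors' algorithm; hyperparameters below are illustrative assumptions.
import torch

torch.manual_seed(0)

d, n, width, batch = 64, 8192, 256, 128
theta = torch.randn(d)
theta /= theta.norm()  # unit-norm index direction

def link(z):
    # Degree-3 Hermite polynomial He_3(z) = z^3 - 3z, information exponent p = 3.
    return z ** 3 - 3 * z

# Isotropic Gaussian inputs and single-index labels y = sigma_*(<x, theta>).
X = torch.randn(n, d)
y = link(X @ theta)

model = torch.nn.Sequential(
    torch.nn.Linear(d, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()

reuse_steps = 4  # number of consecutive SGD updates on the same minibatch
for epoch in range(5):
    perm = torch.randperm(n)
    for i in range(0, n, batch):
        idx = perm[i:i + batch]
        xb, yb = X[idx], y[idx]
        for _ in range(reuse_steps):  # minibatch reuse: repeat the gradient step
            opt.zero_grad()
            loss = loss_fn(model(xb).squeeze(-1), yb)
            loss.backward()
            opt.step()
    print(f"epoch {epoch}: minibatch loss {loss.item():.4f}")
```

Setting `reuse_steps = 1` recovers standard one-pass SGD; the repeated updates on a fixed batch are the sketch's stand-in for the paper's reuse of the minibatch in the gradient computation.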

Authors (4)
  1. Jason D. Lee (151 papers)
  2. Kazusato Oko (12 papers)
  3. Taiji Suzuki (119 papers)
  4. Denny Wu (24 papers)
Citations (12)
