
Network size and weights size for memorization with two-layers neural networks (2006.02855v2)

Published 4 Jun 2020 in cs.LG and stat.ML

Abstract: In 1988, Eric B. Baum showed that two-layers neural networks with threshold activation function can perfectly memorize the binary labels of $n$ points in general position in $\mathbb{R}^d$ using only $\lceil n/d \rceil$ neurons. We observe that with ReLU networks, using four times as many neurons one can fit arbitrary real labels. Moreover, for approximate memorization up to error $\epsilon$, the neural tangent kernel can also memorize with only $O\left(\frac{n}{d} \cdot \log(1/\epsilon) \right)$ neurons (assuming the data is also well dispersed). We show however that these constructions give rise to networks where the magnitude of the neurons' weights is far from optimal. In contrast, we propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.
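
As a rough illustration of the memorization setting (not the paper's construction or training procedure), the sketch below fits $n$ random points in $\mathbb{R}^d$ with arbitrary real labels using a two-layer ReLU network of width about $4n/d$, the regime mentioned in the abstract. The data, labels, width, and optimizer choices here are illustrative assumptions.

```python
# Minimal sketch, assuming random data and plain gradient-based fitting:
# how a two-layer ReLU network of width ~ 4n/d can memorize n points in R^d.
import torch

n, d = 200, 20                      # number of points and ambient dimension
k = 4 * (n // d)                    # width in the ~ n/d regime from the abstract
X = torch.randn(n, d)               # random points (in general position w.h.p.)
y = torch.randn(n)                  # arbitrary real labels to memorize

model = torch.nn.Sequential(
    torch.nn.Linear(d, k),          # first layer: k ReLU neurons
    torch.nn.ReLU(),
    torch.nn.Linear(k, 1),          # second layer: real-valued output
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(5000):
    opt.zero_grad()
    loss = torch.mean((model(X).squeeze(-1) - y) ** 2)
    loss.backward()
    opt.step()

print(f"final training MSE: {loss.item():.2e}")   # near zero => labels memorized
```

A near-zero training loss indicates the labels have been memorized; the paper's contribution is about how small both the width and the magnitude of the weights can be made in this regime.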

Authors (4)
  1. Sébastien Bubeck (90 papers)
  2. Ronen Eldan (60 papers)
  3. Yin Tat Lee (102 papers)
  4. Dan Mikulincer (25 papers)
Citations (32)
