A global convergence theory for deep ReLU implicit networks via over-parameterization (2110.05645v2)

Published 11 Oct 2021 in cs.LG, math.OC, and stat.ML

Abstract: Implicit deep learning has received increasing attention recently because it generalizes the recursive prediction rules of many commonly used neural network architectures. Its prediction rule is defined implicitly through the solution of an equilibrium equation. Although a line of recent empirical studies has demonstrated its superior performance, the theoretical understanding of implicit neural networks is limited. In general, the equilibrium equation may not be well-posed during training. As a result, there is no guarantee that vanilla (stochastic) gradient descent (SGD) training of nonlinear implicit neural networks converges. This paper fills the gap by analyzing the gradient flow of Rectified Linear Unit (ReLU) activated implicit neural networks. For an $m$-width implicit neural network with ReLU activation and $n$ training samples, we show that randomly initialized gradient descent converges to a global minimum at a linear rate for the square loss function if the implicit neural network is over-parameterized. It is worth noting that, unlike existing works on the convergence of (S)GD on finite-layer over-parameterized neural networks, our convergence results hold for implicit neural networks, where the number of layers is infinite.
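For concreteness, below is a minimal sketch of the kind of prediction rule the abstract describes: a hidden state z defined implicitly as the solution of an equilibrium equation z = ReLU(Az + Bx), solved here by fixed-point iteration and read out linearly. The specific parameterization, the spectral-norm rescaling of A used to keep the equation well-posed, and all variable names are illustrative assumptions, not the paper's exact construction or initialization.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def implicit_forward(A, B, x, n_iters=200, tol=1e-8):
    """Solve the equilibrium equation z = ReLU(A z + B x) by fixed-point iteration."""
    z = np.zeros(A.shape[0])
    for _ in range(n_iters):
        z_next = relu(A @ z + B @ x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
m, d = 64, 10                         # hidden width m, input dimension d (illustrative)
A = rng.normal(size=(m, m))
A *= 0.9 / np.linalg.norm(A, 2)       # spectral norm < 1 makes the ReLU map contractive,
                                      # so the equilibrium exists and is unique
B = rng.normal(size=(m, d)) / np.sqrt(d)
c = rng.normal(size=m) / np.sqrt(m)   # linear readout

x = rng.normal(size=d)
z_star = implicit_forward(A, B, x)    # equilibrium hidden state
y_hat = float(c @ z_star)             # scalar prediction
```

Because ReLU is 1-Lipschitz, rescaling A to spectral norm below one makes the update a contraction, which is one simple way to guarantee the fixed point exists; the paper's analysis concerns what happens to such equilibria under gradient-based training in the over-parameterized regime.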

Authors (5)
  1. Tianxiang Gao (13 papers)
  2. Hailiang Liu (65 papers)
  3. Jia Liu (369 papers)
  4. Hridesh Rajan (33 papers)
  5. Hongyang Gao (23 papers)
Citations (16)
