
Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH) (2007.03714v1)

Published 7 Jul 2020 in cs.LG, math.OC, math.ST, stat.ML, and stat.TH

Abstract: Gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the objective function. The behavior of a network in the infinite-width limit trained by gradient descent can be described by the Neural Tangent Kernel (NTK) introduced by Jacot et al. (2018). In this paper, we study the dynamics of the NTK for finite-width deep residual networks (ResNets) using the neural tangent hierarchy (NTH) proposed by Huang and Yau (2019). For a ResNet with a smooth and Lipschitz activation function, we reduce the requirement on the layer width $m$ with respect to the number of training samples $n$ from quartic to cubic. Our analysis strongly suggests that the particular skip-connection structure of ResNet is the main reason for its triumph over fully-connected networks.
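
For readers unfamiliar with the objects named in the abstract, the following is a minimal sketch of what the NTK and the first two levels of the NTH look like. It is not taken from the paper. The notation is assumed for illustration: $f_t$ for the network output at training time $t$, $\theta_t$ for its parameters, $(x_\beta, y_\beta)$ for the $n$ training samples, and a squared loss with a $1/(2n)$ normalization, which may differ from the convention the authors use.

```latex
% Assumed setup (illustrative): gradient flow on the squared loss
%   L(\theta) = \frac{1}{2n} \sum_{\beta=1}^{n} \bigl( f_\theta(x_\beta) - y_\beta \bigr)^2 .

% The (time-dependent) neural tangent kernel of a finite-width network:
\[
  K^{(2)}_t(x, x') \;=\;
  \bigl\langle \nabla_\theta f_t(x),\; \nabla_\theta f_t(x') \bigr\rangle .
\]

% Level 1 of the hierarchy: the residuals evolve through K^{(2)}_t.
\[
  \partial_t \bigl( f_t(x_\alpha) - y_\alpha \bigr)
  \;=\; -\frac{1}{n} \sum_{\beta=1}^{n}
        K^{(2)}_t(x_\alpha, x_\beta)\,
        \bigl( f_t(x_\beta) - y_\beta \bigr) .
\]

% Level 2: at finite width the kernel itself moves, driven by a
% higher-order kernel K^{(3)}_t, which in turn depends on K^{(4)}_t, and so on.
\[
  \partial_t K^{(2)}_t(x_{\alpha_1}, x_{\alpha_2})
  \;=\; -\frac{1}{n} \sum_{\beta=1}^{n}
        K^{(3)}_t(x_{\alpha_1}, x_{\alpha_2}, x_\beta)\,
        \bigl( f_t(x_\beta) - y_\beta \bigr),
\]
\[
  K^{(3)}_t(x_{\alpha_1}, x_{\alpha_2}, x_\beta)
  \;=\;
  \bigl\langle \nabla_\theta K^{(2)}_t(x_{\alpha_1}, x_{\alpha_2}),\;
               \nabla_\theta f_t(x_\beta) \bigr\rangle .
\]
```

In the infinite-width limit the kernel $K^{(2)}_t$ stays fixed during training, so only the first equation matters. At finite width the higher-order kernels control how far $K^{(2)}_t$ drifts, and this is the role the hierarchy plays when relating the width $m$ to the sample count $n$.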

Authors (3)
  1. Yuqing Li (19 papers)
  2. Tao Luo (149 papers)
  3. Nung Kwan Yip (12 papers)
Citations (4)
