On Exact Computation with an Infinitely Wide Neural Net (1904.11955v2)

Published 26 Apr 2019 in cs.LG, cs.CV, cs.NE, and stat.ML

Abstract: How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset such as CIFAR-10 when its width --- namely, number of channels in convolutional layers, and number of nodes in fully-connected internal layers --- is allowed to increase to infinity? Such questions have come to the forefront in the quest to theoretically understand deep learning and its mysteries about optimization and generalization. They also connect deep learning to notions such as Gaussian processes and kernels. A paper [Jacot et al., 2018] introduced the Neural Tangent Kernel (NTK) which captures the behavior of fully-connected deep nets in the infinite width limit trained by gradient descent; this object was implicit in some other papers. An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width. The current paper gives the first efficient exact algorithm for computing the extension of NTK to convolutional neural nets, which we call Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm. This results in a significant new benchmark for the performance of a pure kernel-based method on CIFAR-10, being $10\%$ higher than the methods reported in [Novak et al., 2019], and only $6\%$ lower than the performance of the corresponding finite deep net architecture (once batch normalization, etc. are turned off). Theoretically, we also give the first non-asymptotic proof showing that a fully-trained sufficiently wide net is indeed equivalent to the kernel regression predictor using NTK.

Authors (6)
  1. Sanjeev Arora (93 papers)
  2. Simon S. Du (120 papers)
  3. Wei Hu (309 papers)
  4. Zhiyuan Li (304 papers)
  5. Ruslan Salakhutdinov (248 papers)
  6. Ruosong Wang (37 papers)
Citations (849)

Summary

  • The paper develops the first efficient algorithm to compute the Convolutional Neural Tangent Kernel (CNTK) exactly in the infinite-width regime.
  • The paper extends the NTK framework to convolutional neural networks, achieving a 77% classification accuracy on CIFAR-10 and outperforming random feature approximations.
  • The paper provides a non-asymptotic proof demonstrating that sufficiently wide neural nets trained by gradient descent are equivalent to kernel regression predictors.

On Exact Computation with an Infinitely Wide Neural Net

The paper "On Exact Computation with an Infinitely Wide Neural Net" presents significant theoretical and practical advancements in understanding neural networks by examining their behavior in the infinite width limit. The research focus is on classic deep neural architectures such as AlexNet and VGG19 and their performance on standard datasets like CIFAR-10 when the width of the networks—that is, the number of channels in convolutional layers and nodes in fully-connected layers—is increased to infinity.

Theoretical Insights

Neural Tangent Kernel (NTK) Framework

The paper builds on the recently introduced Neural Tangent Kernel (NTK), which captures the behavior of fully-connected neural networks trained by gradient descent in the infinite-width limit. The NTK framework elegantly connects deep learning optimization and generalization with the broader mathematical theory of Gaussian processes and kernel methods. Specifically, the paper extends the NTK to convolutional neural networks (CNNs), developing a Convolutional Neural Tangent Kernel (CNTK) that captures the analogous infinite-width behavior for CNNs.
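To make the construction concrete, the following is a minimal NumPy sketch of the fully-connected NTK recursion that the CNTK generalizes, assuming ReLU activations, the standard NTK parameterization, and no bias terms; the function name and the small numerical safeguards are illustrative rather than taken from the paper's implementation.

```python
import numpy as np

def ntk_fully_connected(X, depth=3):
    """Infinite-width NTK Gram matrix for a depth-`depth` ReLU network.

    X: array of shape (n, d), one input per row.
    Returns the (n, n) NTK matrix Theta^(depth).
    Assumes the standard NTK parameterization with ReLU and no biases.
    """
    # Layer-0 covariance: plain inner products of the inputs.
    sigma = X @ X.T                      # Sigma^(0)
    theta = sigma.copy()                 # Theta^(0) = Sigma^(0)

    for _ in range(depth):
        diag = np.sqrt(np.maximum(np.diag(sigma), 1e-12))
        norm = np.outer(diag, diag)
        # Correlation, clipped for numerical safety before arccos.
        rho = np.clip(sigma / norm, -1.0, 1.0)
        angle = np.arccos(rho)
        # Closed-form ReLU (arc-cosine) Gaussian expectations, with c_sigma = 2:
        #   Sigma^(h)     = 2 * E[relu(u) relu(v)]
        #   Sigma_dot^(h) = 2 * E[relu'(u) relu'(v)]
        sigma_next = norm * (np.sin(angle) + (np.pi - angle) * np.cos(angle)) / np.pi
        sigma_dot = (np.pi - angle) / np.pi
        # NTK recursion: Theta^(h) = Theta^(h-1) * Sigma_dot^(h) + Sigma^(h)
        theta = theta * sigma_dot + sigma_next
        sigma = sigma_next

    return theta
```

The CNTK follows the same recursion, except that the covariance is tracked between every pair of pixel locations of the two inputs and is propagated through convolution and pooling layers, which is what makes exact computation expensive.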

Efficient Computation

The paper's primary contribution is the development of the first efficient algorithm to compute the CNTK exactly, together with a GPU implementation. This algorithm enables the practical evaluation of the CNTK on large datasets like CIFAR-10. The paper reports that the CNTK achieves a classification accuracy of 77% on CIFAR-10, roughly 10% higher than previously reported pure kernel-based methods [Novak et al., 2019] and only about 6% lower than the corresponding finite neural network once batch normalization and similar enhancements are turned off.
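The dominant cost in exact CNTK computation comes from propagating a covariance between every pair of pixel positions of two inputs through each convolutional layer. The sketch below shows one such propagation step in NumPy under simplifying assumptions (stride 1, zero padding, a q×q filter, equal-sized inputs, and an illustrative 1/q² normalization); the paper's GPU implementation organizes these operations efficiently enough to run on all of CIFAR-10.

```python
import numpy as np

def conv_cov_step(K, q=3):
    """Propagate a pixel-pairwise covariance through one conv layer.

    K: array of shape (P, Q, P, Q), where K[i, j, i2, j2] is the covariance
       between pixel (i, j) of one image and pixel (i2, j2) of the other.
    Returns an array of the same shape: each entry is the average of K over
    the q x q patch offsets applied jointly to both pixel positions
    (stride 1, zero padding).
    """
    P, Q = K.shape[0], K.shape[1]
    r = q // 2
    Kp = np.pad(K, ((r, r), (r, r), (r, r), (r, r)))
    out = np.zeros_like(K)
    for a in range(q):
        for b in range(q):
            # The same spatial offset (a, b) is applied to both images'
            # pixel indices, matching how a shared conv filter slides.
            out += Kp[a:a + P, b:b + Q, a:a + P, b:b + Q]
    return out / (q * q)
```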

Key Theoretical Results

The paper provides a non-asymptotic proof that demonstrates the equivalence between a fully-trained sufficiently wide net and kernel regression using the NTK. This theoretical underpinning firmly establishes that, in the infinite width limit, the neural network's prediction aligns with the kernel regression predictor, thereby validating the NTK's utility in understanding neural networks.
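Concretely, the equivalence says that the prediction of a fully-trained, sufficiently wide net on a test input matches kernel regression with the NTK on the training data, f(x) = Θ(x, X) Θ(X, X)⁻¹ y. A minimal sketch of that predictor, reusing the hypothetical `ntk_fully_connected` helper from the sketch above and adding a tiny jitter term purely for numerical stability:

```python
import numpy as np

def ntk_regression_predict(X_train, y_train, X_test, depth=3, jitter=1e-8):
    """Kernel-regression predictor f(x) = Theta(x, X) Theta(X, X)^{-1} y.

    Builds the joint NTK Gram matrix with the `ntk_fully_connected` sketch
    above; `jitter` is a small ridge term for numerical stability only.
    """
    n = X_train.shape[0]
    X_all = np.concatenate([X_train, X_test], axis=0)
    theta = ntk_fully_connected(X_all, depth=depth)
    K_train = theta[:n, :n]          # Theta(X, X) on training points
    K_test_train = theta[n:, :n]     # Theta(x_test, X) cross-kernel
    alpha = np.linalg.solve(K_train + jitter * np.eye(n), y_train)
    return K_test_train @ alpha
```

The paper's guarantee is non-asymptotic: it quantifies how wide the finite net must be for its trained predictions to stay close to this kernel predictor.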

Practical Implications

Empirical Performance

The empirical results underscore the potential of CNTKs as a powerful kernel method. The significant accuracy achieved on CIFAR-10 highlights the CNTK's capability to serve as a robust benchmark model. Furthermore, the paper compares CNTKs with random feature approximations and shows the superiority of exact kernel computation over these approximations, which perform poorly even on simpler tasks like CIFAR-2 classification.
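For context, one common random-feature approximation of the NTK uses inner products of gradient features taken at a random initialization of a finite-width net, Θ̂(x, x') = ⟨∇_θ f(x; θ₀), ∇_θ f(x'; θ₀)⟩, which approaches the exact kernel only as the width grows. Below is a minimal sketch for a two-layer ReLU net with hand-computed gradients; this is a generic construction for illustration, not necessarily the exact baseline used in the paper, and the function name, width, and scaling are illustrative.

```python
import numpy as np

def empirical_ntk_features(X, width=1024, seed=0):
    """Gradient features of a random two-layer ReLU net f(x) = v^T relu(W x) / sqrt(width).

    Returns Phi of shape (n, width * (d + 1)); the random-feature NTK
    approximation is Phi @ Phi.T, which matches the exact NTK only in the
    limit of large `width`.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((width, d))      # first-layer weights at init
    v = rng.standard_normal(width)           # second-layer weights at init
    pre = X @ W.T                            # (n, width) pre-activations
    act = np.maximum(pre, 0.0)               # relu(W x)
    # df/dv_k = relu(w_k . x) / sqrt(width)
    grad_v = act / np.sqrt(width)
    # df/dW_k = v_k * 1[w_k . x > 0] * x / sqrt(width)
    gate = (pre > 0).astype(X.dtype) * v     # (n, width)
    grad_W = (gate[:, :, None] * X[:, None, :]) / np.sqrt(width)
    return np.concatenate([grad_v, grad_W.reshape(n, -1)], axis=1)
```

The resulting `Phi @ Phi.T` Gram matrix drops into the same kernel-regression predictor shown above, which is the kind of comparison in which the exact kernel comes out ahead.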

Neural Architecture Search

An interesting observation from the experiments is the correlation between the performance of CNTKs trained on full datasets and on smaller subsets (e.g., 2000 training examples). This consistency suggests that small-scale CNTKs could inform neural architecture search, potentially guiding the selection of promising architectures before scaling up to full datasets.

Future Directions

The results open several avenues for future research. One direction involves extending the analysis to more complex architectural features such as batch normalization or residual connections. Additionally, while the paper establishes the equivalence between infinite-width nets and kernel methods, it also reveals a performance gap between finite and infinite networks, suggesting that the benefits of finite over-parameterization merit further exploration. Finally, developing efficient algorithms for other non-trivial settings and applying them to additional datasets would further validate and expand the theoretical findings.

Conclusion

In summary, this paper advances our theoretical and practical understanding of neural networks in the infinite width limit. By rigorously proving the equivalence between fully-trained wide nets and NTK-based kernel regression, and demonstrating the empirical effectiveness of CNTKs, the paper provides a robust framework for future research in deep learning and kernel methods. The insights gained from this paper are pivotal for both theoretical exploration and practical applications in machine learning.
