
Approximation and interpolation of deep neural networks (2304.10552v2)

Published 20 Apr 2023 in cs.LG, math.OC, math.PR, and stat.ML

Abstract: In this paper, we prove that in the overparametrized regime, deep neural networks provide universal approximations and can interpolate any data set, as long as the activation function is locally in $L^1(\mathbb{R})$ and not an affine function. Additionally, if the activation function is smooth and such an interpolation network exists, then the set of parameters which interpolate forms a manifold. Furthermore, we give a characterization of the Hessian of the loss function evaluated at the interpolation points. In the last section, we provide a practical probabilistic method of finding such a point under general conditions on the activation function.

Summary

  • The paper proves that deep neural networks in the overparameterized regime achieve universal approximation and interpolation with non-affine activations.
  • It characterizes the parameter solution space as an (n-d)-dimensional submanifold and links the findings to the double descent phenomenon.
  • A novel probabilistic method reduces the required hidden neurons from O(d log^2 d) to O(d log d), improving interpolation efficiency in practice.

Approximation and Interpolation of Deep Neural Networks

The paper by Vlad Raul Constantinescu and Ionel Popescu presents significant theoretical advancements in the understanding of interpolation and approximation capabilities of deep neural networks in the overparameterized regime. The authors focus on establishing conditions under which neural networks can universally approximate functions and interpolate datasets, emphasizing the role of the activation function.

The paper rigorously proves that deep neural networks, when overparameterized, are capable of universal approximation and interpolation of any dataset, assuming the activation function is locally integrable and non-affine. This fills gaps in the literature by extending previous results that considered continuous non-polynomial activations, demonstrating that under these conditions, a dataset of d distinct points can be interpolated by a neural network whose hidden layers each have width at least d.
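As a minimal numerical illustration of the width condition (assuming PyTorch; the architecture and training hyperparameters here are illustrative and not taken from the paper), one can train a two-hidden-layer network whose hidden widths equal the number of data points d and observe the training loss approach zero:

```python
# Sketch: width-d hidden layers, non-affine activation, d data points.
import torch

torch.manual_seed(0)
d, in_dim = 8, 3                      # d distinct data points in R^3
X = torch.randn(d, in_dim)
y = torch.randn(d, 1)

model = torch.nn.Sequential(
    torch.nn.Linear(in_dim, d), torch.nn.Tanh(),   # hidden width d
    torch.nn.Linear(d, d), torch.nn.Tanh(),        # hidden width d
    torch.nn.Linear(d, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(5000):
    opt.zero_grad()
    loss = torch.mean((model(X) - y) ** 2)
    loss.backward()
    opt.step()

print(f"final training MSE: {loss.item():.2e}")    # expected to be near zero
```

Gradient descent is not guaranteed to reach an exact interpolant, but in this overparameterized setting the loss typically drops to numerical zero, consistent with the existence result.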

Key results also include the characterization of the parameter space capable of interpolation, revealing that the solution set forms an (n-d)-dimensional submanifold, where n is the number of parameters in the network. This manifold characterization connects to the double descent phenomenon, offering insights into the geometry of the loss landscape at the interpolation threshold.
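The dimension count can be read off from a standard regular-value argument; the sketch below introduces the map F and parameter vector θ for illustration (this notation is not fixed in the summary above):

```latex
% Let F : \mathbb{R}^n \to \mathbb{R}^d send the parameters \theta to the
% network outputs at the d data points,
%   F(\theta) = \bigl(f(x_1;\theta), \dots, f(x_d;\theta)\bigr).
% The interpolating parameters form the preimage
\[
  \mathcal{M} \;=\; F^{-1}(y_1,\dots,y_d).
\]
% If F is smooth and its differential DF(\theta) has full rank d at every
% \theta \in \mathcal{M}, the preimage (regular value) theorem gives that
% \mathcal{M} is a smooth submanifold of \mathbb{R}^n of dimension n - d.
```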

To find interpolation points in practice, the authors introduce a methodology involving the random initialization of input-to-hidden weights and optimization over the output layer. This approach refines previous findings by reducing the needed overparameterization from O(d log^2 d) hidden neurons to O(d log d).
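A minimal sketch of this kind of construction (assuming NumPy; widths, distributions, and the tanh activation are illustrative choices, not the paper's exact setup): draw the input-to-hidden weights at random, freeze them, and solve a linear problem for the output layer.

```python
import numpy as np

rng = np.random.default_rng(0)
d, in_dim, width = 50, 10, 60              # width >= d; the paper's bound is O(d log d)
X = rng.standard_normal((d, in_dim))       # d distinct data points
y = rng.standard_normal(d)

W = rng.standard_normal((in_dim, width))   # random, frozen input-to-hidden weights
b = rng.standard_normal(width)
H = np.tanh(X @ W + b)                     # hidden-layer features, shape (d, width)

# With probability one H has full row rank d, so an exact interpolant exists;
# the minimum-norm output weights come from the pseudoinverse.
a = np.linalg.pinv(H) @ y
print("max interpolation error:", np.max(np.abs(H @ a - y)))   # ~ numerical zero
```

The point of the random first layer is precisely to make the feature matrix full rank with high probability, so the remaining problem is convex (in fact linear) in the output weights.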

Expanding on network density results, the work generalizes the uniform convergence of deep networks over compact sets, showing that deep networks are dense in the space of continuous functions if the activation function is non-affine, irrespective of depth. This aligns with established results while extending them to a broader class of function spaces and neural network architectures.

The paper notably explores the Hessian eigenspectrum at the global minima, establishing that at these interpolation points, the Hessian matrix has d positive eigenvalues and n-d zero eigenvalues, providing valuable theoretical insights into the curvature of loss landscapes within overparameterized regimes.
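One way to see where the zero eigenvalues come from is the Gauss-Newton decomposition of the Hessian at an interpolation point; the notation below (L, f, J) is introduced here for illustration.

```latex
% Squared loss over the d data points and its Hessian:
\[
  L(\theta) = \tfrac{1}{2}\sum_{i=1}^{d}\bigl(f(x_i;\theta)-y_i\bigr)^2,
  \qquad
  \nabla^2 L(\theta)
  = \underbrace{J(\theta)^{\top}J(\theta)}_{\text{rank}\,\le\, d}
  \;+\; \sum_{i=1}^{d}\bigl(f(x_i;\theta)-y_i\bigr)\,\nabla^2 f(x_i;\theta),
\]
% where J(\theta) \in \mathbb{R}^{d \times n} is the Jacobian of the outputs
% with respect to the parameters.  At an interpolation point the residuals
% vanish, so the Hessian reduces to J^\top J: at most d positive eigenvalues
% and at least n - d zero eigenvalues, matching the spectrum described above.
```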

The implications of these findings are profound for both theoretical and applied fields. The universality of deep neural networks underlined by this work lays a solid foundation for designing robust predictive models across diverse domains. Moreover, the probabilistic method introduced here for achieving full-rank interpolation matrices could inform new strategies for more efficient neural network training, particularly in non-convex settings.

Future research may explore these theoretical advancements by addressing practical considerations in network training, such as computational efficiency and robustness to noise. Extending these results to various network architectures, including recurrent and convolutional networks, could further elaborate on the robustness and versatility of overparameterized neural networks in real-world applications.