Interplay between depth and width for interpolation in neural ODEs (2401.09902v3)
Abstract: Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $D$ comprising $N$ pairs of points, or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon\sim O(\log(p)\,p^{-1/d})$. This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
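To make the stated trade-offs concrete, here is a minimal numerical sketch of the scaling laws quoted in the abstract. The paper only gives asymptotic $O(\cdot)$ rates, so the unit constant factors and the helper names (`layers_for_dataset`, `layers_for_measures`, `autonomous_error`) are illustrative assumptions, not the authors' constructions.

```python
import math

# Illustrative sketch of the width-depth trade-offs from the abstract.
# Constants are placeholders: the paper states only O(.) rates, so these
# values are order-of-magnitude estimates, not exact layer counts.

def layers_for_dataset(N: int, p: int) -> int:
    """Layer transitions to interpolate a dataset of N pairs with width p,
    following L = O(1 + N / p)."""
    return math.ceil(1 + N / p)

def layers_for_measures(p: int, eps: float, d: int) -> int:
    """Layer transitions to interpolate two measures in R^d up to
    Wasserstein error eps, following L = O(1 + (p * eps^d)^{-1})."""
    return math.ceil(1 + 1.0 / (p * eps ** d))

def autonomous_error(p: int, d: int) -> float:
    """Approximate-controllability error in the autonomous case (L = 0),
    following eps ~ O(log(p) * p^{-1/d})."""
    return math.log(p) * p ** (-1.0 / d)

if __name__ == "__main__":
    # Doubling the width roughly halves the depth needed for a fixed dataset.
    print(layers_for_dataset(N=1000, p=10))        # ~101 transitions
    print(layers_for_dataset(N=1000, p=100))       # ~11 transitions
    print(layers_for_measures(p=50, eps=0.2, d=2)) # ~2 transitions
    print(autonomous_error(p=10_000, d=2))         # ~0.09
```

The first two calls illustrate the balancing behaviour: for a fixed dataset size $N$, increasing the width $p$ proportionally reduces the number of layer transitions suggested by the $O(1+N/p)$ bound.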
- Antonio Álvarez-López
- Arselane Hadj Slimane
- Enrique Zuazua