
Interplay between depth and width for interpolation in neural ODEs (2401.09902v3)

Published 18 Jan 2024 in math.OC and cs.LG

Abstract: Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $D$ comprising $N$ pairs of points or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon\sim O(\log(p)\,p^{-1/d})$. This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
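The architecture behind these counts is a neural ODE whose controls are piecewise constant on $L+1$ time intervals, so that on each piece the dynamics are those of a width-$p$ one-hidden-layer network. Below is a minimal, self-contained sketch of a forward pass under that reading; the parametrization $\dot{x}(t)=W(t)\,\sigma(A(t)x(t)+b(t))$ with ReLU $\sigma$, the forward-Euler discretization, and all names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def neural_ode_forward(x0, params, T=1.0, steps_per_piece=50):
    """Integrate x'(t) = W_k sigma(A_k x + b_k) by forward Euler.

    x0     : (d,) initial state (one data point).
    params : list of L+1 tuples (A, b, W), one per time piece,
             with A of shape (p, d), b of shape (p,), W of shape (d, p).
    """
    x = np.asarray(x0, dtype=float)
    dt = T / (len(params) * steps_per_piece)
    for A, b, W in params:                 # piecewise-constant controls
        for _ in range(steps_per_piece):
            x = x + dt * W @ np.maximum(A @ x + b, 0.0)  # ReLU vector field
    return x

# Illustrative instance: dimension d = 2, width p = 4, L = 2 transitions.
rng = np.random.default_rng(0)
d, p, L = 2, 4, 2
params = [(rng.normal(size=(p, d)), rng.normal(size=p), rng.normal(size=(d, p)))
          for _ in range(L + 1)]
print(neural_ode_forward(np.array([1.0, -0.5]), params))
```

Read against the abstract's trade-off, $L=O(1+N/p)$ suggests that interpolating, say, $N=10^3$ pairs with width $p=100$ should need on the order of ten layer transitions, while the autonomous case $L=0$ trades exactness for the $\varepsilon\sim O(\log(p)\,p^{-1/d})$ approximate rate.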
