Generalization bounds for neural ordinary differential equations and deep residual networks (2305.06648v2)

Published 11 May 2023 in stat.ML and cs.LG

Abstract: Neural ordinary differential equations (neural ODEs) are a popular family of continuous-depth deep learning models. In this work, we consider a large family of parameterized ODEs with continuous-in-time parameters, which include time-dependent neural ODEs. We derive a generalization bound for this class by a Lipschitz-based argument. By leveraging the analogy between neural ODEs and deep residual networks, our approach yields, in particular, a generalization bound for a class of deep residual networks. The bound involves the magnitude of the difference between successive weight matrices. We illustrate numerically how this quantity affects the generalization capability of neural networks.
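The key quantity in the bound, the magnitude of the difference between successive weight matrices, is straightforward to compute for a concrete network. The sketch below is a minimal illustration, not the paper's code: it assumes a toy Euler-style residual update x_{k+1} = x_k + (1/L) tanh(x_k W_k) as an instance of the neural-ODE/ResNet analogy, measures depth regularity as the sum of Frobenius norms ||W_{k+1} - W_k||, and the function names and the smooth-versus-independent weight comparison are illustrative choices.

```python
import torch


def residual_forward(x, weights):
    """Euler-style residual network: x_{k+1} = x_k + (1/L) * tanh(x_k @ W_k)."""
    L = len(weights)
    for W in weights:
        x = x + torch.tanh(x @ W) / L
    return x


def successive_weight_variation(weights):
    """Sum of Frobenius norms ||W_{k+1} - W_k|| over depth: the depth-regularity
    quantity the abstract says the generalization bound involves."""
    return sum(
        torch.linalg.matrix_norm(w_next - w, ord="fro")
        for w, w_next in zip(weights[:-1], weights[1:])
    )


if __name__ == "__main__":
    torch.manual_seed(0)
    L, d = 32, 8
    base = torch.randn(d, d) / d**0.5
    drift = torch.randn(d, d) / d**0.5
    # Weights that vary smoothly with depth, mimicking a continuous-in-time
    # parameterization, versus independently drawn per-layer weights.
    smooth = [base + (k / L) * drift for k in range(L)]
    rough = [torch.randn(d, d) / d**0.5 for _ in range(L)]
    x = torch.randn(4, d)
    print(residual_forward(x, smooth).shape)           # torch.Size([4, 8])
    print(float(successive_weight_variation(smooth)))  # small: about ||drift||_F
    print(float(successive_weight_variation(rough)))   # large: grows with depth L
```

Under this toy parameterization, weights that vary slowly with depth (as in a discretized neural ODE) yield a small variation term, while independently initialized layers do not, matching the paper's point that this quantity governs generalization.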
