Analysis of the Geometric Structure of Neural Networks and Neural ODEs via Morse Functions (2405.09351v2)

Published 15 May 2024 in math.DS and cs.NE

Abstract: Besides classical feed-forward neural networks, neural ordinary differential equations (neural ODEs) have also gained particular interest in recent years. Neural ODEs can be interpreted as an infinite-depth limit of feed-forward or residual neural networks. We study the input-output dynamics of finite and infinite depth neural networks with scalar output. In the finite depth case, the input is a state associated with a finite number of nodes, which is mapped under multiple non-linear transformations to the state of one output node. Analogously, a neural ODE maps an affine linear transformation of the input to an affine linear transformation of its time-$T$ map. We show that, depending on the specific structure of the network, the input-output map has different properties regarding the existence and regularity of critical points, which can be characterized via Morse functions. We prove that critical points cannot exist if the dimension of the hidden layers is monotonically decreasing or if the dimension of the phase space is smaller than or equal to the input dimension. In the case that critical points exist, we classify their regularity depending on the specific architecture of the network. We show that, except for a set of Lebesgue measure zero in weight space, each critical point is non-degenerate if, for finite depth neural networks, the underlying graph has no bottleneck, and if, for neural ODEs, the affine linear transformations used have full rank. For each type of architecture, the proven properties are comparable in the finite and the infinite depth case. The established theorems allow us to formulate results on universal embedding, i.e., on the exact representation of maps by neural networks and neural ODEs. Our dynamical systems viewpoint on the geometric structure of the input-output map provides a fundamental understanding of why certain architectures perform better than others.
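
To make the central notions concrete: for a scalar input-output map $\Phi\colon \mathbb{R}^n \to \mathbb{R}$, a point $x^*$ is critical if $\nabla \Phi(x^*) = 0$, it is non-degenerate if the Hessian $\mathrm{Hess}\,\Phi(x^*)$ is invertible, and $\Phi$ is a Morse function if every critical point is non-degenerate. The following is a minimal numerical sketch, not taken from the paper: the toy architecture, weights, and finite-difference tolerances are illustrative assumptions, and the hidden widths are chosen to be at least the input dimension, loosely mirroring the "no bottleneck" condition (the precise definition is in the paper). It builds a small scalar-output feed-forward network and evaluates the gradient and the Hessian determinant of its input-output map at a sample point.

    import numpy as np

    # Toy scalar-output network: x in R^2 -> tanh layer (width 3) -> tanh layer (width 3) -> R.
    # All sizes and weights below are illustrative assumptions, not the paper's setup.
    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((3, 2)), rng.standard_normal(3)
    W2, b2 = rng.standard_normal((3, 3)), rng.standard_normal(3)
    w3, b3 = rng.standard_normal(3), rng.standard_normal()

    def phi(x):
        """Input-output map Phi: R^2 -> R of the toy network."""
        h1 = np.tanh(W1 @ x + b1)
        h2 = np.tanh(W2 @ h1 + b2)
        return float(w3 @ h2 + b3)

    def gradient(f, x, eps=1e-5):
        """Central finite-difference gradient of a scalar function."""
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x); e[i] = eps
            g[i] = (f(x + e) - f(x - e)) / (2 * eps)
        return g

    def hessian(f, x, eps=1e-4):
        """Finite-difference Hessian of a scalar function."""
        n = x.size
        H = np.zeros((n, n))
        for i in range(n):
            e = np.zeros_like(x); e[i] = eps
            H[:, i] = (gradient(f, x + e, eps) - gradient(f, x - e, eps)) / (2 * eps)
        return H

    x0 = np.array([0.3, -0.7])
    print("grad Phi(x0) =", gradient(phi, x0))    # vanishes only at a critical point
    print("det Hess Phi(x0) =", np.linalg.det(hessian(phi, x0)))
    # At a critical point, an invertible Hessian (nonzero determinant) means the point
    # is non-degenerate; if this holds at every critical point, Phi is a Morse function.
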


