Stable Invariant Models via Koopman Spectra (2207.07475v2)

Published 15 Jul 2022 in cs.LG

Abstract: Weight-tied models have attracted attention in the modern development of neural networks. The deep equilibrium model (DEQ) represents infinitely deep neural networks with weight-tying, and recent studies have shown the potential of this type of approach. DEQs need to iteratively solve root-finding problems during training and are built on the assumption that the underlying dynamics determined by the models converge to a fixed point. In this paper, we present the stable invariant model (SIM), a new class of deep models that in principle approximates DEQs under stability and extends the dynamics to more general ones that converge to an invariant set (not restricted to a fixed point). The key ingredient in deriving SIMs is a representation of the dynamics with the spectra of the Koopman and Perron--Frobenius operators. This perspective approximately reveals the stable dynamics of DEQs and then leads to two variants of SIMs. We also propose an implementation of SIMs that can be trained in the same way as feedforward models. We illustrate the empirical performance of SIMs with experiments and demonstrate that SIMs achieve comparable or superior performance to DEQs in several learning tasks.
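For context on the root-finding step the abstract refers to, the sketch below shows the kind of weight-tied fixed-point iteration a DEQ layer performs: repeatedly applying the same transformation until z ≈ f(z, x). This is not the authors' code; the class name, layer sizes, solver (plain forward iteration), and tolerance are illustrative assumptions.

```python
# Minimal sketch of a weight-tied DEQ-style layer solved by forward iteration.
# Real DEQ implementations use faster root-finders and implicit differentiation
# rather than backpropagating through the unrolled loop.
import torch
import torch.nn as nn


class SimpleDEQLayer(nn.Module):
    def __init__(self, dim: int, max_iter: int = 50, tol: float = 1e-4):
        super().__init__()
        self.linear_z = nn.Linear(dim, dim)  # weight-tied transformation of the state z
        self.linear_x = nn.Linear(dim, dim)  # injection of the input x
        self.max_iter = max_iter
        self.tol = tol

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)
        for _ in range(self.max_iter):
            z_next = torch.tanh(self.linear_z(z) + self.linear_x(x))
            if (z_next - z).norm() < self.tol:  # converged to an (approximate) fixed point
                return z_next
            z = z_next
        return z  # return the last iterate if the tolerance was not reached


# Usage: a batch of 8 inputs of dimension 16
layer = SimpleDEQLayer(dim=16)
out = layer(torch.randn(8, 16))
print(out.shape)  # torch.Size([8, 16])
```

Per the abstract, SIMs replace this iterative solve with a parameterization based on the spectra of the Koopman and Perron--Frobenius operators, so the resulting model can be trained like an ordinary feedforward network while covering dynamics that converge to invariant sets more general than a single fixed point.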
