Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods (2310.01618v1)
Abstract: Deep neural networks, despite their success in numerous applications, often operate without established theoretical foundations. In this paper, we bridge this gap by drawing parallels between deep learning and classical numerical analysis. By framing neural networks as operators whose fixed points represent desired solutions, we develop a theoretical framework grounded in iterative methods for operator equations. Under defined conditions, we present convergence proofs based on fixed-point theory. We demonstrate that popular architectures, such as diffusion models and AlphaFold, inherently employ iterative operator learning. Empirical assessments show that iterating through a network operator improves performance. We also introduce an iterative graph neural network, PIGN, that further demonstrates the benefits of iteration. Our work aims to enhance the understanding of deep learning by merging insights from numerical analysis, potentially guiding the design of future networks with clearer theoretical underpinnings and improved performance.
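To make the operator-iteration view concrete, the following is a minimal PyTorch sketch, not taken from the paper: the `Operator` module, the damping factor 0.1, and the stopping tolerances are illustrative assumptions. It iterates a network T on its own output, in the spirit of the Banach fixed-point theorem, which guarantees convergence of x_{k+1} = T(x_k) to a unique fixed point when T is a contraction.

```python
import torch
import torch.nn as nn

class Operator(nn.Module):
    """Placeholder network operator T (illustrative, not the paper's model)."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Damped update x + 0.1 * (net(x) - x); the damping keeps the map
        # closer to a contraction in practice (contractivity is assumed here,
        # not enforced).
        return x + 0.1 * (self.net(x) - x)

def fixed_point_iterate(T: nn.Module, x: torch.Tensor,
                        tol: float = 1e-5, max_iters: int = 100) -> torch.Tensor:
    """Iterate x <- T(x) until successive iterates stop changing."""
    with torch.no_grad():
        for _ in range(max_iters):
            x_next = T(x)
            if torch.norm(x_next - x) < tol:
                return x_next
            x = x_next
    return x

# Usage: iterate the operator from a random starting point.
x_star = fixed_point_iterate(Operator(), torch.randn(1, 16))
```

If T is a contraction, x_star approximates the fixed point x* = T(x*), which in the paper's framing corresponds to the desired solution the network is trained to represent.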