Challenges in Training PINNs: A Loss Landscape Perspective (2402.01868v2)
Abstract: This paper explores challenges in training Physics-Informed Neural Networks (PINNs), emphasizing the role of the loss landscape in the training process. We examine difficulties in minimizing the PINN loss function, particularly due to ill-conditioning caused by differential operators in the residual term. We compare gradient-based optimizers Adam, L-BFGS, and their combination Adam+L-BFGS, showing the superiority of Adam+L-BFGS, and introduce a novel second-order optimizer, NysNewton-CG (NNCG), which significantly improves PINN performance. Theoretically, our work elucidates the connection between ill-conditioned differential operators and ill-conditioning in the PINN loss and shows the benefits of combining first- and second-order optimization methods. Our work presents valuable insights and more powerful optimization strategies for training PINNs, which could improve the utility of PINNs for solving difficult partial differential equations.
- What Can ResNet Learn Efficiently, Going Beyond Kernels? In Advances in Neural Information Processing Systems, 2019.
- AdaGrad Avoids Saddle Points. In Proceedings of the 39th International Conference on Machine Learning, 2022.
- Oracle complexity of second-order methods for smooth convex optimization. Mathematical Programming, 178:327–360, 2019.
- Bach, F. Sharp analysis of low-rank kernel matrix approximations. In Conference on learning theory, 2013.
- Belkin, M. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numerica, 30:203–248, 2021.
- On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems. SIAM Journal on Optimization, 20(6):2833–2852, 2010.
- On Lazy Training in Differentiable Programming. In Advances in Neural Information Processing Systems, 2019.
- Input sparsity time low-rank approximation via ridge leverage score sampling. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 2017.
- Scientific Machine Learning Through Physics–Informed Neural Networks: Where We Are and What’s Next. J. Sci. Comput., 92(3), 2022.
- Cybenko, G. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.
- Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems, 2014.
- On the approximation of functions by tanh neural networks. Neural Networks, 143:732–750, 2021.
- An operator preconditioning perspective on training in physics-informed machine learning. arXiv preprint arXiv:2310.05801, 2023.
- A simple convergence proof of Adam and Adagrad. Transactions on Machine Learning Research, 2022.
- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12(61):2121–2159, 2011.
- The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
- Randomized Nyström Preconditioning. SIAM Journal on Matrix Analysis and Applications, 44(2):718–752, 2023.
- An Investigation into Neural Net Optimization via Hessian Eigenvalue Density. In Proceedings of the 36th International Conference on Machine Learning, 2019.
- When Do Neural Networks Outperform Kernel Methods? In Advances in Neural Information Processing Systems, 2020.
- Linearized two-layers neural networks in high dimension. The Annals of Statistics, 49(2):1029–1054, 2021.
- Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.
- Matrices, moments and quadrature with applications, volume 30. Princeton University Press, 2009.
- PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs. arXiv preprint arXiv:2306.08827, 2023.
- Matrix Analysis. Cambridge University Press, 2nd edition, 2012.
- Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural networks, 3(5):551–560, 1990.
- Extended physics-informed neural networks (xpinns): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Communications in Computational Physics, 28(5):2002–2041, 2020.
- Adaptive activation functions accelerate convergence in deep and physics-informed neural networks. Journal of Computational Physics, 404:109136, 2020a.
- Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2020b.
- Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Computer Methods in Applied Mechanics and Engineering, 365:113028, 2020c.
- Linear Convergence of Gradient and Proximal-Gradient Methods under the Polyak-Łojasiewicz Condition. In Machine Learning and Knowledge Discovery in Databases, 2016.
- Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021.
- hp-VPINNs: Variational physics-informed neural networks with domain decomposition. Computer Methods in Applied Mechanics and Engineering, 374:113547, 2021.
- VarNet: Variational Neural Networks for the Solution of Partial Differential Equations. In Proceedings of the 2nd Conference on Learning for Dynamics and Control, pp. 298–307, 2020.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Characterizing possible failure modes in physics-informed neural networks. In Advances in Neural Information Processing Systems, 2021.
- First-order methods almost always avoid strict saddle points. Mathematical Programming, 176(1):311–337, 2019.
- D3M: A Deep Domain Decomposition Method for Partial Differential Equations. IEEE Access, 8:5283–5294, 2020.
- Fourier Neural Operator for Parametric Partial Differential Equations. In International Conference on Learning Representations, 2021.
- Approximating spectral densities of large matrices. SIAM review, 58(1):34–65, 2016.
- On the linearity of large non-linear models: when and why the tangent kernel is constant. Advances in Neural Information Processing Systems, 2020.
- Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Applied and Computational Harmonic Analysis, 59:85–116, 2022.
- Aiming towards the minimizers: fast convergence of SGD for overparametrized problems. arXiv preprint arXiv:2306.02601, 2023.
- On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1):503–528, 1989.
- Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021a.
- DeepXDE: A Deep Learning Library for Solving Differential Equations. SIAM Review, 63(1):208–228, 2021b.
- Physics-informed neural networks with hard constraints for inverse design. SIAM Journal on Scientific Computing, 43(6):B1105–B1132, 2021c.
- Multifidelity deep neural operators for efficient learning of partial differential equations with application to fast inverse design of nanoscale heat transport. Physical Review Research, 4(2):023210, 2022.
- Estimates on the generalization error of physics-informed neural networks for approximating pdes. IMA Journal of Numerical Analysis, 43(1):1–43, 2023.
- Finite basis physics-informed neural networks (FBPINNs): a scalable domain decomposition approach for solving differential equations. Advances in Computational Mathematics, 49(4):62, 2023.
- Achieving High Accuracy with PINNs via Energy Natural Gradient Descent. In Proceedings of the 40th International Conference on Machine Learning, 2023.
- Efficient training of physics‐informed neural networks via importance sampling. Comput.-Aided Civ. Infrastruct. Eng., 36(8):962–977, 2021.
- Nesterov, Y. Lectures on Convex Optimization. Springer Publishing Company, Incorporated, 2nd edition, 2018.
- Numerical Optimization. Springer, 2nd edition, 2006.
- PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv preprint arXiv:1912.01703, 2019.
- Pearlmutter, B. A. Fast exact multiplication by the hessian. Neural computation, 6(1):147–160, 1994.
- Polyak, B. T. Gradient methods for minimizing functionals. Zhurnal vychislitel’noi matematiki i matematicheskoi fiziki, 3(4):643–653, 1963.
- Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
- FALKON: An Optimal Large Scale Kernel Method. In Advances in Neural Information Processing Systems, 2017.
- Tropp, J. A. An introduction to matrix concentration inequalities. Foundations and Trends® in Machine Learning, 8(1-2):1–230, 2015.
- Learning Specialized Activation Functions for Physics-Informed Neural Networks. Communications in Computational Physics, 34(4):869–906, 2023.
- Understanding and Mitigating Gradient Flow Pathologies in Physics-Informed Neural Networks. SIAM Journal on Scientific Computing, 43(5):A3055–A3081, 2021a.
- On the eigenvector bias of Fourier feature networks: From regression to solving multi-scale PDEs with physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 384:113938, 2021b.
- Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Science Advances, 7(40):eabi8605, 2021c.
- Respecting causality is all you need for training physics-informed neural networks. arXiv preprint arXiv:2203.07404, 2022a.
- When and why PINNs fail to train: A neural tangent kernel perspective. Journal of Computational Physics, 449:110768, 2022b.
- A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 403:115671, 2023a.
- Effective data sampling strategies and boundary condition constraints of physics-informed neural networks for identifying material properties in solid mechanics. Applied mathematics and mechanics, 44(7):1039–1068, 2023b.
- MultiAdam: Parameter-wise Scale-invariant Optimizer for Multiscale Training of Physics-informed Neural Networks. In Proceedings of the 40th International Conference on Machine Learning, 2023.
- PyHessian: Neural Networks Through the Lens of the Hessian. In 2020 IEEE International Conference on Big Data (Big Data), 2020.
- Gradient-enhanced physics-informed neural networks for forward and inverse PDE problems. Computer Methods in Applied Mechanics and Engineering, 393:114823, 2022.
- Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3):107–115, 2021.
- Adam Can Converge Without Any Modification On Update Rules. In Advances in Neural Information Processing Systems, 2022.