Challenges in Training PINNs: A Loss Landscape Perspective (2402.01868v2)

Published 2 Feb 2024 in cs.LG, math.OC, and stat.ML

Abstract: This paper explores challenges in training Physics-Informed Neural Networks (PINNs), emphasizing the role of the loss landscape in the training process. We examine difficulties in minimizing the PINN loss function, particularly due to ill-conditioning caused by differential operators in the residual term. We compare gradient-based optimizers Adam, L-BFGS, and their combination Adam+L-BFGS, showing the superiority of Adam+L-BFGS, and introduce a novel second-order optimizer, NysNewton-CG (NNCG), which significantly improves PINN performance. Theoretically, our work elucidates the connection between ill-conditioned differential operators and ill-conditioning in the PINN loss and shows the benefits of combining first- and second-order optimization methods. Our work presents valuable insights and more powerful optimization strategies for training PINNs, which could improve the utility of PINNs for solving difficult partial differential equations.
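The abstract contrasts first-order training (Adam), quasi-Newton refinement (L-BFGS), and their combination. Below is a minimal, self-contained sketch of that Adam-then-L-BFGS strategy on a toy 1D Poisson problem; it is not the paper's code, the NysNewton-CG optimizer is not reproduced here, and the network size, collocation grid, and iteration counts are illustrative assumptions.

```python
# Hedged sketch: Adam + L-BFGS training of a small PINN for
#   u''(x) = -pi^2 sin(pi x),  u(0) = u(1) = 0  (exact solution: sin(pi x)).
import torch

torch.manual_seed(0)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

x = torch.linspace(0.0, 1.0, 128).reshape(-1, 1).requires_grad_(True)  # collocation points
xb = torch.tensor([[0.0], [1.0]])                                      # boundary points

def pinn_loss():
    u = net(x)
    # PDE residual: differentiate the network output twice w.r.t. x via autograd.
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    f = -torch.pi**2 * torch.sin(torch.pi * x)
    residual = d2u - f
    # Residual term plus boundary term enforcing u(0) = u(1) = 0.
    return (residual**2).mean() + (net(xb)**2).mean()

# Phase 1: Adam roughly locates a good region of the loss landscape.
adam = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    adam.zero_grad()
    loss = pinn_loss()
    loss.backward()
    adam.step()

# Phase 2: L-BFGS refines the solution; the closure re-evaluates loss and gradients.
lbfgs = torch.optim.LBFGS(net.parameters(), max_iter=500, line_search_fn="strong_wolfe")

def closure():
    lbfgs.zero_grad()
    loss = pinn_loss()
    loss.backward()
    return loss

lbfgs.step(closure)
print("final PINN loss:", pinn_loss().item())
```

The two-phase structure mirrors the comparison in the abstract: the ill-conditioning induced by the differential operator in the residual term limits how far a first-order method alone can reduce the loss, so a (quasi-)second-order refinement stage is applied afterwards.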
