When and why PINNs fail to train: A neural tangent kernel perspective (2007.14527v1)

Published 28 Jul 2020 in cs.LG, cs.NA, math.NA, and stat.ML

Abstract: Physics-informed neural networks (PINNs) have lately received great attention thanks to their flexibility in tackling a wide range of forward and inverse problems involving partial differential equations. However, despite their noticeable empirical success, little is known about how such constrained neural networks behave during their training via gradient descent. More importantly, even less is known about why such models sometimes fail to train at all. In this work, we aim to investigate these questions through the lens of the Neural Tangent Kernel (NTK); a kernel that captures the behavior of fully-connected neural networks in the infinite width limit during training via gradient descent. Specifically, we derive the NTK of PINNs and prove that, under appropriate conditions, it converges to a deterministic kernel that stays constant during training in the infinite-width limit. This allows us to analyze the training dynamics of PINNs through the lens of their limiting NTK and find a remarkable discrepancy in the convergence rate of the different loss components contributing to the total training error. To address this fundamental pathology, we propose a novel gradient descent algorithm that utilizes the eigenvalues of the NTK to adaptively calibrate the convergence rate of the total training error. Finally, we perform a series of numerical experiments to verify the correctness of our theory and the practical effectiveness of the proposed algorithms. The data and code accompanying this manuscript are publicly available at https://github.com/PredictiveIntelligenceLab/PINNsNTK.

Citations (746)

Summary

  • The paper establishes that fully-connected PINNs converge to Gaussian processes for linear PDEs in the infinite-width limit.
  • The paper derives and analyzes the NTK for PINNs, showing that it converges to a deterministic kernel that remains constant during training under gradient descent with an infinitesimally small learning rate.
  • The paper identifies a spectral bias that hinders learning high-frequency components and proposes an adaptive training strategy to balance convergence.

Analyzing the Training Dynamics of PINNs Through the Neural Tangent Kernel

The paper "When and Why PINNs Fail to Train: A Neural Tangent Kernel Perspective" explores the training dynamics of Physics-Informed Neural Networks (PINNs) using Neural Tangent Kernel (NTK) theory. PINNs have shown empirical success in solving forward and inverse problems involving partial differential equations (PDEs), yet their training outcomes remain inconsistent, particularly with high-frequency solutions. This research seeks to deepen the understanding of PINN behavior using NTK, offering insights into the challenges and potential solutions associated with their training.

Key Contributions

  1. PINNs as Gaussian Processes: The paper proves that fully-connected PINNs converge to Gaussian processes in the infinite-width limit when applied to linear PDEs. This lays a theoretical foundation, aligning PINNs with established properties of infinitely wide neural networks.
  2. Derivation and Analysis of the NTK for PINNs: The paper derives the NTK of PINNs and shows that, under appropriate conditions, it converges to a deterministic kernel and remains constant during training with an infinitesimally small learning rate. This provides a mathematical framework for analyzing the training dynamics of PINNs through the spectral properties of their NTK (see the sketch after this list).
  3. Spectral Bias and Convergence Discrepancies: The paper highlights that PINNs face a "spectral bias," where they struggle to learn high-frequency components of functions. Additionally, it identifies a significant disparity in the convergence rates of different components in the loss function, leading to difficulties in training stability and accuracy.
  4. Adaptive Training Strategy: A novel adaptive training strategy is proposed to balance the convergence rates of the different loss components, substantially improving the trainability and predictive accuracy of PINNs. The algorithm dynamically adjusts the weights in the PINN loss function using the eigenvalues of the NTK, so that the boundary and residual terms converge at comparable rates (see the sketch after this list).
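
To make the kernel and the weighting scheme concrete, the following is a minimal, illustrative sketch (not the authors' released code) of how the empirical NTK blocks of a small PINN can be assembled for a 1D model problem u_xx = f on [0, 1], and how trace-based adaptive weights can then be formed from them. The network size, collocation points, and the specific trace-based weighting rule are assumptions made for illustration; the repository linked in the abstract contains the reference implementation.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

# --- A small fully-connected PINN for u_xx(x) = f(x) on [0, 1], u(0) = u(1) = 0 ---
def init_params(key, widths=(1, 64, 64, 1)):
    # Simple 1/sqrt(fan-in) initialization; the NTK parameterization in the
    # paper differs in details, this is only an illustrative stand-in.
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def forward(params, x):
    """Network output u(x; theta) at a scalar location x."""
    h = jnp.reshape(x, (1,))
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b)[0]

f = lambda x: -(jnp.pi ** 2) * jnp.sin(jnp.pi * x)   # manufactured source term

params = init_params(jax.random.PRNGKey(0))
theta0, unravel = ravel_pytree(params)               # flat parameter vector

def u_fn(theta, x):
    return forward(unravel(theta), x)

def r_fn(theta, x):
    """PDE residual r(x; theta) = u_xx(x; theta) - f(x)."""
    u_xx = jax.grad(jax.grad(u_fn, argnums=1), argnums=1)(theta, x)
    return u_xx - f(x)

# Jacobians of boundary and residual outputs with respect to all parameters.
x_b = jnp.array([0.0, 1.0])                          # boundary points
x_r = jnp.linspace(0.0, 1.0, 100)                    # collocation points
J_u = jax.vmap(lambda x: jax.grad(u_fn)(theta0, x))(x_b)   # (N_b, P)
J_r = jax.vmap(lambda x: jax.grad(r_fn)(theta0, x))(x_r)   # (N_r, P)

# Empirical NTK blocks; the full kernel is the block matrix [[K_uu, K_ur], [K_ur^T, K_rr]].
K_uu = J_u @ J_u.T
K_rr = J_r @ J_r.T
K_ur = J_u @ J_r.T

# The kernel's eigenvalues govern how fast each error component decays;
# a rapidly decaying spectrum reflects the spectral bias discussed above.
print(jnp.linalg.eigvalsh(K_uu)[::-1][:5], jnp.linalg.eigvalsh(K_rr)[::-1][:5])

# Trace-based adaptive weights (one reading of the NTK-guided weighting rule):
trace_K = jnp.trace(K_uu) + jnp.trace(K_rr)
lam_u = trace_K / jnp.trace(K_uu)
lam_r = trace_K / jnp.trace(K_rr)

def weighted_loss(theta):
    u_b = jax.vmap(lambda x: u_fn(theta, x))(x_b)    # boundary target is zero here
    r_c = jax.vmap(lambda x: r_fn(theta, x))(x_r)
    return lam_u * jnp.mean(u_b ** 2) + lam_r * jnp.mean(r_c ** 2)
```

Working with a flattened parameter vector keeps the Jacobians as plain matrices, so the kernel blocks reduce to simple products J Jᵀ; in a full training loop the weights lam_u and lam_r would be recomputed periodically as the kernel (slowly) evolves.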

Numerical Results and Implications

The paper supports its theoretical claims through numerical experiments that verify the NTK theory and demonstrate the effectiveness of proposed training algorithms. Results indicate that the NTK of PINNs remains largely unchanged during training for networks with sufficiently large width, aligning with the theoretical predictions. This finding emphasizes the potential of using NTK theory to design more robust PINN architectures and training strategies.
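
As a practical check of this claim, one can store an empirical kernel block (for example, K_uu or K_rr from the sketch above) at several checkpoints and measure how far it drifts from its value at initialization. The helper below is a small, self-contained sketch of such a diagnostic; the checkpoint schedule and the Frobenius-norm metric are illustrative choices, not prescribed by the paper.

```python
import jax.numpy as jnp

def ntk_drift(kernels):
    """Relative Frobenius-norm change of an empirical NTK block at each
    checkpoint, measured against the kernel at initialization. Values that
    stay near zero indicate the kernel is effectively frozen during training,
    the behaviour reported for sufficiently wide networks."""
    K0 = kernels[0]
    return [float(jnp.linalg.norm(K - K0) / jnp.linalg.norm(K0)) for K in kernels]

# Example usage (hypothetical checkpoints):
#   drift = ntk_drift([K_uu_at_init, K_uu_after_1k_steps, K_uu_after_10k_steps])
```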

Theoretical and Practical Implications

Theoretically, this paper advances the understanding of the training dynamics in PINNs by integrating NTK insights, helping to elucidate the intricacies of gradient descent paths in the network’s parameter space. Practically, the adaptive training approach offers a concrete method to address one of the significant limitations of PINNs, thereby enabling more effective deployment in complex scientific computing tasks.

Future Directions

Future research could extend these insights to more complex architectures, such as deeper fully-connected networks and convolutional networks, and explore their implications for non-linear PDEs. Additionally, investigating NTK behavior in inverse problem setups and understanding the training dynamics under different optimization routines, like Adam, could further refine the utility and reliability of PINN methodologies.

In conclusion, this work provides foundational advancements in understanding why and when PINNs struggle during training and proposes a pathway to mitigate such issues, offering a promising perspective for the future development of robust scientific machine learning tools.