B-PINNs: Bayesian Physics-Informed Neural Networks for Forward and Inverse PDE Problems with Noisy Data
(2003.06097v1)
Published 13 Mar 2020 in stat.ML and cs.LG
Abstract: We propose a Bayesian physics-informed neural network (B-PINN) to solve both forward and inverse nonlinear problems described by partial differential equations (PDEs) and noisy data. In this Bayesian framework, the Bayesian neural network (BNN) combined with a PINN for PDEs serves as the prior while the Hamiltonian Monte Carlo (HMC) or the variational inference (VI) could serve as an estimator of the posterior. B-PINNs make use of both physical laws and scattered noisy measurements to provide predictions and quantify the aleatoric uncertainty arising from the noisy data in the Bayesian framework. Compared with PINNs, in addition to uncertainty quantification, B-PINNs obtain more accurate predictions in scenarios with large noise due to their capability of avoiding overfitting. We conduct a systematic comparison between the two different approaches for the B-PINN posterior estimation (i.e., HMC or VI), along with dropout used for quantifying uncertainty in deep neural networks. Our experiments show that HMC is more suitable than VI for the B-PINNs posterior estimation, while dropout employed in PINNs can hardly provide accurate predictions with reasonable uncertainty. Finally, we replace the BNN in the prior with a truncated Karhunen-Loève (KL) expansion combined with HMC or a deep normalizing flow (DNF) model as posterior estimators. The KL is as accurate as BNN and much faster but this framework cannot be easily extended to high-dimensional problems unlike the BNN based framework.
The paper "B-PINNs: Bayesian Physics-Informed Neural Networks for Forward and Inverse PDE Problems with Noisy Data" (Yang et al., 2020) introduces a Bayesian framework for solving partial differential equations (PDEs) and inverse problems involving noisy observation data. Standard physics-informed neural networks (PINNs) integrate PDE constraints into the training process but typically lack built-in mechanisms for quantifying uncertainty, especially when dealing with noise. They are also susceptible to overfitting noisy data. B-PINNs address these limitations by adopting a Bayesian approach to estimate the distribution of the solution and unknown parameters.
The core idea is to formulate the problem within a Bayesian framework in which a neural network u~(x;θ), parameterized by θ, acts as a flexible surrogate for the unknown solution u(x); for inverse problems, the unknown PDE parameters λ are inferred alongside θ. The physical laws (PDEs and boundary conditions) and the scattered noisy data D are combined to define the likelihood function. The parameters θ (and λ for inverse problems) are treated as random variables with prior distributions, and the goal is to compute the posterior distribution P(θ∣D) (or P(θ,λ∣D)).
The surrogate model for the solution u(x) is typically a fully-connected neural network u~(x;θ), where θ represents all the weights and biases. For inverse problems, if PDE parameters λ are unknown constants, they are simply added to the set of parameters to be inferred. If λ is a field, another surrogate model could be used for it.
The likelihood function P(D∣θ) quantifies how well the model prediction u~(x;θ) and its derivatives (which form the PDE residual f~ and boundary terms b~) match the noisy observations. Assuming independent Gaussian noise for the observed data in Du, Df, and Db with known standard deviations σu(i), σf(i), and σb(i), the likelihood is formulated as a product of Gaussian probability density functions:
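$$
P(D \mid \theta) \;=\; \prod_{i=1}^{N_u} \frac{1}{\sqrt{2\pi \big(\sigma_u^{(i)}\big)^2}} \exp\!\left(-\frac{\big(\tilde{u}(x_u^{(i)};\theta)-\bar{u}^{(i)}\big)^2}{2\big(\sigma_u^{(i)}\big)^2}\right) \prod_{i=1}^{N_f} \frac{1}{\sqrt{2\pi \big(\sigma_f^{(i)}\big)^2}} \exp\!\left(-\frac{\big(\tilde{f}(x_f^{(i)};\theta)-\bar{f}^{(i)}\big)^2}{2\big(\sigma_f^{(i)}\big)^2}\right) \prod_{i=1}^{N_b} \frac{1}{\sqrt{2\pi \big(\sigma_b^{(i)}\big)^2}} \exp\!\left(-\frac{\big(\tilde{b}(x_b^{(i)};\theta)-\bar{b}^{(i)}\big)^2}{2\big(\sigma_b^{(i)}\big)^2}\right),
$$

where ū(i), f̄(i), and b̄(i) denote the noisy measurements of the solution, source/residual, and boundary terms at the points xu(i), xf(i), and xb(i), and Nu, Nf, Nb are the numbers of points in Du, Df, and Db. (This is the form implied by the stated Gaussian-noise assumption; the paper's exact notation may differ slightly.)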
The PDE residuals f~ and boundary terms b~ are computed directly from u~ using automatic differentiation, integrating the physics into the likelihood calculation.
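As a concrete illustration of this step, the sketch below uses PyTorch automatic differentiation to evaluate f~ for a hypothetical 1D operator f = u_xx − k·u; the network architecture, the operator, and the constant k are illustrative assumptions, not the paper's setup.

```python
import torch

# Hypothetical fully-connected surrogate u~(x; theta); the architecture is illustrative only.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 50), torch.nn.Tanh(),
    torch.nn.Linear(50, 50), torch.nn.Tanh(),
    torch.nn.Linear(50, 1),
)

def pde_residual(x, k=0.7):
    """Residual f~ = u_xx - k*u for an assumed 1D operator, via automatic differentiation."""
    x = x.detach().clone().requires_grad_(True)   # track gradients w.r.t. the inputs
    u = net(x)
    u_x = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x), create_graph=True)[0]
    return u_xx - k * u

x_f = torch.linspace(-1.0, 1.0, 32).reshape(-1, 1)  # residual (collocation) points
print(pde_residual(x_f).shape)                      # torch.Size([32, 1])
```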
A common prior choice for the BNN parameters θ is independent Gaussian distributions for each weight and bias, often centered at zero. The paper notes that while BNNs with infinite width approach Gaussian Processes (GPs), for finite width, the distributions of the derivatives might deviate from Gaussian, unlike the derivatives of a true GP.
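Written out, with all weights and biases collected into θ = (θ1, …, θd) and a common prior standard deviation σθ (a generic hyperparameter here; the paper may assign different values to different layers or to weights versus biases), such a prior factorizes as

$$
P(\theta) \;=\; \prod_{k=1}^{d} \frac{1}{\sqrt{2\pi\sigma_\theta^2}} \exp\!\left(-\frac{\theta_k^2}{2\sigma_\theta^2}\right).
$$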
Estimating the posterior distribution P(θ∣D)∝P(D∣θ)P(θ) is computationally challenging. The paper explores two primary methods:
Hamiltonian Monte Carlo (HMC): A Markov Chain Monte Carlo (MCMC) method that generates samples from the target posterior distribution by simulating Hamiltonian dynamics. It is considered a state-of-the-art method for accurate posterior sampling but can be computationally intensive, especially for high-dimensional parameter spaces. Implementation involves defining a potential energy U(θ) = −ln P(D∣θ) − ln P(θ) and simulating trajectories using numerical integration (like the leapfrog method), followed by a Metropolis-Hastings acceptance step to correct for discretization errors. Algorithm 1 provides a practical outline of the HMC procedure. After a burn-in phase, collected samples are used to estimate posterior statistics (mean, standard deviation). A minimal sketch of these mechanics is given after the description of VI below.
Variational Inference (VI): This approach approximates the true posterior P(θ∣D) with a simpler, parameterized distribution Q(θ;ζ), typically a factorized Gaussian. The parameters ζ of Q are optimized to minimize the Kullback-Leibler (KL) divergence between Q and P. Algorithm 2 outlines the VI training process using an optimizer like Adam, minimizing a proxy for the KL divergence (equivalently, maximizing the evidence lower bound, ELBO). Samples are then drawn directly from the learned distribution Q. VI is generally faster than HMC but might yield less accurate posterior approximations if the chosen family of distributions for Q is too restrictive.
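The following minimal NumPy sketch illustrates the HMC mechanics described above: leapfrog integration of the dynamics defined by U(θ) = −ln P(D∣θ) − ln P(θ), followed by a Metropolis-Hastings correction. The log-posterior, step size, trajectory length, and burn-in are placeholders rather than the paper's Algorithm 1 or its settings.

```python
import numpy as np

def hmc_sample(log_post, grad_log_post, theta0, n_samples=2000, n_burn=500,
               step=0.01, n_leapfrog=50, rng=np.random.default_rng(0)):
    """Minimal HMC sketch: U(theta) = -log_post(theta); leapfrog + MH correction.

    `log_post`/`grad_log_post` are assumed to evaluate log P(D|theta) + log P(theta)
    and its gradient; all settings here are illustrative, not the paper's.
    """
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for it in range(n_samples + n_burn):
        r = rng.standard_normal(theta.shape)          # resample auxiliary momentum
        theta_new, r_new = theta.copy(), r.copy()
        # Leapfrog integration of the Hamiltonian dynamics.
        r_new += 0.5 * step * grad_log_post(theta_new)
        for _ in range(n_leapfrog - 1):
            theta_new += step * r_new
            r_new += step * grad_log_post(theta_new)
        theta_new += step * r_new
        r_new += 0.5 * step * grad_log_post(theta_new)
        # Metropolis-Hastings acceptance corrects the discretization error.
        log_accept = (log_post(theta_new) - 0.5 * r_new @ r_new) \
                   - (log_post(theta) - 0.5 * r @ r)
        if np.log(rng.uniform()) < log_accept:
            theta = theta_new
        if it >= n_burn:                              # keep post-burn-in samples only
            samples.append(theta.copy())
    return np.array(samples)

# Toy usage: sample a 2-D standard normal "posterior".
lp = lambda t: -0.5 * t @ t
glp = lambda t: -t
draws = hmc_sample(lp, glp, theta0=np.zeros(2))
print(draws.mean(axis=0), draws.std(axis=0))          # roughly [0, 0] and [1, 1]
```

Similarly, the sketch below shows a mean-field (factorized Gaussian) VI loop with the reparameterization trick, using PyTorch and Adam; the variational family, learning rate, and number of Monte Carlo samples are illustrative choices, not the paper's Algorithm 2.

```python
import torch

def fit_meanfield_vi(log_joint, dim, n_iters=5000, n_mc=8, lr=1e-2):
    """Minimal mean-field Gaussian VI sketch.

    `log_joint(theta)` is assumed to return log P(D|theta) + log P(theta) for a
    batch of theta with shape [n_mc, dim]; Q is a fully factorized Gaussian.
    """
    mu = torch.zeros(dim, requires_grad=True)
    log_sigma = torch.full((dim,), -2.0, requires_grad=True)
    opt = torch.optim.Adam([mu, log_sigma], lr=lr)
    for _ in range(n_iters):
        eps = torch.randn(n_mc, dim)
        theta = mu + torch.exp(log_sigma) * eps       # reparameterized samples from Q
        # Negative ELBO = E_Q[log Q(theta) - log P(D, theta)], estimated by Monte Carlo.
        log_q = torch.distributions.Normal(mu, torch.exp(log_sigma)).log_prob(theta).sum(-1)
        loss = (log_q - log_joint(theta)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu.detach(), torch.exp(log_sigma).detach()

# Toy usage: a standard-normal "posterior"; VI should recover mu ~ 0, sigma ~ 1.
mu, sigma = fit_meanfield_vi(lambda th: -0.5 * (th ** 2).sum(-1), dim=2)
print(mu, sigma)
```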
The paper compares these B-PINN approaches (B-PINN-HMC, B-PINN-VI) against a baseline non-Bayesian method, PINNs with Dropout, used for uncertainty quantification. Dropout involves randomly dropping neurons during training and prediction; uncertainty is estimated by running multiple forward passes with dropout enabled after training.
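A minimal sketch of this dropout-based uncertainty estimate, assuming a hypothetical PINN-style network with dropout layers (the architecture, dropout rate, and number of passes are illustrative):

```python
import torch

# Hypothetical network with dropout layers; architecture and p are illustrative only.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 50), torch.nn.Tanh(), torch.nn.Dropout(p=0.05),
    torch.nn.Linear(50, 50), torch.nn.Tanh(), torch.nn.Dropout(p=0.05),
    torch.nn.Linear(50, 1),
)

@torch.no_grad()
def mc_dropout_predict(x, n_passes=200):
    """Keep dropout active at prediction time and aggregate stochastic forward passes."""
    net.train()                                  # train() keeps Dropout layers stochastic
    preds = torch.stack([net(x) for _ in range(n_passes)])
    return preds.mean(0), preds.std(0)           # predictive mean and uncertainty estimate

x_test = torch.linspace(-1.0, 1.0, 100).reshape(-1, 1)
mean, std = mc_dropout_predict(x_test)
```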
Experiments on function regression and various forward/inverse PDE problems (1D linear/nonlinear, 2D nonlinear) with synthetic noisy data demonstrate:
B-PINN-HMC: Consistently provides accurate mean predictions and reliable uncertainty estimates. The estimated standard deviation effectively quantifies the uncertainty due to noisy data, increasing with noise levels and being larger in regions with sparse data. The errors are typically bounded by two standard deviations.
B-PINN-VI: Often fails to provide accurate uncertainty quantification, particularly underestimating uncertainty in sparse data regions or providing large uncertainties at boundaries despite data being available. The accuracy of mean predictions is also often inferior to B-PINN-HMC.
Dropout: Provides uniform-looking uncertainties that do not reflect the data distribution or noise level. The predictive means are also less accurate compared to B-PINN-HMC, and errors are often not covered by the estimated uncertainty. Dropout uncertainty estimates were found to be sensitive to architectural choices and dropout rates, without clear guidance for optimal settings.
Comparison with Standard PINNs: For problems with high noise, standard PINNs show significant overfitting artifacts, leading to inaccurate solutions and parameter estimates. B-PINN-HMC is more robust to noise and provides better accuracy in such scenarios, in addition to quantifying uncertainty.
The paper also explores replacing the BNN surrogate model with a truncated Karhunen-Loève (KL) expansion for 1D problems. With fewer parameters, KL-based models can be faster than BNNs for low-dimensional problems. Using KL expansion with HMC (KL-HMC) or Deep Normalizing Flow (KL-DNF) as posterior estimators also yielded accurate results comparable to B-PINN-HMC for the tested 1D cases. However, KL expansion suffers from the "curse of dimensionality," making it less practical for high-dimensional problems where BNNs are more suitable. DNF, while providing independent samples easily after training, was found to be computationally more expensive for training than HMC in the tested scenarios.
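As a rough illustration of this alternative surrogate, the sketch below builds a truncated KL basis by eigendecomposing a squared-exponential covariance on a 1D grid; the kernel, length scale, and number of retained modes are assumptions for illustration, not the paper's choices. The KL coefficients ξ then play the role of the low-dimensional parameters that HMC or a normalizing flow would infer.

```python
import numpy as np

def truncated_kl_basis(x, n_terms=10, length_scale=0.3):
    """Truncated Karhunen-Loève sketch: eigendecompose an assumed squared-exponential
    covariance on a 1D grid and keep the leading modes (scaled by sqrt of eigenvalues)."""
    cov = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / length_scale ** 2)
    eigvals, eigvecs = np.linalg.eigh(cov)
    idx = np.argsort(eigvals)[::-1][:n_terms]                 # leading eigenpairs
    return np.sqrt(np.maximum(eigvals[idx], 0.0)) * eigvecs[:, idx]   # shape (len(x), n_terms)

x = np.linspace(-1.0, 1.0, 200)
phi = truncated_kl_basis(x)                   # KL basis functions
xi = np.random.default_rng(0).standard_normal(phi.shape[1])  # KL coefficients (to be inferred)
u_sample = phi @ xi                           # one random-field realization of the surrogate
print(u_sample.shape)                         # (200,)
```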
Practical Implementation Considerations:
Noise Model: The framework assumes Gaussian noise with known standard deviations. In real-world applications, noise characteristics might need to be estimated or modeled differently.
Likelihood Construction: Accurate evaluation of PDE residuals and boundary terms requires a smooth and differentiable surrogate model, which neural networks with appropriate activation functions and automatic differentiation provide.
Choice of Posterior Estimator:
HMC: Provides higher accuracy for posterior approximation but is computationally demanding and might require careful tuning (step size, number of steps, burn-in). It can be more complex to implement than VI.
VI: Faster and potentially easier to implement but relies on the approximation power of the chosen variational distribution (e.g., factorized Gaussian), which might not capture the true posterior structure well.
DNF: Can provide independent samples efficiently after training, but training can be very computationally expensive.
Surrogate Model Choice: BNNs are flexible for high-dimensional problems but are heavily overparameterized. KL expansion is efficient for low dimensions but struggles with high-dimensional inputs. The choice depends on the problem dimensionality and available computational resources.
Hyperparameter Tuning: Performance depends significantly on neural network architecture (width, depth), prior distribution parameters (variance of weights/biases), and posterior estimation algorithm parameters (HMC steps/timestep, VI/DNF training parameters).
Computational Cost: B-PINNs, especially with HMC, are more computationally expensive than standard PINNs due to the need for sampling or complex optimization over a potentially large parameter space. Scaling to large datasets might require stochastic variants of MCMC or VI.
In summary, B-PINNs offer a robust method for solving PDEs and inverse problems with noisy data, providing both accurate predictions and reliable uncertainty quantification. B-PINN-HMC emerged as the most effective approach among those compared, particularly demonstrating superiority over standard PINNs and dropout in handling significant data noise. While computationally more demanding, the benefits of uncertainty quantification and robustness to noise make B-PINNs a valuable tool for applications where data quality is imperfect.