
Physics-Informed Neural Networks

Updated 14 November 2025
  • Physics-Informed Neural Networks are techniques that embed governing physical laws into a neural network’s loss function to solve differential equations.
  • They use a composite loss function combining data misfit, PDE residual, and boundary condition terms to ensure both data accuracy and physics consistency.
  • Optimization strategies, including curriculum training, adaptive sampling, and smooth activations, help mitigate nonconvexity and improve convergence.

Physics-Informed Neural Networks (PINNs) are a family of computational methods that leverage the function-approximation capabilities of artificial neural networks to solve partial differential equations (PDEs), ordinary differential equations (ODEs), and inverse problems by explicitly encoding governing physical laws into the network's loss function via residual constraints. This paradigm yields mesh-free, continuous solution representations that satisfy both observed data and the underlying physics, generalize across complex high-dimensional domains, and overcome some limitations of purely analytic or traditional numerical solvers.

1. Mathematical Foundation and Loss Construction

A canonical PINN employs a feedforward neural network $\Lambda(x;\theta)$ parameterized by weights $\theta$, taking as inputs the problem's independent variables (e.g., spatial coordinates $x \in \mathbb{R}^n$, time $t$, and possibly other parameters such as a wave-source location) and mapping to the solution or solutions of interest. The output may be scalar (for a single PDE state) or vector-valued (e.g., real and imaginary parts for wavefields).

The total loss functional comprises multiple terms:

$$\mathcal{L}_{\rm total}(\theta) = \mathcal{L}_{\rm data} + \lambda_r\,\mathcal{L}_{\rm PDE} + \lambda_{\rm bc}\,\mathcal{L}_{\rm bc}$$

  • Data misfit loss:

    $$\mathcal{L}_{\rm data} = \frac{1}{N_d}\sum_{i=1}^{N_d} \|u(x_i) - \Lambda(x_i;\theta)\|^2$$

    Enforces fit to known data points.

  • Physics (PDE) residual loss:

    $$\mathcal{L}_{\rm PDE} = \frac{1}{N_r}\sum_{j=1}^{N_r} \|\mathcal{N}[\Lambda](x_j)\|^2$$

    Enforces the PDE operator residual at collocation points $x_j$.

  • Boundary and/or initial condition loss:

    $$\mathcal{L}_{\rm bc} = \frac{1}{N_{\rm bc}}\sum_{k=1}^{N_{\rm bc}} \|\Lambda(x_k^{\rm bc};\theta) - g(x_k^{\rm bc})\|^2$$

    Imposes BCs/ICs as soft constraints at sampled locations.

The weights $\lambda_r$ and $\lambda_{\rm bc}$ balance each loss term and are critical to convergence, as the physics term can dominate the loss landscape in stiff PDEs if unregulated.

Automatic differentiation is used to compute all required derivatives with respect to the network inputs, enabling exact PDE residual calculation and mesh-free training.
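As a concrete illustration of this construction, the sketch below assembles the three loss terms in PyTorch for a toy model problem $u''(x) = \sin(x)$ on $[0, 2\pi]$ with homogeneous Dirichlet conditions. The network size, sampling counts, loss weights, and synthetic observations are illustrative assumptions rather than values from the source.

```python
import torch
import torch.nn as nn

# Small fully connected network Lambda(x; theta); the architecture is illustrative.
net = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
)

def pde_residual(x):
    """N[Lambda](x) for the toy problem u''(x) = sin(x)."""
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    return d2u - torch.sin(x)

# Collocation, data, and boundary points (illustrative sampling).
x_r = torch.rand(1000, 1) * 2 * torch.pi      # interior collocation points
x_d = torch.rand(50, 1) * 2 * torch.pi        # observation locations (synthetic)
u_d = -torch.sin(x_d)                         # synthetic observed values (exact solution)
x_bc = torch.tensor([[0.0], [2 * torch.pi]])  # boundary locations
g_bc = torch.zeros(2, 1)                      # boundary values g

lambda_r, lambda_bc = 1.0, 10.0               # loss weights (assumed)

def total_loss():
    loss_data = (net(x_d) - u_d).pow(2).mean()
    loss_pde = pde_residual(x_r).pow(2).mean()
    loss_bc = (net(x_bc) - g_bc).pow(2).mean()
    return loss_data + lambda_r * loss_pde + lambda_bc * loss_bc
```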

2. Optimization Strategies and Convergence Control

Training PINNs is a high-dimensional, nonconvex optimization problem. Standard optimizers include Adam, which uses first-order momentum (commonly β₁ = 0.9, β₂ = 0.999), often with an initial learning rate of $10^{-3}$ to $10^{-4}$, and L-BFGS for quasi-Newton fine-tuning using curvature information from a limited number of previous steps.
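A minimal sketch of this two-stage schedule, reusing `net` and `total_loss` from the sketch in Section 1; the iteration counts, learning rate, and L-BFGS settings are assumed for illustration.

```python
# Stage 1: Adam with first-order momentum for the bulk of training.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3, betas=(0.9, 0.999))
for step in range(5000):
    optimizer.zero_grad()
    loss = total_loss()
    loss.backward()
    optimizer.step()

# Stage 2: L-BFGS fine-tuning using limited-memory curvature information.
lbfgs = torch.optim.LBFGS(net.parameters(), max_iter=500,
                          history_size=50, line_search_fn="strong_wolfe")

def closure():
    lbfgs.zero_grad()
    loss = total_loss()
    loss.backward()
    return loss

lbfgs.step(closure)
```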

To address optimization pathologies:

  • Curriculum-style training: Start by minimizing only the data/boundary loss, then introduce the PDE residual gradually to avoid early domination by stiff physics constraints.
  • Loss-term weighting: Ramp up $\lambda_r$ during training.
  • Boundary/initial condition encoding: Design the network output as $U(x,t) = B(x,t)\,\Lambda(x,t) + u_0(x,t)$ so that BCs/ICs are satisfied by construction, reducing the search space for $\Lambda$.
  • Adaptive sampling: Dynamically focus collocation points in subdomains with high PDE residual or large network gradients, guided by functionals such as $|\nabla \Lambda|$, as formalized in Theorem 2 (gradient-based sampling). For data-driven regions, expand the collocation domain adaptively (a weighting and resampling sketch follows this list).
  • Activation function selection: Use smooth activations (e.g., tanh or SoftPlus) to mitigate vanishing gradients, particularly for higher-order PDEs or forward wave equations.
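The sketch below illustrates two of these strategies, a linear ramp of the PDE-residual weight and residual-guided resampling of collocation points, reusing `pde_residual` from the sketch in Section 1; the warmup length, pool size, and domain are illustrative choices.

```python
def ramp_lambda_r(step, warmup=2000, lambda_max=1.0):
    """Linearly increase the PDE-residual weight over the first `warmup` steps."""
    return lambda_max * min(step / warmup, 1.0)

def resample_collocation(n_keep=1000, n_pool=10000, domain=(0.0, 2 * torch.pi)):
    """Draw a large candidate pool and keep the points with the largest |residual|."""
    lo, hi = domain
    pool = lo + (hi - lo) * torch.rand(n_pool, 1)
    with torch.enable_grad():                      # ensure grad mode for the residual
        r = pde_residual(pool).abs().squeeze(1)
    idx = torch.topk(r, n_keep).indices            # indices of the largest residuals
    return pool[idx].detach()
```

In practice the resampling criterion (residual magnitude versus $|\nabla \Lambda|$) and the ramp schedule are problem-dependent and typically tuned alongside the loss weights.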

3. Illustrative Use Cases and Quantitative Performance

PINNs have been successfully applied to a variety of canonical and complex problems:

  • Simple ODEs/PDEs: For problems like $u''(x) = f(x)$, PINNs yield smooth derivative estimates surpassing piecewise-finite-difference approximations.
  • Wave equations: For the 1D wave equation $u_{tt} = u_{xx}$ with Dirichlet and initial conditions, a PINN (hidden-layer sizes [64, 32, 16, 8], tanh) with $10^3$ IC points, $10^3$ BC points, and $10^4$ collocation points achieves relative $L_2$ errors of $10^{-2}$–$10^{-3}$ for modest networks and displays FEM-like convergence $e_{\rm conv} \sim N_{\rm iter}^{-1/2}$.
  • Forward seismic modeling: For the 2D acoustic wave equation, deep PINNs (10 layers × 1024 channels, SoftPlus) with initial data from FD simulations and physics-based loss across the domain reproduce FD snapshots and generalize to long time intervals with order-of-magnitude speedup.
  • Inverse problems: In PINN-based waveform inversion, one network reconstructs the scattered field while another estimates model parameters (e.g., squared slowness), with TV regularization. PINN-initialized inversion demonstrates smoother and more accurate reconstructions compared to classical starts.

Performance metrics commonly reported include:

  • Energy-norm error $\iint |\nabla u - \nabla \Lambda|^2$.
  • MSE over dense grids.
  • Relative $L_2$ norm error (a computation sketch follows this list).
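For the toy problem in the earlier sketches (exact solution $u(x) = -\sin(x)$), the relative $L_2$ error and a simple quadrature estimate of the energy-norm error can be computed as follows; the evaluation grid size is an arbitrary choice.

```python
# Dense evaluation grid over [0, 2*pi]; gradients w.r.t. x are needed for the energy norm.
x = torch.linspace(0, 2 * torch.pi, 2000).reshape(-1, 1).requires_grad_(True)
u_pred = net(x)
u_ref = -torch.sin(x)                      # exact solution of the toy problem

rel_l2 = torch.linalg.norm(u_pred - u_ref) / torch.linalg.norm(u_ref)

du_pred = torch.autograd.grad(u_pred, x, torch.ones_like(u_pred))[0]
du_ref = -torch.cos(x)
# Midpoint-rule estimate of the energy-norm error integral over [0, 2*pi].
energy_err = ((du_pred - du_ref) ** 2).mean() * (2 * torch.pi)

print(f"relative L2 error: {rel_l2.item():.3e}, energy-norm error: {energy_err.item():.3e}")
```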

Architectural trade-offs show that larger, deeper networks improve accuracy at increased computational cost (e.g., small networks converge in minutes, while large networks may require hours).

4. Error Analysis and Adaptive Methods

Error propagation patterns uncovered via careful loss/solution analysis indicate:

  • Solution errors grow away from regions with dense data or initial conditions, propagating along physical characteristics (e.g., as a wavefront).
  • Error localization is pronounced near discontinuities, high-curvature regions, and shocks.
  • Derivative errors co-locate with solution error peaks, a key consideration for high-frequency applications.

To address these phenomena:

  • Data-centric curriculum learning (Theorem 1): Collocation density is initially concentrated near regions of high data availability, then extended.
  • Gradient-based adaptive sampling (Theorem 2): Points are more densely sampled where $|\nabla \Lambda|$ is large, focusing computational effort where the residual is most significant.

Boundary and initial conditions can be encoded via the output transformation $U(x,t) = B(x,t)\,\Lambda(x,t) + u_0(x,t)$, eliminating the need to penalize BC residuals explicitly and improving convergence rates.
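A minimal sketch of this output transformation for the 1D toy problem above, with a polynomial $B(x)$ that vanishes at both endpoints and $u_0 \equiv 0$ for homogeneous Dirichlet data; the particular form of $B$ is an illustrative choice, as any smooth function vanishing on the boundary serves the same purpose.

```python
L_dom = 2 * torch.pi                   # domain length (toy problem)

def u_hard(x):
    """Hard-constraint ansatz U(x) = B(x) * Lambda(x) + u0(x)."""
    B = x * (L_dom - x) / L_dom**2     # zero at x = 0 and x = L_dom, O(1) inside
    u0 = torch.zeros_like(x)           # lifts the (homogeneous) boundary data
    return B * net(x) + u0
```

With this parameterization the $\mathcal{L}_{\rm bc}$ term can be dropped, and training minimizes only the data and PDE-residual losses evaluated through `u_hard`.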

5. Limitations, Challenges, and Prospective Directions

Despite their versatility, PINNs exhibit several limitations:

  • Accuracy loss near discontinuities and steep gradients: Enforcing higher-order derivatives via AD or finite differences becomes unstable, and PINNs can under-resolve shocks unless specifically regularized.
  • Architectural heuristics: There is no established prescription for optimal network width, depth, or activation choices for a given PDE problem.
  • Local minima and nonconvexity: The combined data-and-physics loss surface is highly nonconvex, leading to local minima (cycle skipping), particularly in ill-posed inverse problems.
  • Computational cost: Large models and dense collocation sampling impose high CPU/GPU resource requirements.

Open directions include:

  • Rigorous convergence and error-bound theory for PINN approximation on arbitrary PDEs.
  • Adaptive mesh refinement and multi-scale PINNs to improve solution fidelity in heterogeneous domains.
  • Hybridization with conventional solvers (e.g., finite element methods) for subdomain coupling, enabling mixture-of-experts architectures for complex multiphysics problems.
  • Automated architecture selection and domain decomposition strategies for improved performance in 3D, multiphysics, and tightly coupled regimes.

6. Domains of Strength and Practical Applications

PINNs are demonstrably effective in scenarios where classical mesh-based methods are infeasible or the problem is ill-posed:

  • Data-limited or ill-posed problems: Physics constraints compensate for sparse/noisy measurements.
  • High-dimensional and irregular domains: The mesh-free formulation is indifferent to dimensionality and geometry, especially advantageous in unstructured domains.
  • Inverse problems: Model parameters (e.g., coefficients in the governing equations) are learned directly as network parameters, with no need for separate inversion algorithms (a minimal sketch follows this list).
  • Fast surrogate evaluation: Once trained, PINN surrogates can predict the solution at arbitrary points within (and sometimes beyond) the training domain at negligible additional computational cost.
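As a schematic of this idea, the sketch below treats an unknown scalar coefficient $c$ in the toy residual $u''(x) = c\,\sin(x)$ as a trainable parameter optimized jointly with the network weights, reusing `net`, `x_r`, `x_d`, and `u_d` from the earlier sketches; the initial guess and training schedule are illustrative.

```python
# Unknown PDE coefficient declared as a trainable parameter (initial guess 0.5).
c = torch.nn.Parameter(torch.tensor(0.5))

def inverse_residual(x):
    """Residual of u''(x) = c * sin(x) with c learned from data."""
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    return d2u - c * torch.sin(x)

# Joint optimization of the network weights and the coefficient c.
opt = torch.optim.Adam(list(net.parameters()) + [c], lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    loss = inverse_residual(x_r).pow(2).mean() + (net(x_d) - u_d).pow(2).mean()
    loss.backward()
    opt.step()
```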

The continuous, differentiable nature of the PINN solution makes it a compelling computational tool for simulations requiring gradient information (e.g., sensitivity analysis), and for applications in physics-informed machine learning pipelines, parameter inference, and uncertainty quantification.


Physics-Informed Neural Networks thus constitute a flexible, theoretically grounded, and practically impactful approach for embedding physical constraints into machine learning workflows. Their blend of universal approximation, automatic differentiation, and loss-engineering flexibility underpins their success in problems ranging from low-dimensional toy PDEs to high-dimensional, data-scarce, or multiphysics regimes, while simultaneously motivating a rich set of future research issues in theory, optimization, and applied scientific computing (Small, 2023).
