- The paper presents a neural network surrogate model that approximates complex mappings in parametric PDEs, effectively mitigating the curse of dimensionality.
- It details a CNN-based architecture with sum-pooling that exploits low-dimensional structure in high-dimensional coefficient fields, enabling efficient surrogate evaluation of PDE-derived quantities of interest.
- Numerical experiments on elliptic and nonlinear Schrödinger problems demonstrate high precision, achieving relative test errors as low as $1.5 \times 10^{-4}$.
Artificial Neural Networks (ANNs) offer a data-driven approach to mitigate the curse of dimensionality often encountered when solving Partial Differential Equations (PDEs) with uncertain or spatially varying parameters. The core idea, elaborated in "Solving parametric PDE problems with artificial neural networks" (1707.03351), is to train an ANN as a surrogate model that directly maps the high-dimensional parametric input field to the low-dimensional physical quantity of interest derived from the PDE solution. This avoids the need for computationally expensive PDE solves for each new parameter instance during inference or analysis.
Neural Network Surrogate Modeling Approach
The fundamental challenge addressed is the efficient computation of a scalar quantity $f(a)$, such as an effective conductance or a ground state energy, which depends on a coefficient field $a(x)$ entering a PDE. Discretizing $a(x)$ on a grid yields a high-dimensional parameter vector $a \in \mathbb{R}^N$, where $N$ can be large (e.g., $N = n^d$ for a $d$-dimensional grid with $n$ points per dimension). Traditional methods such as Monte Carlo sampling or stochastic Galerkin expansions often struggle in this high-dimensional regime.
The proposed ANN-based method leverages the observation that $f(a)$ often has a lower-dimensional structure, meaning it primarily depends on a few key features of the high-dimensional input $a$. The ANN, denoted $h_\theta(a)$ with parameters $\theta$, is trained to approximate this complex, potentially nonlinear mapping, $f(a) \approx h_\theta(a)$. The process involves the following steps (a minimal code sketch follows the list):
- Data Generation: A dataset of input-output pairs $\{(a_k, f(a_k))\}_{k=1}^{K}$ is generated. This requires sampling $K$ instances of the parameter field $a_k$ from its underlying distribution (e.g., a random field model). For each sample $a_k$, the corresponding deterministic PDE is solved numerically to compute the target quantity $f(a_k)$. This step can be computationally intensive, as it involves repeated PDE solves.
- Network Training: The ANN $h_\theta(a)$ is trained using the generated dataset. The parameters $\theta$ (weights and biases) are optimized by minimizing a suitable loss function, typically the mean squared error (MSE) between the network predictions and the true values:
$$L(\theta) = \frac{1}{K} \sum_{k=1}^{K} \bigl\lVert h_\theta(a_k) - f(a_k) \bigr\rVert^2 .$$
Standard optimization algorithms like stochastic gradient descent (SGD) or its variants (e.g., Adam) are employed.
- Validation and Application: The trained network $h_\theta(a)$ is validated on a separate test dataset. Once validated, it serves as a fast surrogate model: evaluating $h_\theta(a)$ for a new input $a$ is significantly cheaper than solving the original PDE. This enables rapid statistical analysis (e.g., computing moments of $f(a)$), uncertainty quantification, and optimization tasks involving $f(a)$, where approximate gradients $\nabla_a f(a) \approx \nabla_a h_\theta(a)$ can be computed efficiently via backpropagation through the network.
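A minimal end-to-end sketch of these steps, assuming PyTorch and using the paper's 1D effective-conductance example (where $f(a)$ reduces to the harmonic mean of $a$) as a stand-in for the numerical PDE solve; the network size, optimizer settings, and sample counts below are illustrative choices, not the paper's:

```python
import torch
import torch.nn as nn

# --- Step 1: data generation -------------------------------------------------
# Illustrative stand-in: the 1D effective-conductance example, where f(a) has a
# closed form (the harmonic mean of a).  For a general problem this label would
# come from a numerical PDE solve per sample.
n, K = 64, 10_000
a = 0.3 + 2.7 * torch.rand(K, n)                 # a_i ~ U[0.3, 3], one row per sample
f = 1.0 / (1.0 / a).mean(dim=1, keepdim=True)    # f(a_k) = harmonic mean of a_k

# --- Step 2: surrogate network h_theta and MSE training ----------------------
# A small fully connected network suffices for this 1D sketch; the paper uses
# CNNs for 2D coefficient fields.
h = nn.Sequential(nn.Linear(n, 128), nn.ReLU(),
                  nn.Linear(128, 128), nn.ReLU(),
                  nn.Linear(128, 1))
opt = torch.optim.Adam(h.parameters(), lr=1e-3)
mse = nn.MSELoss()

for epoch in range(50):
    for idx in torch.split(torch.randperm(K), 256):   # mini-batch optimization
        opt.zero_grad()
        loss = mse(h(a[idx]), f[idx])
        loss.backward()
        opt.step()

# --- Step 3: use as a fast surrogate -----------------------------------------
a_new = (0.3 + 2.7 * torch.rand(1, n)).requires_grad_()
f_pred = h(a_new)                 # cheap evaluation instead of a PDE solve
f_pred.sum().backward()           # approximate grad_a f(a) lands in a_new.grad
```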
This approach effectively circumvents the curse of dimensionality by letting the network learn the low-dimensional features of $a$ on which the quantity of interest actually depends, rather than relying on predefined basis expansions in the high-dimensional parameter space.
Network Architecture and Theoretical Underpinnings
The choice of network architecture is crucial. The paper (1707.03351) provides both theoretical motivation and practical implementations, particularly favoring Convolutional Neural Networks (CNNs) due to the spatial nature of PDE coefficients.
Theoretical Justification: A key insight connects the iterative solution process of certain PDEs to the structure of deep neural networks. Consider the gradient descent method for solving the variational problem associated with an elliptic PDE (Eq. 18 in the paper):
$$u^{(m+1)} = u^{(m)} - \Delta t \,\nabla_u E\bigl(u^{(m)}; a\bigr).$$
Each iteration $m$ can be viewed as a layer in a deep network. The update involves operations combining the current solution estimate $u^{(m)}$ and the coefficient field $a$. Specifically, terms like $L_a u^{(m)}$ (where $L_a$ is the PDE operator involving $a$) represent local interactions, analogous to convolutions. The dependence on $a$ throughout the iterative process is mirrored by incorporating $a$ into multiple layers, resembling residual connections (ResNets).
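To make the analogy concrete, consider a generic discretized quadratic energy $E(u; a) = \tfrac{1}{2}\, u^{\top} L_a\, u - b^{\top} u$ (a simplification for illustration, not the paper's exact Eq. 18). One gradient step then reads
$$u^{(m+1)} = u^{(m)} - \Delta t\,\bigl(L_a u^{(m)} - b\bigr) = \bigl(I - \Delta t\, L_a\bigr)\, u^{(m)} + \Delta t\, b ,$$
i.e., an affine layer whose sparse, convolution-like weights $I - \Delta t\, L_a$ are determined by $a$; unrolling $M$ iterations therefore yields an $M$-layer residual-style network with $a$ injected into every layer.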
Theorem 1 formally establishes the representability of the effective conductance $A_{\text{eff}}(a)$ by a neural network. It states that, for the elliptic problem, there exists a network with depth polynomial in $\log(n)$ and $\log(1/\epsilon)$ (where $n$ is the grid size and $\epsilon$ is the approximation error) and width polynomial in $n^d$ (nodes per layer) that approximates $A_{\text{eff}}(a)$ up to error $\epsilon$. Importantly, this demonstrates that the network complexity does not necessarily scale exponentially with the input dimension $N = n^d$, unlike shallow networks or traditional basis expansions in high dimensions.
Practical CNN Architecture: For implementation, a relatively simple CNN architecture was employed (Fig. 2 in the paper); a minimal code sketch follows the list below:
- Input: The discretized coefficient field a (e.g., an n×n matrix for 2D).
- Convolutional Layers: Multiple layers with small filters (e.g., 3×3) apply convolutions to extract spatial features from $a$. Periodic padding is used to handle the periodic boundary conditions assumed in the examples. ReLU activation introduces nonlinearity.
- Sum-Pooling: After the convolutional layers, a sum-pooling operation is applied across all spatial dimensions of the resulting feature maps. This step is motivated by the desire for translation invariance: if the input field $a(x)$ is shifted, the physical quantity $f(a)$ (such as the effective conductance) should remain unchanged for periodic problems. Sum-pooling followed by a linear layer enforces this property (Eq. 15).
- Final Linear Layer: A fully connected layer maps the pooled features to the final scalar output f(a).
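A minimal PyTorch sketch of an architecture in this spirit, assuming a 2D field on an $n \times n$ periodic grid; the channel counts and depth are illustrative rather than the paper's exact hyperparameters:

```python
import torch
import torch.nn as nn

class PeriodicCNNSurrogate(nn.Module):
    """Maps an n-by-n coefficient field a to a scalar prediction of f(a)."""
    def __init__(self, channels: int = 16, depth: int = 3):
        super().__init__()
        layers, in_ch = [], 1
        for _ in range(depth):
            # padding_mode="circular" implements the periodic padding used to
            # respect the periodic boundary conditions of the examples.
            layers += [nn.Conv2d(in_ch, channels, kernel_size=3,
                                 padding=1, padding_mode="circular"),
                       nn.ReLU()]
            in_ch = channels
        self.features = nn.Sequential(*layers)
        self.head = nn.Linear(channels, 1)   # final linear layer on pooled features

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        # a: (batch, 1, n, n)
        z = self.features(a)
        # Sum-pooling over both spatial dimensions: shifting the input only
        # shifts z (circular convolutions), so the spatial sum is unchanged.
        z = z.sum(dim=(-2, -1))              # (batch, channels)
        return self.head(z)                  # (batch, 1)

model = PeriodicCNNSurrogate()
a_batch = 0.3 + 2.7 * torch.rand(4, 1, 16, 16)   # e.g. n = 16, a_ij ~ U[0.3, 3]
print(model(a_batch).shape)                      # torch.Size([4, 1])
```

Because circular convolutions commute with periodic shifts of the input and the spatial sum discards position, the scalar output is invariant under translations of $a$, as the physics of the periodic problem requires.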
For a 1D effective conductance problem where an analytical solution (harmonic mean) exists, a specialized deep network architecture mirroring this analytical structure was also designed (Fig. 3), achieving higher accuracy than the general CNN.
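Concretely, for a 1D field discretized into values $a_1, \dots, a_n$ on a uniform grid over a unit cell, the effective conductance is the harmonic mean
$$A_{\text{eff}}(a) = \Bigl( \frac{1}{n} \sum_{i=1}^{n} \frac{1}{a_i} \Bigr)^{-1},$$
which is the analytical structure the specialized network of Fig. 3 mirrors.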
Implementation Examples and Results
The effectiveness of the ANN surrogate modeling approach was demonstrated on two canonical parametric PDE problems:
1. Effective Conductance (Elliptic PDE):
- Problem: Solving $-\nabla \cdot \bigl(a(x)\,(\nabla u(x) + \xi)\bigr) = 0$ with periodic boundary conditions, where $a(x)$ is the spatially varying conductivity field. The goal is to compute the effective conductance $A_{\text{eff}}(a)$, defined via a variational principle (Eq. 9; its standard form is sketched after this list).
- Input: $a$ is discretized on an $n \times n$ grid, with values $a_{ij}$ sampled i.i.d. from $\mathcal{U}[0.3, 3]$.
- Data Generation: For each sampled field $a_k$, the corresponding linear system (discretized PDE, Eq. 8) is solved to find $u_k$, from which $A_{\text{eff}}(a_k)$ is computed.
- Results: Using the described CNN architecture, relative test errors of approximately $2$–$3 \times 10^{-3}$ were achieved on 2D grids ($n = 8, 16$) (Table 1). The specialized 1D network achieved a smaller relative error of roughly $5 \times 10^{-4}$.
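For reference, the standard variational (cell-problem) characterization of the effective conductance in a unit direction $\xi$ takes the form
$$A_{\text{eff}}(a) = \min_{u \ \text{periodic}} \int_{[0,1]^d} a(x)\,\lvert \nabla u(x) + \xi \rvert^2 \, dx ,$$
whose Euler–Lagrange equation is exactly the elliptic PDE above; this is a sketch of the standard form, and Eq. 9 in the paper may differ in normalization.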
2. Ground State Energy (Nonlinear Schrödinger Equation - NLSE):
- Problem: Finding the ground state energy $E_0(a)$ of the NLSE $-\Delta u(x) + a(x)\,u(x) + \sigma\, u(x)^3 = E_0\, u(x)$, subject to the normalization $\int |u|^2 = 1$ and periodic boundary conditions. Here, $a(x)$ is an inhomogeneous potential field (a consistency identity for $E_0$ is noted after this list).
- Input: $a$ is discretized on an $n \times n$ grid, with values $a_{ij}$ sampled i.i.d. from $\mathcal{U}[1, 16]$.
- Data Generation: Solving this nonlinear eigenvalue problem (Eq. 12) is more demanding. A homotopy continuation method combined with Newton iterations was used to find $(u_k, E_0(a_k))$ for each sample $a_k$.
- Results: The same CNN architecture achieved high accuracy, with relative test errors around $1.5$–$5 \times 10^{-4}$ on 2D grids ($n = 8, 16$) (Table 2). This demonstrates the method's applicability even when the underlying PDE solver is complex and computationally intensive.
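A simple consistency check on the generated data follows from the eigenvalue equation itself: multiplying by $u$, integrating over the periodic domain (boundary terms vanish), and using $\int |u|^2 = 1$ gives
$$E_0(a) = \int \bigl( \lvert \nabla u(x) \rvert^2 + a(x)\, u(x)^2 + \sigma\, u(x)^4 \bigr)\, dx ,$$
so each computed pair $(u_k, E_0(a_k))$ can be verified independently of the homotopy/Newton iteration.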
These examples illustrate that ANNs, particularly CNNs informed by the structure of the PDE problem, can learn accurate surrogate models for quantities of interest derived from parametric PDEs. The computational gain arises from replacing numerous expensive PDE solves with rapid evaluations of the trained network.
Conclusion
In summary, the use of artificial neural networks provides a potent framework for addressing parametric PDE problems plagued by the curse of dimensionality. By training ANNs on data generated from numerical PDE solutions, highly accurate surrogate models can be constructed that map high-dimensional parameter fields to low-dimensional physical outputs. Theoretical arguments link network architectures to iterative PDE solvers, while numerical experiments on elliptic and nonlinear Schrödinger equations confirm the practical viability and accuracy of the approach, offering a computationally efficient alternative for uncertainty quantification and analysis in complex physical systems.