
Solving parametric PDE problems with artificial neural networks (1707.03351v3)

Published 11 Jul 2017 in math.NA

Abstract: The curse of dimensionality is commonly encountered in numerical partial differential equations (PDE), especially when uncertainties have to be modeled into the equations as random coefficients. However, very often the variability of physical quantities derived from a PDE can be captured by a few features on the space of the coefficient fields. Based on such an observation, we propose using a neural-network (NN) based method to parameterize the physical quantity of interest as a function of input coefficients. The representability of such quantity using a neural-network can be justified by viewing the neural-network as performing time evolution to find the solutions to the PDE. We further demonstrate the simplicity and accuracy of the approach through notable examples of PDEs in engineering and physics.

Citations (341)

Summary

  • The paper presents a neural network surrogate model that approximates complex mappings in parametric PDEs, effectively mitigating the curse of dimensionality.
  • It details a CNN-based architecture with sum-pooling that exploits low-dimensional structures in high-dimensional coefficient fields for efficient PDE solutions.
  • Numerical experiments on elliptic and nonlinear Schrödinger problems demonstrate high precision, achieving relative test errors as low as 1.5e-4.

Artificial Neural Networks (ANNs) offer a data-driven approach to mitigate the curse of dimensionality often encountered when solving Partial Differential Equations (PDEs) with uncertain or spatially varying parameters. The core idea, elaborated in "Solving parametric PDE problems with artificial neural networks" (1707.03351), is to train an ANN as a surrogate model that directly maps the high-dimensional parametric input field to the low-dimensional physical quantity of interest derived from the PDE solution. This avoids the need for computationally expensive PDE solves for each new parameter instance during inference or analysis.

Neural Network Surrogate Modeling Approach

The fundamental challenge addressed is the efficient computation of a scalar quantity $f(a)$, such as effective conductivity or ground state energy, which depends on a coefficient field $a(x)$ within a PDE. Discretizing $a(x)$ on a grid results in a high-dimensional parameter vector $a \in \mathbb{R}^{N}$, where $N$ can be large (e.g., $n^d$ for a $d$-dimensional grid of size $n$). Traditional methods like Monte Carlo or stochastic Galerkin methods often struggle with this high dimensionality.

The proposed ANN-based method leverages the observation that $f(a)$ often exhibits a lower-dimensional structure, meaning it primarily depends on a few key features of the high-dimensional input $a$. The ANN, denoted $h_\theta(a)$ with parameters $\theta$, is trained to approximate this complex, potentially nonlinear mapping, $f(a) \approx h_\theta(a)$. The process involves:

  1. Data Generation: A dataset of input-output pairs $\{(a^k, f(a^k))\}_{k=1}^K$ is generated. This requires sampling $K$ instances of the parameter field $a^k$ from its underlying distribution (e.g., a random field model). For each sample $a^k$, the corresponding deterministic PDE is solved numerically to compute the target quantity $f(a^k)$. This step can be computationally intensive, as it involves repeated PDE solves.
  2. Network Training: The ANN $h_\theta(a)$ is trained on the generated dataset. The parameters $\theta$ (weights and biases) are optimized by minimizing a suitable loss function, typically the mean squared error (MSE) between the network predictions and the true values, $L(\theta) = \frac{1}{K} \sum_{k=1}^K \| h_\theta(a^k) - f(a^k) \|^2$. Standard optimization algorithms such as stochastic gradient descent (SGD) or its variants (e.g., Adam) are employed (a minimal sketch of the full pipeline follows this list).
  3. Validation and Application: The trained network $h_\theta(a)$ is validated on a separate test dataset. Once validated, it serves as a fast surrogate model. Evaluating $h_\theta(a)$ for a new input $a$ is significantly cheaper than solving the original PDE. This enables rapid statistical analysis (e.g., computing moments of $f(a)$), uncertainty quantification, and optimization tasks involving $f(a)$, where gradients $\nabla_a f(a)$ can be computed efficiently via backpropagation through the network.
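
As a concrete illustration of steps 1 to 3, the following is a minimal PyTorch sketch. It uses a 1D toy problem in which the quantity of interest is the effective conductance of a 1D periodic medium, which (as noted later in this summary) equals the harmonic mean of the coefficients, so labels can be computed in closed form instead of via a PDE solve. The field size, network width, and training hyperparameters are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# --- Step 1: data generation (1D toy: f(a) = harmonic mean = exact 1D effective conductance) ---
n, K = 32, 10000                                   # grid size and number of samples (assumed)
a = torch.empty(K, n).uniform_(0.3, 3.0)           # coefficient fields a^k, i.i.d. U[0.3, 3]
f = n / (1.0 / a).sum(dim=1, keepdim=True)         # labels f(a^k): harmonic mean of a^k

a_train, f_train = a[:8000], f[:8000]              # train/test split
a_test,  f_test  = a[8000:], f[8000:]

# --- Step 2: network training with an MSE loss ---
h = nn.Sequential(                                 # small MLP surrogate h_theta (assumed architecture)
    nn.Linear(n, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(h.parameters(), lr=1e-3)

for epoch in range(200):
    perm = torch.randperm(len(a_train))
    for idx in perm.split(128):                    # mini-batch optimization
        loss = ((h(a_train[idx]) - f_train[idx]) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

# --- Step 3: validation on held-out samples ---
with torch.no_grad():
    rel_err = ((h(a_test) - f_test).abs() / f_test.abs()).mean()
print(f"mean relative test error: {rel_err.item():.2e}")
```

For the paper's 2D problems, the labels would instead come from numerical PDE solves, and the MLP would be replaced by the CNN architecture described below.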

This approach effectively circumvents the curse of dimensionality by letting the network learn the relevant low-dimensional features of the coefficient field on which the quantity of interest depends, rather than relying on predefined basis expansions in the high-dimensional parameter space.

Network Architecture and Theoretical Underpinnings

The choice of network architecture is crucial. The paper (1707.03351) provides both theoretical motivation and practical implementations, particularly favoring Convolutional Neural Networks (CNNs) due to the spatial nature of PDE coefficients.

Theoretical Justification: A key insight connects the iterative solution process of certain PDEs to the structure of deep neural networks. Consider the gradient descent method for solving the variational problem associated with an elliptic PDE (Eq. 18 in the paper):

$u^{(m+1)} = u^{(m)} - \Delta t \, \nabla_u E(u^{(m)}; a)$

Each iteration $m$ can be viewed as a layer in a deep network. The update combines the current solution estimate $u^{(m)}$ and the coefficient field $a$. Specifically, terms like $L_a u^{(m)}$ (where $L_a$ is the PDE operator involving $a$) represent local interactions, analogous to convolutions. The dependence on $a$ throughout the iterative process is mirrored by feeding $a$ into multiple layers, resembling residual connections (ResNets).
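
The small numpy sketch below makes this analogy concrete for a 1D periodic elliptic energy $E(u; a) = \tfrac{1}{2}\sum_j a_j (u_{j+1} - u_j + \xi)^2$ (a simplified stand-in, not the paper's Eq. 18): each gradient-descent step updates $u$ through a local three-point stencil mixing $u^{(m)}$ and $a$, exactly the kind of coefficient-dependent local operation a convolutional layer with a skip connection can represent. The grid size, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, xi, dt = 16, 1.0, 0.1                 # grid size, applied field, step size (assumed)
a = rng.uniform(0.3, 3.0, size=n)        # 1D periodic coefficient field
u = np.zeros(n)

def grad_E(u, a):
    # Gradient of E(u; a) = 0.5 * sum_j a_j (u_{j+1} - u_j + xi)^2 with periodic indices.
    # Each entry depends only on (u, a) in a 3-point neighborhood: a convolution-like,
    # coefficient-dependent local operation.
    d = np.roll(u, -1) - u + xi          # forward differences plus applied field
    return np.roll(a * d, 1) - a * d

for m in range(2000):                    # each iteration plays the role of one network "layer"
    u = u - dt * grad_E(u, a)

# Sanity check: in 1D, the minimal energy recovers the effective conductance,
# which is the harmonic mean of a (the analytical solution referenced later).
E = 0.5 * np.sum(a * (np.roll(u, -1) - u + xi) ** 2)
print(2 * E / (n * xi**2), n / np.sum(1.0 / a))
```

Unrolling a fixed number of such updates, with the local stencils replaced by learnable convolutions that take $a$ as input at every layer, gives exactly the kind of deep, coefficient-dependent architecture the representability argument relies on.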

Theorem 1 formally establishes the representability of the effective conductance $A_{eff}(a)$ by an NN. It states that for the elliptic problem, there exists an NN with depth polynomial in $\log(n)$ and $\log(1/\epsilon)$ (where $n$ is the grid size and $\epsilon$ is the approximation error) and width polynomial in $n^d$ (nodes per layer) that approximates $A_{eff}(a)$ up to error $\epsilon$. Importantly, this demonstrates that the network complexity does not necessarily scale exponentially with the input dimension $N = n^d$, unlike shallow networks or traditional basis expansions in high dimensions.

Practical CNN Architecture: For implementation, a relatively simple CNN architecture was employed (Fig. 2 in the paper):

  • Input: The discretized coefficient field $a$ (e.g., an $n \times n$ matrix in 2D).
  • Convolutional Layers: Multiple layers with small filters (e.g., $3 \times 3$) apply convolutions to extract spatial features from $a$. Periodic padding is used to handle the periodic boundary conditions assumed in the examples, and ReLU activations introduce nonlinearity.
  • Sum-Pooling: After the convolutional layers, a sum-pooling operation is applied across all spatial dimensions of the resulting feature maps. This step is motivated by the desire for translation invariance: if the input field $a(x)$ is shifted, the physical quantity $f(a)$ (like effective conductivity) should remain unchanged for periodic problems. Sum-pooling followed by a linear layer enforces this property (Eq. 15).
  • Final Linear Layer: A fully connected layer maps the pooled features to the final scalar output $f(a)$. A minimal sketch of this architecture is given after the list.
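
The sketch below expresses this architecture in PyTorch. The number of layers, channel counts, and filter size are illustrative assumptions; only the overall structure (circular/periodic padding, ReLU, sum-pooling over space, final linear map) follows the description above.

```python
import torch
import torch.nn as nn

class PeriodicCNNSurrogate(nn.Module):
    """CNN surrogate h_theta(a): n x n coefficient field -> scalar f(a).

    Structure follows the description above; depth and width are assumptions.
    """
    def __init__(self, channels: int = 16, depth: int = 4):
        super().__init__()
        layers = []
        in_ch = 1
        for _ in range(depth):
            # 'circular' padding implements the periodic boundary conditions
            layers += [nn.Conv2d(in_ch, channels, kernel_size=3,
                                 padding=1, padding_mode='circular'),
                       nn.ReLU()]
            in_ch = channels
        self.conv = nn.Sequential(*layers)
        self.out = nn.Linear(channels, 1)   # final linear map to the scalar output

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        # a: (batch, n, n) discretized coefficient field
        x = self.conv(a.unsqueeze(1))       # (batch, channels, n, n) feature maps
        x = x.sum(dim=(2, 3))               # sum-pooling over space
        return self.out(x)                  # (batch, 1) prediction of f(a)

# Example usage on a batch of random 16 x 16 coefficient fields
model = PeriodicCNNSurrogate()
a = torch.empty(8, 16, 16).uniform_(0.3, 3.0)
print(model(a).shape)                       # torch.Size([8, 1])
```

With circular padding, a cyclic shift of the input field shifts every feature map by the same amount, so the spatial sum, and hence the prediction, is unchanged.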

For a 1D effective conductance problem where an analytical solution (harmonic mean) exists, a specialized deep network architecture mirroring this analytical structure was also designed (Fig. 3), achieving higher accuracy than the general CNN.

Implementation Examples and Results

The effectiveness of the ANN surrogate modeling approach was demonstrated on two canonical parametric PDE problems:

1. Effective Conductance (Elliptic PDE):

  • Problem: Solving $-\nabla \cdot \big(a(x)(\nabla u(x) + \xi)\big) = 0$ with periodic boundary conditions, where $a(x)$ is the spatially varying conductivity field. The goal is to compute the effective conductance $A_{eff}(a)$, defined via a variational principle (Eq. 9).
  • Input: $a$ is discretized on an $n \times n$ grid, with values $a_{ij}$ sampled i.i.d. from $U[0.3, 3]$.
  • Data Generation: For each sampled field $a^k$, the corresponding linear system (discretized PDE, Eq. 8) is solved to find $u^k$, from which $A_{eff}(a^k)$ is computed (a simplified data-generation sketch follows this list).
  • Results: Using the described CNN architecture, relative test errors of approximately $2 \times 10^{-3}$ to $3 \times 10^{-3}$ were achieved for 2D grids ($n = 8, 16$) (Table 1). The specialized 1D network reached a lower relative error of approximately $5 \times 10^{-4}$.
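
For illustration, the sketch below generates one training pair $(a^k, A_{eff}(a^k))$ by minimizing a simple finite-difference version of the variational energy with gradient descent. It is a plausible stand-in that assumes unit grid spacing and node-valued coefficients reused on the adjacent edges; the paper's Eq. 8/9 discretization and its linear-system solve may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
n, xi, dt = 16, 1.0, 0.05                     # grid size, applied field, step size (assumed)
a = rng.uniform(0.3, 3.0, size=(n, n))        # one sampled coefficient field a^k

def energy_and_grad(u, a):
    # E(u; a) = 0.5 * sum_ij a_ij [(u_{i,j+1} - u_{ij} + xi)^2 + (u_{i+1,j} - u_{ij})^2],
    # periodic in both directions, with the unit field xi applied along x.
    dx = np.roll(u, -1, axis=1) - u + xi
    dy = np.roll(u, -1, axis=0) - u
    E = 0.5 * np.sum(a * (dx**2 + dy**2))
    g = (np.roll(a * dx, 1, axis=1) - a * dx +
         np.roll(a * dy, 1, axis=0) - a * dy)
    return E, g

u = np.zeros((n, n))
for _ in range(5000):                         # plain gradient descent to the minimizer
    _, g = energy_and_grad(u, a)
    u -= dt * g

E_min, _ = energy_and_grad(u, a)
A_eff = 2.0 * E_min / (n * n * xi**2)         # effective conductance from the minimal energy
print(A_eff)                                  # label f(a^k) for this sample
```

Repeating this over many sampled fields $a^k$ yields the training set used to fit the CNN surrogate.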

2. Ground State Energy (Nonlinear Schrödinger Equation - NLSE):

  • Problem: Finding the ground state energy $E_0(a)$ of the NLSE $-\Delta u(x) + a(x)\,u(x) + \sigma u(x)^3 = E_0\, u(x)$, subject to the normalization $\int |u|^2 = 1$ and periodic boundary conditions. Here, $a(x)$ is an inhomogeneous potential field.
  • Input: $a$ is discretized on an $n \times n$ grid, with values $a_{ij}$ sampled i.i.d. from $U[1, 16]$.
  • Data Generation: Solving this nonlinear eigenvalue problem (Eq. 12) is more demanding. A homotopy continuation method combined with Newton's iterations was used to find $(u^k, E_0(a^k))$ for each sample $a^k$ (a simplified alternative sketch follows this list).
  • Results: The same CNN architecture achieved high accuracy, with relative test errors around $1.5 \times 10^{-4}$ to $5 \times 10^{-4}$ for 2D grids ($n = 8, 16$) (Table 2). This demonstrates the method's applicability even when the underlying PDE solver is complex and computationally intensive.
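
As a rough illustration of what producing one label $(a^k, E_0(a^k))$ involves, the sketch below uses normalized (imaginary-time) gradient flow on a unit-spacing periodic lattice, a standard simple method for computing ground states. It is a stand-in for, not a reproduction of, the paper's homotopy-Newton solver, and the lattice spacing, step size, and iteration count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, dt = 16, 1.0, 0.02                 # grid size, nonlinearity, step size (assumed)
a = rng.uniform(1.0, 16.0, size=(n, n))      # one sampled potential field a^k

def lap(u):
    # 5-point periodic Laplacian on a unit-spacing lattice
    return (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
            np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)

u = np.ones((n, n))
u /= np.linalg.norm(u)                       # discrete normalization sum(u^2) = 1

for _ in range(5000):
    Hu = -lap(u) + a * u + sigma * u**3      # action of the (nonlinear) operator
    u = u - dt * Hu                          # imaginary-time / gradient step
    u /= np.linalg.norm(u)                   # re-impose the normalization constraint

E0 = np.sum(u * (-lap(u) + a * u + sigma * u**3))   # eigenvalue from the NLSE with ||u|| = 1
print(E0)                                    # label E_0(a^k) for this sample
```

The per-sample cost of this inner solve is exactly what the trained surrogate amortizes away at inference time.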

These examples illustrate that ANNs, particularly CNNs informed by the structure of the PDE problem, can learn accurate surrogate models for quantities of interest derived from parametric PDEs. The computational gain arises from replacing numerous expensive PDE solves with rapid evaluations of the trained network.

Conclusion

In summary, the use of artificial neural networks provides a potent framework for addressing parametric PDE problems plagued by the curse of dimensionality. By training ANNs on data generated from numerical PDE solutions, highly accurate surrogate models can be constructed that map high-dimensional parameter fields to low-dimensional physical outputs. Theoretical arguments link network architectures to iterative PDE solvers, while numerical experiments on elliptic and nonlinear Schrödinger equations confirm the practical viability and accuracy of the approach, offering a computationally efficient alternative for uncertainty quantification and analysis in complex physical systems.
