PDE-Based Optimization Method
- PDE-based optimization methods are techniques that solve optimization problems subject to PDE constraints, ensuring that solutions satisfy governing physical laws.
- They employ algorithmic strategies such as all-at-once, reduced, and penalty methods to balance computational efficiency against how strictly the PDE constraint is enforced.
- Adjoint-based differentiation and neural surrogate models enhance gradient computation and scalability, making these methods practical for inverse problems, control, and mesh optimization.
PDE-based optimization methods are a class of techniques in which an optimization problem is posed over functions that are subject to constraints given by partial differential equations (PDEs). These methods are fundamental in inverse problems, control, parameter estimation, and design tasks in applied mathematics, engineering, and computational sciences. The framework imposes a PDE (e.g., elliptic, parabolic, hyperbolic) as a constraint, requiring that candidate solutions satisfy the governing physics at all times and locations.
1. Mathematical Formulation and Governing Principles
A general PDE-based optimization problem takes the form
$$\min_{y,\,u} \; J(y, u) \quad \text{subject to} \quad e(y, u) = 0, \qquad b(y, u) = 0,$$
where $y$ is the state (solution to the PDE), $u$ is a control or parameter (such as coefficients, source terms, or design variables), $J$ is the objective functional encoding tracking, control, or regularization criteria, and $e$ and $b$ encode the underlying PDE and boundary/initial conditions.
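A canonical instance (a standard model problem, used as a running example below rather than drawn from any single cited work) is distributed control of the Poisson equation with an $L^2$ tracking objective:
$$\min_{y,\,u} \; \tfrac{1}{2}\|y - y_d\|_{L^2(\Omega)}^2 + \tfrac{\alpha}{2}\|u\|_{L^2(\Omega)}^2 \quad \text{subject to} \quad -\Delta y = u \ \text{in } \Omega, \qquad y = 0 \ \text{on } \partial\Omega,$$
where $y_d$ is a desired state and $\alpha > 0$ a regularization weight.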
Two critical aspects distinguish PDE-based optimization:
- PDE (state-equation) constraint: The solution must exactly (or approximately) satisfy the governing PDE.
- Infinite-/high-dimensional controls: The controls are often functions, leading to large-scale or infinite-dimensional optimization.
Theoretical guarantees, such as existence of optimal solutions and necessary optimality conditions, hinge on the regularity of the PDE operator, the control-to-state map, and the structure of the objective.
2. Core Algorithmic Strategies
Three dominant algorithmic approaches are used for solving PDE-constrained optimization problems:
a) All-at-once Approach
This approach treats state variables, control/parameter variables, and Lagrange multipliers (dual variables) as unknowns in a coupled KKT (Karush-Kuhn-Tucker) system. One seeks points where the Lagrangian $\mathcal{L}(y, u, \lambda) = J(y, u) + \langle \lambda, e(y, u) \rangle$ is stationary with respect to all variables. The resulting large nonlinear system is
$$\nabla_y \mathcal{L} = 0, \qquad \nabla_u \mathcal{L} = 0, \qquad \nabla_\lambda \mathcal{L} = e(y, u) = 0.$$
This approach enforces the PDE and optimization constraints tightly but is often impractical for large-scale problems due to storage and computational requirements; it is mainly used where the problem size is moderate or where high-accuracy solutions are needed for all variables simultaneously (Leeuwen et al., 2015).
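To make this concrete, the following minimal sketch assembles and solves the coupled KKT system for a discretized version of the Poisson model problem above (parameter values, the desired state, and the dense direct solve are all illustrative choices for exposition, not a scalable implementation):

```python
import numpy as np

# 1D Poisson model problem: -y'' = u on (0,1), y(0) = y(1) = 0,
# discretized with n interior points and mesh width h = 1/(n+1).
n, alpha = 50, 1e-4
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2          # discrete Laplacian
xs = np.linspace(h, 1 - h, n)
yd = np.sin(np.pi * xs)                              # desired state

# Stationarity of L(y,u,lam) = 0.5||y-yd||^2 + 0.5*alpha*||u||^2 + lam^T(A y - u)
# with respect to (y, u, lam) yields a symmetric indefinite KKT system.
I, Z = np.eye(n), np.zeros((n, n))
KKT = np.block([[I, Z,         A.T],
                [Z, alpha * I, -I ],
                [A, -I,        Z  ]])
rhs = np.concatenate([yd, np.zeros(n), np.zeros(n)])

y, u, lam = np.split(np.linalg.solve(KKT, rhs), 3)
print("PDE residual ||Ay - u|| =", np.linalg.norm(A @ y - u))  # ~ 0 up to roundoff
```

All three variable blocks are solved for simultaneously, which is exactly why memory and factorization cost become the bottleneck at scale.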
b) Reduced Approach
The state variable is eliminated by using a parameter-to-state map: for each control $u$, the state $y(u)$ solves $e(y(u), u) = 0$. The resulting optimization is over $u$ only:
$$\min_{u} \; \tilde{J}(u) := J(y(u), u).$$
The reduced gradient is computed via the adjoint method, and the key computational step is solving the PDE for each candidate $u$. This approach is widely used due to its favorable memory profile but can introduce significant nonlinearity into the reduced objective, especially when the parameter-to-state map is strongly nonlinear or ill-conditioned.
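A matching sketch of the reduced approach on the same discretized model problem (plain gradient descent with an illustrative step size stands in for the more sophisticated methods used in practice): each iteration costs one forward and one adjoint solve, and only the control is stored.

```python
import numpy as np

n, alpha = 50, 1e-4
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
xs = np.linspace(h, 1 - h, n)
yd = np.sin(np.pi * xs)

def reduced_gradient(u):
    """Gradient of J~(u) = 0.5||y(u)-yd||^2 + 0.5*alpha*||u||^2 via the adjoint."""
    y = np.linalg.solve(A, u)             # forward solve:  A y = u
    lam = np.linalg.solve(A.T, yd - y)    # adjoint solve:  A^T lam = -(y - yd)
    return alpha * u - lam, y             # grad = alpha*u + A^{-T}(y - yd)

u = np.zeros(n)
for _ in range(500):                      # plain gradient descent (step is illustrative)
    g, y = reduced_gradient(u)
    u -= 100.0 * g
print("tracking misfit ||y - yd|| =", np.linalg.norm(y - yd))
```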
c) Penalty and Augmented Lagrangian Methods
Penalty methods incorporate the PDE constraint into the objective via a quadratic penalty term:
$$\min_{y,\,u} \; J(y, u) + \frac{\rho}{2} \| e(y, u) \|^2 .$$
The constraint is enforced only approximately; as $\rho \to \infty$, feasible solutions are recovered (Leeuwen et al., 2015). Balancing constraint enforcement against a smoother optimization landscape can alleviate difficulties such as strong nonlinearity and sensitivity to the initial guess.
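On the same discretized model problem, a minimal quadratic-penalty sketch (illustrative only; because the penalized problem is quadratic in $(y, u)$ here, each penalty subproblem reduces to a single linear solve, and the constraint residual shrinks as $\rho$ grows):

```python
import numpy as np

n, alpha = 50, 1e-4
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
xs = np.linspace(h, 1 - h, n)
yd = np.sin(np.pi * xs)
I = np.eye(n)

# Jointly minimize 0.5||y-yd||^2 + 0.5*alpha*||u||^2 + 0.5*rho*||A y - u||^2;
# the PDE A y = u is only satisfied in the limit rho -> infinity.
for rho in [1e-2, 1e0, 1e2, 1e4]:
    H = np.block([[I + rho * (A.T @ A), -rho * A.T        ],
                  [-rho * A,            (alpha + rho) * I ]])
    y, u = np.split(np.linalg.solve(H, np.concatenate([yd, np.zeros(n)])), 2)
    print(f"rho = {rho:.0e}   constraint residual ||Ay - u|| = {np.linalg.norm(A @ y - u):.3e}")
```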
3. Adjoint-Based Differentiation and Sensitivity Analysis
Adjoint methods are instrumental for efficient gradients and Hessians in PDE-based optimization:
- For the reduced approach: The adjoint PDE is solved backward (with an operator-adjoint right-hand side) to compute the gradient efficiently, regardless of the dimensionality of the control $u$.
- For the all-at-once approach: Automatic or analytic differentiation of the coupled system provides derivative information.
A typical first-order optimality condition (for the reduced problem) takes the variational form
$$\tilde{J}'(u) = J_u(y(u), u) + e_u(y(u), u)^{*}\lambda = 0, \qquad \text{with } \lambda \text{ solving } e_y(y(u), u)^{*}\lambda = -J_y(y(u), u),$$
where the derivative passes through the PDE's solution operator $u \mapsto y(u)$ and is computed with adjoint solves.
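A standard consistency check for such adjoint gradients, sketched on the model problem above: the adjoint-based directional derivative should agree with a central finite difference of the reduced objective (the perturbation size and parameters are illustrative; for this quadratic objective the central difference is exact up to roundoff).

```python
import numpy as np

n, alpha = 50, 1e-4
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
xs = np.linspace(h, 1 - h, n)
yd = np.sin(np.pi * xs)

def J(u):                                  # reduced objective J~(u)
    y = np.linalg.solve(A, u)
    return 0.5 * np.sum((y - yd)**2) + 0.5 * alpha * np.sum(u**2)

def grad_J(u):                             # adjoint-based gradient
    y = np.linalg.solve(A, u)
    lam = np.linalg.solve(A.T, yd - y)
    return alpha * u - lam

rng = np.random.default_rng(0)
u, du, eps = rng.standard_normal(n), rng.standard_normal(n), 1e-6
fd = (J(u + eps * du) - J(u - eps * du)) / (2 * eps)   # central difference
print("adjoint:", grad_J(u) @ du, "  finite difference:", fd)
```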
The adjoint approach is also essential in complex optimization flows, such as online-adjoint algorithms (Sirignano et al., 2021), neural-operator-based PDE surrogates (Cheng et al., 16 Jun 2025), and bi-level frameworks (Hao et al., 2022).
4. Algorithms for Nonsmooth and Large-Scale Problems
For PDE-constrained problems involving nonsmooth regularization (e.g., $L^1$ or total variation (TV) terms), control/state constraints, or sparsity, more advanced algorithms are required. Notable approaches include:
- Interior Point Methods (IPMs): Nonsmoothness is regularized or smoothed (e.g., via control variable splitting and log barriers), leading to large indefinite saddle-point systems. These are handled with robust preconditioners exploiting PDE structure and parameter scalability (Pearson et al., 2018).
- First-Order Primal-Dual Splitting: Efficient for nonsmooth and large-scale problems, particularly when combined with an interwoven PDE constraint solver in which each iteration updates the control and only partially solves the PDE/adjoint with inexpensive iterative methods (e.g., Jacobi, Gauss-Seidel); linear convergence has been proven under a second-order growth condition (Jensen et al., 2022). A minimal sketch of the nonsmooth first-order template follows this list.
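As an illustration of that template, here is a minimal proximal-gradient (ISTA) sketch for an $\ell^1$-regularized version of the reduced model problem above. It is a generic stand-in, not the specific primal-dual scheme of Jensen et al.; `beta` and `tau` are illustrative choices.

```python
import numpy as np

n, alpha, beta = 50, 1e-4, 1e-3   # beta weights the sparsity-promoting l1 term
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
xs = np.linspace(h, 1 - h, n)
yd = np.sin(np.pi * xs)

def smooth_gradient(u):            # adjoint gradient of the smooth part of the objective
    y = np.linalg.solve(A, u)
    lam = np.linalg.solve(A.T, yd - y)
    return alpha * u - lam

def shrink(v, t):                  # prox of t*||.||_1: soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

u, tau = np.zeros(n), 50.0         # tau must stay below 1/L for the smooth part
for _ in range(1000):              # ISTA: gradient step on the smooth part,
    u = shrink(u - tau * smooth_gradient(u), tau * beta)   # then a prox step on the l1 term
print("nonzero entries in u:", np.count_nonzero(u), "of", n)
```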
Scalability and parallelism have been advanced by exploiting Krylov, multigrid, and block-preconditioning strategies, and by partitioning time for parabolic PDEs via PFASST (Götschel et al., 2019).
5. Specialized Methods and Physical Interpretations
Some frameworks are closely tied to the physical/geometric nature of the PDE:
- Motion PDEs for Sparsity: For inverse problems such as sparse deconvolution, nonlinear evolution PDEs (e.g., continuity equations reminiscent of optimal transport or fluid flow) operate as plug-in steps in iterative solvers. These PDEs rearrange spike locations to better satisfy data constraints while preserving sparsity ($\ell_0$-norm) and accelerating convergence by lowering the data misfit (Mao et al., 2011).
- Shape and Mesh Optimization: Methods leveraging deformation diffeomorphisms or target-matrix optimization paradigms reformulate the optimization over meshes or domains. High-order moving mesh methods are designed to be compatible with isoparametric finite elements, supporting arbitrary smoothness and order (Paganini et al., 2017, Kolev et al., 2 Jul 2025).
- Stochastic and Distributional Optimization: When parameters are inherently random, the PDE solver is viewed as a push-forward on probability measures. Optimization is carried out over distributions (e.g., via Wasserstein gradient flows), extending deterministic inverse problem techniques to quantify uncertainty (Li et al., 2023).
6. Neural and Operator Learning Enhancements
Machine learning surrogates, especially neural operators, are increasingly integrated into PDE-based optimization to alleviate the computational demands of repeated PDE solves; a toy sketch of the shared surrogate-in-the-loop pattern follows the list below:
- Physics-Informed DeepONets: Surrogate operator models are trained in a self-supervised manner to satisfy PDE constraints, enabling rapid optimization with differentiable surrogates, even in high- or infinite-dimensional control spaces (Wang et al., 2021).
- Optimization-Oriented Neural Operator Training: Neural operators are trained to match not only the PDE solution but also its derivatives along optimization trajectories, using specialized layers (e.g., Virtual-Fourier) to improve gradient accuracy and ensure robust convergence of the outer optimization (Cheng et al., 16 Jun 2025).
- Bi-Level Learning Frameworks: PINNs are deployed in the inner loop to solve the PDE; outer-loop optimization adjusts control variables using Broyden's method and implicit differentiation to approximate hypergradients, bypassing computational bottlenecks in highly nonlinear settings (Hao et al., 2022).
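The toy sketch below illustrates the surrogate-in-the-loop pattern these approaches share: a random-feature least-squares model stands in for a neural operator, a dense direct solver plays the role of the expensive PDE solve, and all names, scales, and step sizes are illustrative assumptions.

```python
import numpy as np

n, m, alpha = 50, 400, 1e-4
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
xs = np.linspace(h, 1 - h, n)
yd = 0.1 * np.sin(np.pi * xs)
rng = np.random.default_rng(0)

# 1) Training data from the (here: cheap) reference solver.
U = rng.standard_normal((200, n))
Y = np.linalg.solve(A, U.T).T

# 2) Fit a differentiable surrogate y ~ W tanh(B u + c) by least squares.
B = rng.standard_normal((m, n)) / np.sqrt(n)
c = rng.standard_normal(m)
W = np.linalg.lstsq(np.tanh(U @ B.T + c), Y, rcond=None)[0].T

# 3) Optimize the control through the surrogate via the chain rule.
u = np.zeros(n)
for _ in range(2000):
    phi = np.tanh(B @ u + c)
    r = W @ phi - yd                                   # surrogate state misfit
    g = B.T @ ((1 - phi**2) * (W.T @ r)) + alpha * u   # d/du of 0.5||W phi - yd||^2 + 0.5*alpha*||u||^2
    u -= 0.05 * g
print("surrogate misfit:  ", np.linalg.norm(W @ np.tanh(B @ u + c) - yd))
print("true-solver misfit:", np.linalg.norm(np.linalg.solve(A, u) - yd))
```

The gap between the two printed misfits is the surrogate's modeling error, which is precisely what the derivative-matching and bi-level schemes above aim to control.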
7. Applications and Performance Benchmarks
PDE-based optimization methods are widely deployed in areas such as:
- Sparse reconstruction, compressive sensing: Acceleration and improved recovery in $\ell_1$-regularized problems via nonlinear motion PDE plug-ins (Mao et al., 2011).
- Inverse problems (e.g., tomography, parameter estimation): Penalty methods have demonstrated comparable accuracy with increased robustness to initial guesses and nonlinearity, as in seismic and resistivity inversion (Leeuwen et al., 2015).
- Control and real-time nonlinear model predictive control (NMPC): Double-layer Jacobi methods exploit intrinsic temporal and spatial sparsity, reducing computational complexity by several orders of magnitude and enabling real-time embedded deployment (Deng et al., 2020).
- Large-scale, ill-conditioned, nonsmooth optimization: Interior point and primal–dual methods with robust preconditioning and structure exploitation enable rapid solution of PDE-constrained problems with millions of unknowns (Pearson et al., 2018, Jensen et al., 2022, Hartland et al., 19 Oct 2024).
- Mesh adaptation and shape control: Mesh quality and solution error are targeted synchronously, supporting arbitrary order and dimension in mesh optimization with adjoint-based sensitivities (Kolev et al., 2 Jul 2025).
Performance metrics generally include convergence rate, number of PDE solves (or inner iterations), wallclock time, error with respect to ground-truth or reference solution, and scalability (iteration counts/solver time versus problem size or mesh refinement). Studies consistently demonstrate that exploiting PDE structure, physical/geometric interpretation (such as mass preservation or Wasserstein flows), and advanced algorithmic techniques leads to significant resource savings and improved solution quality.
This synthesis captures the mathematical structure, algorithmic pillars, advanced differentiability techniques, major classes of algorithms, current machine learning augmentations, and key applications of PDE-based optimization methods, elucidating their central role and ongoing evolution in computational mathematics and engineering (Mao et al., 2011, Leeuwen et al., 2015, Paganini et al., 2017, Pearson et al., 2018, Götschel et al., 2019, Deng et al., 2020, Sirignano et al., 2021, Jensen et al., 2022, Liang et al., 2023, Li et al., 2023, Cheng et al., 16 Jun 2025, Kolev et al., 2 Jul 2025).