
PDE-Based Optimization Methods

Updated 8 December 2025
  • PDE-based optimization methods are numerical techniques that integrate PDE constraints into optimization problems, enabling efficient design, control, and parameter inference.
  • They employ methods such as adjoint-based derivatives, penalty and interior-point schemes, and natural gradient descent to tackle high-dimensional challenges.
  • Applications span inverse problems, optimal control, and mesh optimization, often enhanced by machine learning surrogates for faster convergence.

Partial Differential Equation (PDE)-based optimization methods are a class of numerical algorithms that address optimization problems in which the feasible set or objective is constrained by PDEs. Such methods are central in computational science and engineering disciplines where design, control, or parameter inference problems are governed by physics-based PDE models. The mathematical challenge lies in handling the intimate coupling between high-dimensional state variables (solutions to PDEs), control or design variables, and, potentially, additional constraints such as regularity, sparsity, or physical bounds. Modern research in PDE-based optimization spans rigorous first-order optimality approaches, scalable solvers (e.g., penalty, interior-point, domain decomposition, and primal-dual schemes), natural gradient frameworks, adjoint and projection methods, and the integration of machine learning surrogates to accelerate or regularize the optimization loop.

1. Formulation and Structural Classes

A generic PDE-constrained optimization problem seeks a state $u$, a control or parameter $m$, and potentially the domain $\Omega$ that minimize an objective $J(u, m)$ subject to PDE constraints and other conditions:

$$\min_{m,u} \; J(u, m) \quad \text{subject to} \quad A(m)u = q, \quad \text{and possibly} \quad m_\ell \le m \le m_u, \quad g(u) = 0$$

Here, $A(m)$ encodes a discretized, possibly nonlinear and parameter-dependent PDE operator; $q$ is the source or right-hand side; and $g(u) = 0$ may encode additional state constraints.

Key variants include:

  • Reduced vs all-at-once approaches: The reduced approach eliminates $u$ by solving the forward PDE at each iterate ($u = G(m)$), reducing the problem to an optimization over $m$ (Leeuwen et al., 2015). The all-at-once (simultaneous) approach solves for $(u, m)$ jointly via a (typically large) Karush-Kuhn-Tucker (KKT) system.
  • Quadratic penalty and barrier reformulations: Constraints are enforced via quadratic penalties (soft constraints) or log-barrier terms (e.g., for bound/state constraints), enabling unconstrained or interior-point optimization techniques (Leeuwen et al., 2015, Hartland et al., 19 Oct 2024, Pearson et al., 2018).
  • Adjoint-based and Lagrangian derivatives: Gradients are computed efficiently by solving adjoint PDEs (obtained from stationarity of the Lagrangian with respect to the state) and evaluating the derivative of the Lagrangian with respect to model parameters, controls, or shape (Matharu et al., 2023, Blauth, 2022), as sketched below.
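
As a concrete illustration of the reduced, adjoint-based workflow, the following Python sketch computes a reduced objective and its gradient for a toy discretized problem with $A(m) = K + \operatorname{diag}(m)$, a quadratic data misfit, and Tikhonov regularization. The parametrization, observation operator, and data are hypothetical and serve only to keep the example self-contained; this is not a reimplementation of any cited method.

```python
# Reduced-space adjoint gradient for a toy discretized problem with
# A(m) = K + diag(m) and J(m) = 1/2||P u(m) - d||^2 + alpha/2 ||m||^2.
# The parametrization, observation operator, and data are hypothetical.
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import spsolve

n = 100
h = 1.0 / (n + 1)
K = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2  # 1D Laplacian
P = identity(n, format="csr")   # observation operator (full state, for simplicity)
q = np.ones(n)                  # PDE right-hand side
d = np.zeros(n)                 # synthetic observations
alpha = 1e-3                    # Tikhonov regularization weight

def reduced_objective_and_gradient(m):
    A = (K + diags(m)).tocsc()
    u = spsolve(A, q)                    # forward solve: A(m) u = q
    r = P @ u - d                        # data misfit
    J = 0.5 * r @ r + 0.5 * alpha * m @ m
    lam = spsolve(A.T, -(P.T @ r))       # adjoint solve: A(m)^T lam = -P^T r
    # dA/dm_i = e_i e_i^T, so the gradient contribution is lam_i * u_i
    g = alpha * m + lam * u
    return J, g

J, g = reduced_objective_and_gradient(np.full(n, 1.0))
print("J =", J, " ||grad|| =", np.linalg.norm(g))
```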

2. Algorithmic Workflows and Computational Architecture

Several algorithmic frameworks are foundational for PDE-based optimization:

a. Penalty and Variable Projection Methods

The quadratic penalty method introduces a penalized objective:

$$J_{\text{pen}}(m,u) = \frac12\|Pu - d\|^2 + \frac{\rho}{2}\|A(m)u - q\|^2,$$

where $P$ is the sampling/observation operator and $\rho > 0$ is the penalty weight. Eliminating $u$ for fixed $m$ yields a reduced problem $\phi_\rho(m)$ whose gradient is computable via the variable projection approach; for large-scale problems this achieves the storage and computational complexity of the reduced method while working in a larger search space with reduced nonlinearity, conferring better convergence properties and less sensitivity to poor initializations (Leeuwen et al., 2015).
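
The following sketch makes the variable projection structure explicit for the same toy parametrization $A(m) = K + \operatorname{diag}(m)$: eliminating $u$ for fixed $m$ amounts to solving the penalty normal equations, and because the eliminated state minimizes $J_{\text{pen}}$ in $u$, only the explicit $m$-dependence contributes to $\nabla\phi_\rho(m)$. All problem data are hypothetical.

```python
# Variable projection for the quadratic-penalty objective above, with the same
# toy parametrization A(m) = K + diag(m) and hypothetical data. Eliminating u
# solves the penalty normal equations; since dJ_pen/du = 0 at u(m), only the
# explicit m-dependence contributes to the gradient of phi_rho(m).
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import spsolve

n = 100
h = 1.0 / (n + 1)
K = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
P = identity(n, format="csr")
q = np.ones(n)
d = np.zeros(n)
rho = 1e2                                # penalty weight on the PDE residual

def phi_and_gradient(m):
    A = (K + diags(m)).tocsc()
    H = (P.T @ P + rho * (A.T @ A)).tocsc()      # normal-equation operator
    u = spsolve(H, P.T @ d + rho * (A.T @ q))    # eliminate u for fixed m
    data_res = P @ u - d
    pde_res = A @ u - q
    phi = 0.5 * data_res @ data_res + 0.5 * rho * pde_res @ pde_res
    # Variable projection gradient: with dA/dm_i = e_i e_i^T this is
    # rho * (A u - q)_i * u_i; no adjoint solve is needed.
    g = rho * pde_res * u
    return phi, g

phi, g = phi_and_gradient(np.full(n, 1.0))
print("phi =", phi, " ||grad|| =", np.linalg.norm(g))
```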

b. Interior-Point and Preconditioned Saddle Point Methods

For PDE-constrained problems with bound or sparsity constraints, full-space interior-point Gauss-Newton (IP-GN) methods are effective. These introduce log-barriers for control and state bounds, yielding augmented Lagrangian/KKT systems. Mesh-independent and IP-robust preconditioners, notably block Gauss-Seidel for the indefinite IP-GN linear system and log-barrier-regularized Schur complement preconditioners, enable efficient Krylov subspace solves, scaling up to $10^8$ unknowns on HPC resources (Hartland et al., 19 Oct 2024, Pearson et al., 2018).
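
The sketch below illustrates the general pattern of such block preconditioning on a synthetic indefinite $2\times 2$ block KKT system: a block lower-triangular solve combining the Hessian block with an approximate Schur complement is applied matrix-free as a preconditioner inside GMRES. It is a generic stand-in for, not a reproduction of, the IP-GN preconditioners described above.

```python
# Generic block preconditioning for an indefinite 2x2 block KKT system,
# applied matrix-free inside GMRES. All blocks below are synthetic stand-ins
# (random SPD Hessian block H, random constraint Jacobian B, small barrier
# block C); this is not the exact IP-GN construction from the cited papers.
import numpy as np
from scipy.sparse import random as sprandom, identity, diags, bmat
from scipy.sparse.linalg import LinearOperator, gmres, splu

rng = np.random.default_rng(0)
n, m = 200, 80
M = sprandom(n, n, density=0.02, random_state=rng)
H = (M @ M.T + identity(n)).tocsc()          # SPD Hessian block
B = sprandom(m, n, density=0.05, random_state=rng).tocsc()
C = (1e-2 * identity(m)).tocsc()             # barrier/regularization block
K = bmat([[H, B.T], [B, -C]], format="csc")  # indefinite KKT matrix

H_lu = splu(H)
S_hat = (C + B @ diags(1.0 / H.diagonal()) @ B.T).tocsc()  # approx. Schur complement
S_lu = splu(S_hat)

def apply_prec(r):
    # Block lower-triangular solve with [[H, 0], [B, -S_hat]]
    r1, r2 = r[:n], r[n:]
    z1 = H_lu.solve(r1)
    z2 = S_lu.solve(B @ z1 - r2)
    return np.concatenate([z1, z2])

Pinv = LinearOperator(K.shape, matvec=apply_prec)
rhs = rng.standard_normal(n + m)
x, info = gmres(K, rhs, M=Pinv)
print("gmres info:", info, " rel. residual:",
      np.linalg.norm(K @ x - rhs) / np.linalg.norm(rhs))
```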

c. Primal-Dual and First-Order Methods with Interleaved PDE Solves

Recent methods interleave low-complexity primal-dual optimization with “one-shot” (single-step) linearized PDE solves, rather than requiring full forward/adjoint solves or factorizations at each iteration. In each outer iteration, a Jacobi, Gauss-Seidel, or conjugate-gradient step is applied to the PDE block, preserving spatial sparsity and delivering linear convergence under growth conditions. This approach reduces wall-clock time by up to $2\times$ relative to direct solvers in large inverse problems (Jensen et al., 2022).

d. Domain Decomposition and Parallel-in-Time

For time-dependent (parabolic) PDE-constrained optimization, Schur-type and Schwarz domain decomposition methods partition the time interval and iteratively solve coupled forward-backward subproblems with continuity conditions at subinterval boundaries. Multilevel variants with coarse space corrections enable iteration counts independent of the number of time subdomains, facilitating time-parallel scalability (Liu et al., 2016). Parallel Full Approximation Scheme in Space and Time (PFASST) enables time-parallel adjoint-based optimization for parabolic problems, garnering speedups up to $5$–$8\times$ on moderate core counts (Götschel et al., 2019).

e. Natural Gradient Descent and Metric-Aware Optimization

Natural gradient descent methods lift standard gradient descent to Riemannian parameter manifolds with problem-adapted inner products (Euclidean, Sobolev, Fisher–Rao, Wasserstein-2). The natural gradient solves a least-squares problem in the parameter tangent space, leveraging the solution geometry; matrix-free linear algebra, including implicit Jacobian techniques, enables scalability to tens of thousands of dimensions. Sobolev and Wasserstein metrics enhance convergence, particularly for nonconvex and transport-dominated inverse problems (Nurbekyan et al., 2022).
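
A minimal sketch of the metric-aware idea, assuming a discrete $H^1$ (Sobolev) metric $G = I + \gamma K$ and a hypothetical smoothing forward map: the descent direction is obtained by solving $G\,v = -\nabla L(m)$ instead of taking the raw Euclidean gradient.

```python
# Natural (metric-aware) gradient descent with a discrete H^1 (Sobolev) metric
# G = I + gamma*K: the descent direction solves G v = -grad L(m). The smoothing
# forward map S and the data are hypothetical, for illustration only.
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import spsolve

n = 200
h = 1.0 / (n + 1)
K = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
G = (identity(n) + 1e-4 * K).tocsc()        # H^1 metric (gamma = 1e-4)
S = (0.25 * diags([1.0, 2.0, 1.0], [-1, 0, 1], shape=(n, n))).tocsr()  # forward map

x = np.linspace(h, 1.0 - h, n)
m_true = np.sin(2.0 * np.pi * x)
d = S @ m_true                              # synthetic, noise-free data

m = np.zeros(n)
eta = 1.0
for k in range(200):
    grad = S.T @ (S @ m - d)                # Euclidean gradient of 1/2||S m - d||^2
    m = m + eta * spsolve(G, -grad)         # natural gradient step: G v = -grad
print("relative error:", np.linalg.norm(m - m_true) / np.linalg.norm(m_true))
```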

3. Discretization, Regularization, and Loss Conditioning

Sophisticated discretization and regularization strategies underpin robust PDE-based optimization:

  • Spectral and Galerkin Discretization: High-order spectral methods discretize states and controls into basis expansions, leading to ODE-constrained optimizations of moderate dimensionality with superior per-mode accuracy, as in microfluidics mixing optimization (Song, 2022).
  • Automatic Regularization Parameter Selection: Automatic selection of Tikhonov regularization parameters can be achieved via regula falsi or secant updates in sub- or full-space Newton-Krylov iterations, drastically reducing the number of required PDE solves compared to exhaustive L-curve sweeps (Schenkels et al., 2018).
  • Conditioning-Aware Loss Functions: Optimization-based PDE solvers (e.g. PINNs, ODIL) often suffer slow convergence due to the implicit squaring of the condition number when minimizing mean squared error (MSE) losses. The SGR (Stabilized Gradient Residual) loss interpolates between residual descent and the MSE, reducing the effective condition number and accelerating convergence by orders of magnitude in both direct and PINN-based solvers (Cao et al., 24 Jul 2025).
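
The toy example below illustrates the condition-number squaring that motivates such conditioning-aware losses: for a symmetric positive definite operator $A$, gradient descent on the MSE loss $\tfrac12\|Au-b\|^2$ converges at a rate governed by $\kappa(A)^2$, whereas residual (Richardson) descent on $Au=b$ is governed by $\kappa(A)$. This is a generic illustration of the effect, not the SGR construction.

```python
# Toy demonstration of condition-number squaring: for SPD A, gradient descent
# on the MSE loss 1/2||A u - b||^2 is governed by kappa(A)^2, while residual
# (Richardson) descent on A u = b is governed by kappa(A). This is a generic
# illustration of the effect, not the SGR loss itself.
import numpy as np

n = 50
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1)) / h**2          # SPD 1D Laplacian
b = np.ones(n)
u_exact = np.linalg.solve(A, b)

lam = np.linalg.eigvalsh(A)
lam_min, lam_max = lam[0], lam[-1]

u = np.zeros(n)                                      # residual descent iterate
v = np.zeros(n)                                      # MSE descent iterate
eta_res = 2.0 / (lam_min + lam_max)                  # optimal Richardson step
eta_mse = 2.0 / (lam_min**2 + lam_max**2)            # optimal step for A^T A

for k in range(5000):
    u = u - eta_res * (A @ u - b)                    # rate ~ (kappa-1)/(kappa+1)
    v = v - eta_mse * (A.T @ (A @ v - b))            # rate ~ (kappa^2-1)/(kappa^2+1)

def rel_err(w):
    return np.linalg.norm(w - u_exact) / np.linalg.norm(u_exact)

print(f"kappa(A) = {lam_max / lam_min:.1e}")
print(f"residual descent error: {rel_err(u):.2e}")
print(f"MSE descent error:      {rel_err(v):.2e}")
```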

4. Treatment of State and Geometric Constraints

Imposing general state or geometric constraints is a central aspect in PDE-based optimization:

  • State Constraints: Adjoint-based projection methods enable enforcement of state constraints by computing the gradient of the reduced cost and projecting onto the constraint manifold’s tangent space via secondary adjoint PDEs. This approach maintains the optimize-then-discretize paradigm and only requires one extra adjoint solve per constraint (Matharu et al., 2023); a schematic projection step is sketched after this list.
  • Shape and Mesh Optimization: PDE-constrained mesh and shape optimization methods frame mesh movement or domain shape as an optimization over the boundary or nodal positions, constrained by the PDE and geometric validity. The Target-Matrix Optimization Paradigm (TMOP) delivers convex quality metrics, while adjoint sensitivity analysis, filtered regularization (via Helmholtz-type PDEs), and robust optimization (e.g., Method of Moving Asymptotes) yield high-order meshes and shapes that enforce element validity and minimize PDE discretization errors (Kolev et al., 2 Jul 2025, Blauth, 2022).
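
The following sketch, reusing the toy parametrization $A(m) = K + \operatorname{diag}(m)$ from Section 2, shows the projection step for a single linear state constraint $c(m) = w^\top u(m) - c_0$: one extra adjoint solve produces the reduced constraint gradient, and the reduced cost gradient is projected onto its orthogonal complement. The constraint functional and data are hypothetical.

```python
# Schematic adjoint-based projection for a single linear state constraint
# c(m) = w^T u(m) - c0, reusing the toy parametrization A(m) = K + diag(m).
# One extra (secondary) adjoint solve yields the reduced constraint gradient,
# onto whose orthogonal complement the reduced cost gradient is projected.
# The constraint functional and all data are hypothetical.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

n = 100
h = 1.0 / (n + 1)
K = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
q = np.ones(n)
d = np.zeros(n)
w = np.full(n, h)                     # constraint functional: weighted average of u
m = np.full(n, 1.0)

A = (K + diags(m)).tocsc()
u = spsolve(A, q)                     # forward solve

lam_J = spsolve(A.T, -(u - d))        # primary adjoint for the reduced cost
grad_J = lam_J * u                    # since dA/dm_i = e_i e_i^T

lam_c = spsolve(A.T, -w)              # secondary adjoint for the constraint
grad_c = lam_c * u

# Project the cost gradient onto the tangent space {v : grad_c . v = 0}
grad_proj = grad_J - (grad_c @ grad_J) / (grad_c @ grad_c) * grad_c
print("tangency check (should be ~0):", grad_c @ grad_proj)
```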

5. Integration with Machine Learning Surrogates

Recent advances integrate neural operator surrogates and physics-informed neural networks (PINNs) to mitigate high computational costs in PDE-based optimization:

  • Self-Supervised Operator Networks: DeepONets (branch–trunk networks) or neural operators are trained in a physics-informed manner (by minimizing PDE residuals) to rapidly approximate solution operators for parametric PDEs, enabling direct gradient-based optimization in the surrogate. Applications include Poisson, heat transfer, and drag minimization; forward solve and optimization speedups of $10$–$100\times$ are observed (Wang et al., 2021). A schematic surrogate-driven design loop is sketched after this list.
  • Hybrid Optimization with Surrogate and Numerical Solvers: Hybrid approaches couple optimization steps through the neural operator with periodic numerical corrector steps. Reference neural operators, virtual-Fourier architecture for accurate derivative learning, and hybridization for out-of-distribution robustness combine to reduce the number of expensive PDE solves by $2/3$, while preserving convergence quality (Cheng et al., 16 Jun 2025).
  • Bi-level and Proxy Optimizer Techniques: In challenging nonlinear and nonconvex PDE-constrained optimization (PDECO) problems, bi-level formulations decouple physics enforcement (solved in an inner PINN loop) from the outer design optimization, with hypergradients computed via Broyden-type quasi-Newton updates, enabling fast convergence and state-of-the-art performance (Hao et al., 2022). Proxy optimizers learn a control basis as an MLP, enforcing PDE feasibility and constraints via primal-dual penalties; this enables real-time optimal control via a single forward pass, with up to $10^4\times$ speedup over adjoint-based MPC (Guven et al., 29 Sep 2025).
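
The schematic below shows the overall surrogate-driven workflow referenced above: design variables are updated by descending through a frozen, differentiable surrogate, with periodic corrector checks against a reference (full-solver) evaluation, in the spirit of the hybrid approaches. Both objective functions are hypothetical placeholders; in practice the surrogate would be a trained neural operator or DeepONet, the reference a full PDE solve, and automatic differentiation would replace the finite-difference gradient.

```python
# Schematic surrogate-driven design loop: descend through a frozen,
# differentiable surrogate and periodically compare against a reference
# (full-solver) evaluation. Both objectives are hypothetical placeholders;
# in practice the surrogate is a trained neural operator / DeepONet, the
# reference is a full PDE solve, and autodiff replaces finite differences.
import numpy as np

def surrogate_objective(theta):
    # Placeholder for J(u_surrogate(theta), theta): smooth and cheap to evaluate.
    return np.sum((theta - 1.0) ** 2) + 0.1 * np.sum(np.sin(3.0 * theta) ** 2)

def reference_objective(theta):
    # Placeholder for the objective evaluated with a full numerical solve.
    return surrogate_objective(theta) + 1e-3   # pretend small model error

def fd_gradient(f, theta, eps=1e-6):
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (f(theta + e) - f(theta - e)) / (2.0 * eps)
    return g

theta = np.zeros(5)                            # design variables
step = 0.1
for k in range(101):
    theta -= step * fd_gradient(surrogate_objective, theta)
    if k % 25 == 0:                            # periodic corrector / trust check
        gap = abs(surrogate_objective(theta) - reference_objective(theta))
        print(f"iter {k:3d}: surrogate J = {surrogate_objective(theta):.4f}, "
              f"gap to reference = {gap:.2e}")
```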

6. Scalability, Efficiency, and Practical Implementations

  • Scalability: Algorithms based on mesh-independent preconditioners (block Gauss–Seidel, Schur complement with AMG), time-parallel and spatial-parallel domain decomposition, and matrix-free spectral element methods demonstrate robust scaling to billions of degrees of freedom and core counts up to $10^5$ (Hartland et al., 19 Oct 2024, Marin et al., 2018, Liu et al., 2016).
  • Software integration: Efficient implementations leverage solver libraries (e.g. PETSc/TAO), matrix-free tensor operators, adjoint integrators with checkpointing, and automatic differentiation to seamlessly handle state, gradient, and Hessian computations in time-dependent and nonlinear settings (Marin et al., 2018).

7. Applications and Problem Classes

PDE-based optimization methods are applied across the problem classes surveyed above, including inverse problems and parameter inference, optimal control of stationary and time-dependent systems (e.g., microfluidic mixing, heat transfer, and drag minimization), shape and mesh optimization, and real-time control via learned surrogates.

In all applications, the interplay of discretization, constraint enforcement, scalable solvers, and (where relevant) learning-based surrogates determines the practical viability and accuracy of the optimization. The field continues to evolve with advances in preconditioning, hybrid algorithms, and the integration of machine learning for operator approximation, sensitivity analysis, and uncertainty quantification.

