Adaptive Adjoint-Oriented Neural Network

Updated 28 December 2025
  • Adaptive AONN is a deep learning framework for solving PDE-constrained optimal control problems by integrating adjoint-based neural networks with adaptive sampling to tackle singularities.
  • It couples independent neural network surrogates for state, adjoint, and control with deep adaptive sampling strategies to focus computational resources on challenging regions.
  • Numerical results demonstrate significant error reduction and efficiency gains, outperforming traditional and non-adaptive methods in various control problems.

An Adaptive Adjoint-Oriented Neural Network (Adaptive AONN) is a deep learning-based framework for the efficient, mesh-adaptive solution of parametric optimal control problems constrained by partial differential equations (PDEs), especially in the presence of solution singularities or low-regularity behavior. It integrates the adjoint-oriented neural network (AONN) methodology for all-at-once approximation of state, adjoint, and control with deep adaptive sampling strategies to dynamically focus computational resources on difficult regions in the joint space of physical and parametric variables. The approach tightly couples the neural network surrogates with the variational KKT structure of the control problem, and automatically concentrates sampling near singularities and regions of large residual, resulting in pronounced gains in both accuracy and efficiency relative to baseline neural and traditional methods (Yuan et al., 21 Dec 2025).

1. Mathematical Formulation and Problem Setting

Adaptive AONN targets parametric optimal control problems of the form: for each $\mu\in\Gamma\subset\mathbb{R}^d$, find state $y$ and control $u$ to minimize a target functional

$$\min_{(y,u)\,\in\,Y\times U}\; J(y,u;\mu) = \frac{1}{2}\|y-y_d\|_{L^2(\Omega(\mu))}^2 + \frac{\alpha}{2}\|u\|_{L^2(\Omega(\mu))}^2$$

subject to

$$\begin{cases} F(y(x;\mu),\,u(x;\mu);\mu) = 0, & x\in\Omega(\mu) \\ y = \bar y & \text{on } \partial\Omega(\mu) \\ u_a(x;\mu) \leq u(x;\mu) \leq u_b(x;\mu) & \text{a.e.} \end{cases}$$

where the parameter $\mu$ may affect geometry, data, or coefficients, inducing either geometric or parametric singularities.

The Karush-Kuhn-Tucker (KKT) system for this problem reads:

$$\begin{aligned} & F(y^*, u^*;\mu) = 0, \\ & J_y(y^*, u^*;\mu) - F_y^*(y^*, u^*;\mu)\,p^* = 0, \\ & \langle d_u J(y^*, u^*;\mu),\, v-u^* \rangle \geq 0 \quad \forall v \in U_{ad}(\mu), \end{aligned}$$

with $d_u J = J_u(y,u;\mu) - F_u^*(y,u;\mu)\,p$. The optimal control $u^*$ is realized as a projection onto the admissible set.
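
For the quadratic tracking functional above, this projection can be made explicit; the following derivation sketch is standard for box-constrained controls (sign and adjoint conventions vary across references, so treat it as illustrative rather than the paper's exact statement):

```latex
% Since J_u = \alpha u for the cost above, the variational inequality
%   \langle \alpha u^* - F_u^* p^*,\; v - u^* \rangle \ge 0
%   \quad \forall v \in U_{ad}(\mu)
% reduces to a pointwise projection onto the box [u_a, u_b]:
u^*(x;\mu) = P_{[u_a,\,u_b]}\!\left( \frac{1}{\alpha}\,
    \bigl(F_u^*(y^*,u^*;\mu)\, p^*\bigr)(x;\mu) \right),
\qquad
P_{[u_a,\,u_b]}(v) = \max\bigl\{ u_a,\; \min\{ u_b,\, v \} \bigr\}.
```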

2. Adjoint-Oriented Neural Network Framework

The AONN methodology represents the state $y(x;\mu)$, adjoint $p(x;\mu)$, and control $u(x;\mu)$ by independent fully-connected neural networks with shared parametric input $(x,\mu)$:

$$y(x;\mu) \approx y_\theta(x,\mu),\quad p(x;\mu)\approx p_\theta(x,\mu),\quad u(x;\mu)\approx u_\theta(x,\mu).$$

Dirichlet boundary conditions are directly enforced with an ansatz such as $y_\theta(x,\mu) = \bar y(x,\mu) + \ell(x,\mu)\,Y_I(x,\mu;\theta_y)$, where $\ell$ vanishes on the boundary. The networks are trained to minimize mean-square strong-form residuals for the state equation (PDE), the adjoint equation (derived from the optimality system), and the variational inequality (for control projection), by constructing the losses

$$\begin{aligned} J_s(\theta) &= \int_{\Omega(\mu)\times\Gamma} |r_s(x,\mu;\theta)|^2\,dx\,d\mu, \\ J_a(\theta) &= \int_{\Omega(\mu)\times\Gamma} |r_a(x,\mu;\theta)|^2\,dx\,d\mu, \\ J_u(\theta) &= \int_{\Omega(\mu)\times\Gamma} |r_u(x,\mu;\theta)|^2\,dx\,d\mu, \end{aligned}$$

with residuals $r_s$, $r_a$, $r_u$ defined by the PDE, adjoint, and control optimality constraints (Yuan et al., 21 Dec 2025).
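
As a concrete illustration, the sketch below builds the surrogate networks with hard Dirichlet enforcement and assembles a strong-form state residual loss. It assumes a hypothetical Poisson state equation $-\Delta y = u$ on the unit square with homogeneous boundary data and a one-dimensional parameter; the paper's actual operators, domains, and boundary data differ, so this is a minimal sketch rather than a reference implementation.

```python
import torch

def mlp(in_dim, width=32, depth=6, out_dim=1):
    """Fully-connected network with tanh activations (cf. Section 4's architecture)."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [torch.nn.Linear(d, width), torch.nn.Tanh()]
        d = width
    layers.append(torch.nn.Linear(d, out_dim))
    return torch.nn.Sequential(*layers)

# Independent surrogates for state, adjoint, control; shared input z = (x1, x2, mu).
Y_I, P_net, U_net = mlp(3), mlp(3), mlp(3)

def sample_uniform(n):
    # Uniform collocation draw over the unit square times a 1-D parameter range.
    return torch.rand(n, 3)

def y_theta(z):
    # Hard Dirichlet enforcement: y = ybar + ell * Y_I, with ell = 0 on the boundary.
    x1, x2 = z[:, 0:1], z[:, 1:2]
    ell = x1 * (1 - x1) * x2 * (1 - x2)   # vanishes on the unit-square boundary
    return 0.0 + ell * Y_I(z)             # homogeneous boundary data ybar = 0 assumed

def laplacian(f, z):
    # Spatial Laplacian via nested autograd, used for strong-form residuals.
    z = z.clone().requires_grad_(True)
    g = torch.autograd.grad(f(z).sum(), z, create_graph=True)[0]
    lap = torch.zeros_like(g[:, :1])
    for i in range(2):  # differentiate in x1, x2 only; mu is a parameter
        lap = lap + torch.autograd.grad(g[:, i].sum(), z, create_graph=True)[0][:, i:i+1]
    return lap, z

def state_loss(z):
    # Mean-square strong-form residual J_s for the assumed constraint -Δy = u.
    lap, zg = laplacian(y_theta, z)
    return ((-lap - U_net(zg)) ** 2).mean()
```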

Training proceeds in a direct-adjoint looping (DAL) style: each network (state, adjoint, control) is minimized in turn, using the current values of the others, tightly coupling them to reflect the KKT structure.
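
A DAL sweep can then be sketched as three successive sub-minimizations, each freezing the other two networks. The adjoint and control losses are assumed to be assembled analogously to `state_loss` above, and the LBFGS settings are illustrative choices, not the paper's exact configuration:

```python
def train_stage(net, loss_fn, z, max_iter=200):
    # Minimize one residual loss over one network, holding the others fixed.
    opt = torch.optim.LBFGS(net.parameters(), max_iter=max_iter,
                            line_search_fn="strong_wolfe")
    def closure():
        opt.zero_grad()
        loss = loss_fn(z)
        loss.backward()
        return loss
    opt.step(closure)

def dal_sweep(z, state_loss, adjoint_loss, control_loss):
    # One direct-adjoint-looping pass reflecting the KKT structure.
    # (adjoint_loss / control_loss: assumed defined analogously to state_loss.)
    train_stage(Y_I, state_loss, z)      # 1) state residual J_s (control frozen)
    train_stage(P_net, adjoint_loss, z)  # 2) adjoint residual J_a (state frozen)
    train_stage(U_net, control_loss, z)  # 3) projected-optimality residual J_u
```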

3. Deep Adaptive Sampling and Residual-Driven Focus

Standard neural approaches employing fixed random collocation tend to waste samples in regions of high regularity and struggle in the vicinity of singularities or sharp layers. Adaptive AONN augments the basic AONN by integrating the $\mathrm{DAS}^2$ (deep adaptive sampling for surrogates without labels) strategy. Here, a normalizing flow (KRnet) is trained to approximate the probability density

$$\hat r(x,\mu) \propto r_s(x,\mu;\theta)^2 + r_a(x,\mu;\theta)^2$$

so that new training points are preferentially generated where the (state and adjoint) residuals are largest.

Sampling is performed by drawing from the learned KRnet density, rejecting points outside the feasible domain, and periodically updating the normalizing flow as the residual evolves throughout training. The focus on high-residual regions automatically increases sample density in low-regularity areas, leading to significant improvements in both efficiency and accuracy, particularly for problems with geometric or parametric singularities (Yuan et al., 21 Dec 2025).
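
The resampling step can be pictured as follows. The paper trains a KRnet normalizing flow on the residual-induced density and rejects infeasible draws; since KRnet's API is not reproduced here, this sketch substitutes simple self-normalized multinomial resampling from a candidate pool, which captures the residual-proportional concentration but not the flow model itself. `residual_sq` is assumed to evaluate $r_s^2 + r_a^2$ with the networks from the earlier sketch, and `feasible` is a hypothetical indicator for the feasible domain.

```python
def residual_resample(residual_sq, sample_uniform, feasible, n_new=2000, pool=20000):
    """Draw points roughly proportional to r_s^2 + r_a^2 (simplified DAS^2-style step)."""
    cand = sample_uniform(pool)                 # candidate points in Omega(mu) x Gamma
    cand = cand[feasible(cand)]                 # reject points outside the feasible domain
    w = residual_sq(cand).detach().flatten()    # combined squared residual as weights
    idx = torch.multinomial(w / w.sum(), n_new, replacement=True)
    return cand[idx]                            # dense where the residual is large
```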

4. Implementation Details

The implementation employs three fully-connected neural networks for state, adjoint, and control, each with 6 hidden layers of 25–32 neurons and $\tanh$ activations. The normalizing flow (KRnet) for adaptive sampling uses 2–3 loops of affine-coupling layers. The AONN networks are optimized with BFGS under increasing epoch schedules, while KRnet is optimized with Adam; batch sizes of 2000–4000 collocation points are typical.

Boundary and geometric constraints are enforced exactly through the functional ansatz rather than penalty terms. The DAL-style alternation between state, adjoint, and control is performed per outer adaptive iteration. The method is data-efficient, with residual-based sample selection substantially reducing total sample requirements for a given accuracy (Yuan et al., 21 Dec 2025).
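
Putting the pieces together, one outer adaptive iteration alternates a DAL sweep with a refresh of the collocation set, with a growing optimizer budget mirroring the increasing epoch schedules mentioned above. All schedule values here are illustrative assumptions, and the adjoint/control stages and the $r_a$ contribution are elided for brevity:

```python
def residual_sq(z):
    # Squared state residual; the paper's sampler uses r_s^2 + r_a^2.
    lap, zg = laplacian(y_theta, z)
    return (-lap - U_net(zg)) ** 2

feasible = lambda c: torch.ones(len(c), dtype=torch.bool)  # trivial on the unit square

z = sample_uniform(2000)                       # initial collocation batch
for k in range(10):                            # outer adaptive iterations
    budget = 200 + 100 * k                     # increasing (L-)BFGS epoch schedule
    train_stage(Y_I, state_loss, z, max_iter=budget)
    # ... adjoint and control stages would run here with their own losses ...
    z = torch.cat([z, residual_resample(residual_sq, sample_uniform, feasible)], 0)
```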

In mesh-dependent settings, such as goal-oriented a posteriori error estimation, a variant is used wherein only the adjoint is represented by a neural network (to accelerate the DWR estimator), with the primal solved by finite elements; this yields cost savings on large meshes and flexible local enrichment (Roth et al., 2021).

5. Numerical Results and Empirical Performance

The adaptive AONN has been systematically validated on parametric PDE-constrained optimal control problems featuring geometric and parametric singularities. Key findings are:

  • Laplace problem with geometric singularity: On a domain with a shrinking hole, the relative $L^2$-error in the control against an FEM reference is $0.6\%$ for adaptive AONN versus $2.4\%$–$2.5\%$ for non-adaptive methods, with adaptive AONN achieving roughly an order-of-magnitude error reduction at the same sample count. The majority of the error reduction occurs in the initial adaptive iterations.
  • Stokes flow with boundary control: For a channel around a cylinder and parameter $\xi\in[10,1000]$, adaptive AONN achieves uniform accuracy, with $L^2$-errors in velocity and pressure (at $\xi=10$) of $7.3\%$ and $11.9\%$ respectively, substantially outperforming $\mathrm{DAS}^2$ ($38.7\%$, $62.9\%$) and classical AONN ($44.5\%$, $76.5\%$).
  • Laplace problem with 10-dimensional parametric singularity: For solutions with sharp boundary layers controlled by Gaussian parametric inputs, at $2\times10^4$ samples adaptive AONN achieves an absolute error of 0.0627, compared to 0.1483 for $\mathrm{DAS}^2$ and 0.1622 for AONN, at comparable or lower computational cost (Yuan et al., 21 Dec 2025).

A similar approach used for the mesh-based DWR framework, replacing the adjoint solve with a neural network, yielded wall-time advantages of roughly one to ten times for $N_\text{dof} > 10^4$ while maintaining effectivity indices near unity for linear goals and stable behavior in the nonlinear case (Roth et al., 2021).

6. Advantages, Limitations, and Applicability

Advantages:

  • Residual-driven adaptive sampling focuses computational effort and sampling density on singular and low-regularity regions, providing large efficiency gains and superior solution quality.
  • The AONN structure enables simultaneous, global (in parameter and spatial variables) approximation of optimal solutions, and the meshless formulation simplifies handling of complex geometries and moving domains.
  • The method is data-efficient and accommodates high-dimensional parameter spaces.

Limitations:

  • Requires explicit strong-form residuals for the state and adjoint equations; for multiphysics or variational-inequality systems, such strong forms may be complex or unavailable.
  • Hyperparameter selection and training stability (saddle points, loss explosion) may pose practical challenges.
  • Error bounds are empirical; effectivity indices may drift if network residuals are not minimized to sufficient fidelity.
  • Integration of deep adaptive sampling mechanisms such as KRnet imposes additional overhead and complexity (Roth et al., 2021, Yuan et al., 21 Dec 2025).

The Adaptive AONN extends the original AONN described by Yin et al. (Yin et al., 2023) by incorporating sampling adaptivity for low-regularity and singular PDE solutions. Unlike classical mesh-based adjoint adaptivity (e.g., finite element DWR), the neural-based adjoint is meshless, does not require repeated linear solves, and scales efficiently for high-dimensional parameterized settings. In contrast to fully unsupervised deep surrogate models, Adaptive AONN leverages deep adaptive sampling targeted by the PDE/adjoint residual structure rather than error over outputs, yielding data-efficient surrogate construction. Its combination of DAL-style KKT enforcement, residual-driven data allocation, and meshless universal approximation is distinctive among parametric PDE control literature (Yin et al., 2023, Roth et al., 2021, Yuan et al., 21 Dec 2025).
