Neural Galerkin Method

Updated 2 January 2026
  • Neural Galerkin Method is a computational framework that melds neural network trial spaces with classical Galerkin projections for solving PDEs.
  • It employs varied neural architectures and classical finite element test functions to ensure stability and rigorous error control.
  • Adaptive sampling, randomized networks, and mixed formulations enhance its accuracy and efficiency in high-dimensional and complex applications.

The Neural Galerkin Method (NGM) is a computational framework that synthesizes the principles of Galerkin projection from computational mathematics with neural network regression to numerically solve partial differential equations (PDEs). NGM leverages deep or randomized neural representations for trial spaces, while utilizing polynomial or classical bases for test spaces. This hybrid variational paradigm aims to combine mesh-free high-dimensional representation power with the structure-preserving stability and error control mechanisms established by Galerkin theory.

1. Mathematical Foundation and Variational Formulation

Neural Galerkin transforms the classic variational equation

$$\text{Find} \; u\in V \quad \text{such that} \quad a(u,v) = \ell(v) \quad \forall v\in W$$

by replacing traditional trial spaces with neural network parametrizations. For linear elasticity on $\Omega\subset\mathbb{R}^d$, the Petrov–Galerkin formulation uses the bilinear form

$$a(u,w) = \int_\Omega \bigl[\,2\mu\,\varepsilon(u):\varepsilon(w) + \lambda\,(\nabla\cdot u)(\nabla\cdot w)\,\bigr]\,dx$$

with the right-hand side

$$\ell(w) = \int_\Omega f\cdot w\,dx + \int_{\Gamma_N} g_N\cdot w\,ds$$

where $\varepsilon(u)=\frac{1}{2}(\nabla u + \nabla u^{T})$ and $u(x)$ is parametrized by a neural network $u_{NN}(x;\alpha)$ (Shang et al., 2023).

In evolution equations, the Dirac–Frenkel variational principle requires the residual $R_\theta$ to be orthogonal to the tangent space of the parametric manifold at every time,

$$\langle \partial_{\theta_i} u,\, R_\theta\rangle = 0$$

for every parameter direction $\theta_i$, yielding an ODE system for $\theta(t)$ (Bruna et al., 2022, Sun et al., 2023, Li et al., 25 Dec 2025).
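
Concretely, for an evolution equation $\partial_t u = f(u)$ with ansatz $u_\theta = u(\cdot\,;\theta(t))$ and residual $R_\theta = \sum_j \partial_{\theta_j} u_\theta\,\dot\theta_j - f(u_\theta)$, the orthogonality conditions reduce to a Galerkin system in parameter space (a standard derivation sketch; the inner product is taken over the spatial domain or a chosen sampling measure, cf. Section 4):

$$\sum_j \bigl\langle \partial_{\theta_i} u_\theta,\, \partial_{\theta_j} u_\theta \bigr\rangle\, \dot\theta_j = \bigl\langle \partial_{\theta_i} u_\theta,\, f(u_\theta) \bigr\rangle \quad\Longleftrightarrow\quad M(\theta)\,\dot\theta = F(\theta),$$

with $M_{ij}(\theta)=\langle \partial_{\theta_i} u_\theta,\partial_{\theta_j} u_\theta\rangle$ and $F_i(\theta)=\langle \partial_{\theta_i} u_\theta, f(u_\theta)\rangle$, which is exactly the ODE system assembled in Section 3.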

2. Neural Trial Spaces and Test Function Design

NGM represents the solution ansatz with neural architectures, ranging from deep feedforward networks to randomized networks with fixed hidden layers and meshwise discontinuous networks.

Test spaces retain classical character—finite element polynomials, hat functions, or broken polynomial bases—preserving variational consistency and enabling mesh-local stabilization (Shang et al., 2023, Kumar et al., 13 Sep 2025).

3. Discrete System Assembly and Optimization

Neural Galerkin discretizes the parameter-dependent residual equations. For linear elasticity, the least-squares functional is assembled:

$$J(\alpha)=\sum_{i=1}^{N+N_b} |r_i(\alpha)|^2$$

where $r_i(\alpha) = a(u_{NN}(\cdot\,;\alpha), v_i) - \ell(v_i)$; the resulting system is solved by direct linear algebra with respect to the output-layer weights (Shang et al., 2023).
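
As a concrete illustration (a minimal sketch under assumed ingredients, not code from the cited papers), the following applies this pipeline to the 1D Poisson analogue $-u''=f$ on $(0,1)$ with $u(0)=u(1)=0$: a tanh random-feature trial network with fixed random hidden layer and linear output weights $\alpha$, classical hat-function test functions as in Section 2, and a direct least-squares solve for $\alpha$.

```python
import numpy as np

# Minimal sketch: Petrov-Galerkin least squares for -u'' = f on (0, 1) with
# u(0) = u(1) = 0, using a tanh random-feature trial network (fixed random
# hidden layer, linear output weights alpha) and classical hat test functions.
# Illustrative assumptions only; not code from the cited papers.

rng = np.random.default_rng(1)
n_neurons = 30
w = rng.uniform(-5.0, 5.0, n_neurons)          # fixed random inner weights
b = rng.uniform(-5.0, 5.0, n_neurons)          # fixed random biases
phi = lambda x: np.tanh(np.outer(x, w) + b)    # trial basis, shape (len(x), n_neurons)

f = lambda x: np.pi ** 2 * np.sin(np.pi * x)   # manufactured source term
u_exact = lambda x: np.sin(np.pi * x)          # reference solution

# Hat-function test space v_i on a uniform mesh with N interior nodes.
N = 60
nodes = np.linspace(0.0, 1.0, N + 2)
h = nodes[1] - nodes[0]

# a(phi_j, v_i) = int phi_j' v_i' dx is exact for piecewise-linear hats:
# (2 phi_j(x_i) - phi_j(x_{i-1}) - phi_j(x_{i+1})) / h.
P = phi(nodes)                                  # (N + 2, n_neurons)
A = (2.0 * P[1:-1] - P[:-2] - P[2:]) / h        # interior residual rows

# ell(v_i) = int f v_i dx via a composite midpoint rule.
nq = 2000
xq = (np.arange(nq) + 0.5) / nq
hats = np.maximum(0.0, 1.0 - np.abs(xq[:, None] - nodes[1:-1]) / h)
ell = (f(xq)[:, None] * hats).sum(axis=0) / nq

# Append penalized boundary residuals u_NN(0) = u_NN(1) = 0 (the N_b terms).
beta = 10.0
K = np.vstack([A, beta * phi(np.array([0.0, 1.0]))])
rhs = np.concatenate([ell, [0.0, 0.0]])

# Direct least-squares solve for the output-layer weights alpha.
alpha, *_ = np.linalg.lstsq(K, rhs, rcond=None)

xs = np.linspace(0.0, 1.0, 501)
print("max error:", np.max(np.abs(phi(xs) @ alpha - u_exact(xs))))
```

Because the trial ansatz is linear in $\alpha$, the weak residuals are linear as well and the whole solve reduces to one dense least-squares problem; training the inner weights too would instead require nonlinear optimization of $J(\alpha)$.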

Time-dependent variants form ODEs for neural parameters:

$$M(\theta)\,\dot{\theta} = F(\theta)$$

with $M$ and $F$ estimated by (possibly adaptive) sampling and the resulting ODE integrated via explicit or implicit schemes (Bruna et al., 2022, Wen et al., 2023, Li et al., 25 Dec 2025). For nonlinear, parametric, or quantum-inspired models, sampling is performed in the measure induced by the current neural solution (quantum Monte Carlo, Gibbs reweighting, or active learning) to minimize estimator variance (Wen et al., 2023, Sun et al., 2023).
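
For the time-dependent case, a minimal sketch (illustrative assumptions only, not code from the cited papers) is the 1D heat equation $u_t = u_{xx}$ with a single-Gaussian ansatz $u(x;\theta)=A\exp(-(x-m)^2/(2s^2))$, $\theta=(A,m,s)$: $M$ and $F$ are estimated by Monte Carlo samples drawn from the measure induced by the current solution, and the parameter ODE is advanced with explicit Euler.

```python
import numpy as np

# Minimal sketch: Neural Galerkin time stepping M(theta) theta_dot = F(theta)
# for the 1D heat equation u_t = u_xx with a single-Gaussian ansatz
# u(x; theta) = A * exp(-(x - m)^2 / (2 s^2)), theta = (A, m, s).
# Gradients are coded analytically; illustrative assumptions only.

def grad_theta_u(x, th):
    """Rows: du/dA, du/dm, du/ds at the sample points x."""
    A, m, s = th
    g = np.exp(-(x - m) ** 2 / (2 * s ** 2))
    return np.stack([g,
                     A * (x - m) / s ** 2 * g,
                     A * (x - m) ** 2 / s ** 3 * g])

def rhs_f(x, th):
    """Spatial right-hand side f(u) = u_xx for the heat equation."""
    A, m, s = th
    g = np.exp(-(x - m) ** 2 / (2 * s ** 2))
    return A * ((x - m) ** 2 / s ** 4 - 1.0 / s ** 2) * g

rng = np.random.default_rng(0)
theta = np.array([1.0, 0.0, 0.5])          # initial (A, m, s)
dt, n_steps, n_samples = 1e-3, 500, 4096

for _ in range(n_steps):
    # Density-guided sampling: draw points from the measure induced by the
    # current Gaussian solution (cf. Section 4).
    x = rng.normal(theta[1], theta[2], n_samples)
    G = grad_theta_u(x, theta)              # (3, n_samples)
    M = G @ G.T / n_samples                 # Monte Carlo Gram matrix M(theta)
    F = G @ rhs_f(x, theta) / n_samples     # projected right-hand side F(theta)
    theta_dot = np.linalg.solve(M + 1e-8 * np.eye(3), F)
    theta = theta + dt * theta_dot          # explicit Euler step

# Exact heat flow of a Gaussian: s(t)^2 = s(0)^2 + 2 t, so after t = 0.5
# the width should be close to sqrt(0.25 + 1.0) ~ 1.118.
print(theta)
```

Since the heat flow of a Gaussian stays on this parametric manifold ($s(t)^2 = s(0)^2 + 2t$), the computed trajectory can be checked against the exact widening rate.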

4. Adaptive Sampling and Active Learning Strategies

Standard uniform Monte Carlo sampling is inefficient in high-dimensional domains or near localized solution features. Neural Galerkin therefore incorporates:

  • Density-guided Sampling: Prioritize regions where the network-approximated solution has high mass or variance (Bruna et al., 2022).
  • Particle-based Adaptive Sampling: Ensembles of particles adaptively driven by Langevin or Stein flows, concentrating computational effort where the residual is large (Wen et al., 2023); see the sketch after this list.
  • Meta-learning and Decoders: Latent codes encode parametric initial conditions for rapid adaptation to unseen regimes, reducing the need for full retraining (Li et al., 25 Dec 2025).
  • Quadratic Manifold Collocation: For model reduction, separate collocation for residual evaluation and full-model grid points delivers hyper-reduction and online efficiency (Weder et al., 2024).
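
As a sketch of the particle-based strategy (with a hypothetical stand-in residual, not a PDE residual from the cited papers), Langevin-type dynamics drive a particle cloud toward a density proportional to the squared residual, so quadrature effort concentrates where the residual is large:

```python
import numpy as np

# Minimal sketch of residual-driven Langevin sampling. The residual below is a
# hypothetical stand-in with a sharp feature near x = 0.7 (not a PDE residual
# from the cited papers); particles equilibrate toward a density proportional
# to the squared residual, concentrating quadrature points where it is large.

def residual(x):
    return np.exp(-20.0 * (x - 0.7) ** 2) + 0.05

def grad_log_residual_sq(x, eps=1e-4):
    # Finite-difference gradient of log r(x)^2; autodiff would be used in practice.
    return (np.log(residual(x + eps) ** 2) - np.log(residual(x - eps) ** 2)) / (2 * eps)

rng = np.random.default_rng(0)
particles = rng.uniform(0.0, 1.0, 512)     # start from a uniform cloud on [0, 1]
step = 1e-3

for _ in range(2000):
    drift = grad_log_residual_sq(particles)
    noise = np.sqrt(2.0 * step) * rng.standard_normal(particles.shape)
    particles = np.clip(particles + step * drift + noise, 0.0, 1.0)  # stay in domain

# Most of the cloud should now sit near the large-residual region around x = 0.7.
print(np.quantile(particles, [0.1, 0.5, 0.9]))
```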

NGM’s adaptive strategies are central to keeping calibration tractable in $d\gg 1$ and to resolving problems with sharp interfaces or boundary layers (Bruna et al., 2022, Kumar et al., 13 Sep 2025).

5. Mixed and Stabilized Formulations

To address locking and enforce physical symmetries (e.g., stress tensor symmetry in elasticity), mixed neural Galerkin methods introduce independent network approximations for coupled fields (e.g., $\sigma_{NN}(x;\alpha^\sigma)$ and $u_{NN}(x;\alpha^u)$), with stability enforced by appropriate polynomial test spaces (Shang et al., 2023).

PG-VPINN variants separate trial (neural) and test (classical hat) bases, optionally penalizing interface jumps to enhance wake stabilization and boundary layer resolution in singularly-perturbed BVPs (Kumar et al., 13 Sep 2025). Discontinuous meshwise neural architectures mirror classical DG by enforcing variational communication via penalty terms (Chen et al., 13 Mar 2025).
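
Schematically, with notation assumed here ($\gamma$ a penalty weight, $[\![\cdot]\!]$ the jump across an interior mesh interface $F$), such penalty terms augment the least-squares objective of Section 3:

$$J_{\mathrm{pen}}(\alpha) = \sum_i |r_i(\alpha)|^2 \;+\; \gamma \sum_{F\in\mathcal{F}_{\mathrm{int}}} \int_F \bigl|[\![\, u_{NN} \,]\!]\bigr|^2\, ds.$$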

6. Error Control and Convergence

NGM admits rigorous a posteriori error control; for least-squares formulations, the energy-norm error is bounded by the maximal weak residual:

$$\|u-u_N\|_{E} \leq \sup_{\|v\|_{E}=1} |r(u_{N};v)|$$

with convergence at up to geometric rates provided the network class is sufficiently expressive and the approximation is enriched by adding directions that maximize the current residual (Ainsworth et al., 2021, Ainsworth et al., 2024).
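
For a symmetric, coercive bilinear form with energy norm $\|v\|_E^2 = a(v,v)$ (as in the elasticity example above), the bound follows in one line: the weak residual satisfies $r(u_N;v)=\ell(v)-a(u_N,v)=a(u-u_N,v)$, so

$$\sup_{\|v\|_E=1}|r(u_N;v)| = \sup_{\|v\|_E=1}|a(u-u_N,v)| = \|u-u_N\|_E,$$

with the supremum attained at $v=(u-u_N)/\|u-u_N\|_E$; in this setting the estimate is in fact an equality.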

For randomized neural trials, universal approximation guarantees yield arbitrarily small errors with high probability for sufficiently large parametrizations. Mixed and stabilized NGM avoid numerical locking and oscillation, with variational principles ensuring stability independent of mesh or basis selection (Shang et al., 2023).

7. Applications and Benchmarking

Across domains, NGM achieves:

  • High-order accuracy (6–8 digits in $L^2$) with modest numbers of unknowns ($10^2$–$10^3$ DoF) in elasticity and Stokes flow (Shang et al., 2023, Li et al., 2020).
  • Robust error reduction in singularly perturbed and high-dimensional PDEs, outperforming PINNs and classical FEM on both smooth and low-regularity benchmarks (Chen et al., 2020, Bruna et al., 2022).
  • Locking-free performance in nearly-incompressible elasticity and improved stability in boundary layer/corner singularity problems (Shang et al., 2023, Ainsworth et al., 2024).
  • Orders-of-magnitude online efficiency in model reduction, with cost scaling independent of full state dimension for linear models (Weder et al., 2024).
  • Certification of global-in-time quantum dynamics trajectories with rigorous error bounds via variational loss minimization (Sinibaldi et al., 2024).

Neural Galerkin has found utility in stationary and time-dependent mechanics, high-dimensional kinetic and control equations, quantum many-body dynamics, parametric evolution problems, and nonlinear model reduction.


The Neural Galerkin Method thus constitutes a versatile, rigorously founded computational paradigm for high-fidelity, mesh-adaptive, and dimension-agnostic PDE solution, bridging the expressive power of neural architectures with the error-controlling stability of variational Galerkin frameworks (Shang et al., 2023, Bruna et al., 2022, Weder et al., 2024, Chen et al., 13 Mar 2025, Li et al., 25 Dec 2025).
