
Discontinuous Galerkin Neural Network

Updated 16 November 2025
  • DGNN is a hybrid method that integrates local discontinuous Galerkin finite element schemes with neural networks to solve PDEs exhibiting discontinuities and sharp gradients.
  • It leverages weak-form variational principles with DG residuals, interface flux conditions, and structured loss terms to ensure stability and accurate convergence.
  • DGNN architectures employ element-local neural subnetworks with progressive, supervised, or randomized training strategies to capture complex domain dynamics effectively.

The Discontinuous Galerkin Neural Network (DGNN) is a class of hybrid computational methodologies that integrate the local structure-preserving properties of Discontinuous Galerkin (DG) finite element schemes with the function approximation capabilities of neural networks. DGNN architectures have been proposed to address limitations of standard physics-informed neural networks (PINNs) and classical numerical methods, especially in the presence of singularities, sharp gradients, discontinuous solutions, and high-dimensional/complex domains. DGNN methodologies leverage the weak-form variational principles (often realized through DG residuals, interface flux conditions, and structure-preserving loss terms) to ensure solution stability and physical fidelity, while employing neural subnetworks locally (per mesh element or time block) and various forms of progressive, supervised, or randomized training protocols.

1. Mathematical and Algorithmic Foundation

DGNN methods typically discretize the domain $\Omega$ by a mesh $\mathcal{T}_h = \{K\}$, with each mesh cell $K$ assigned a local neural network representation for the solution. The trial function space for $u_h$ is expressed as:

\begin{equation*}
\mathcal{N}_{\Omega_h} = \left\{ u : u|_K(x) = u^{(K)}_{\mathrm{NN}}(x; \theta_K), \ K \in \mathcal{T}_h \right\}
\end{equation*}

where $u^{(K)}_{\mathrm{NN}}$ may be a shallow or deep neural network with trainable weights $\theta_K$, localized to element $K$.
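To make the element-local construction concrete, the following is a minimal sketch assuming a PyTorch setup on a uniform 1D mesh; the `ElementNet` class, mesh, and widths are illustrative choices rather than an implementation from the cited papers.

```python
# Minimal sketch (assumed PyTorch setup): one small MLP per mesh element,
# evaluated only on the points that fall inside that element.
import torch
import torch.nn as nn

class ElementNet(nn.Module):
    """Shallow network u_NN^(K)(x; theta_K) attached to a single element K."""
    def __init__(self, width=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 1),
        )

    def forward(self, x):
        return self.net(x)

# Uniform 1D mesh T_h = {K_j}; each cell carries its own parameters theta_K.
n_elements = 8
edges = torch.linspace(0.0, 1.0, n_elements + 1)
element_nets = nn.ModuleList(ElementNet() for _ in range(n_elements))

def u_h(x):
    """Piecewise (possibly discontinuous) trial function: each point is routed
    to the network of the element that contains it.  x has shape (N, 1)."""
    idx = torch.clamp(torch.bucketize(x.squeeze(-1), edges) - 1, 0, n_elements - 1)
    out = torch.empty_like(x)
    for j, net in enumerate(element_nets):
        mask = idx == j
        if mask.any():
            out[mask] = net(x[mask])
    return out
```

Because no continuity is imposed across element boundaries by construction, inter-element coupling enters only through the numerical fluxes and jump penalties described next.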

The global DGNN weak form for a scalar conservation law $\partial_t u + \nabla \cdot f(u) = 0$ is assembled elementwise; in one dimension, on an element $K = (x_{K-}, x_{K+})$,
\begin{equation*}
\int_{K} (u_h)_t \, v \, dx - \int_{K} f(u_h) \, v_x \, dx + \widehat{f}_{K+} \, v(x_{K+}^-) - \widehat{f}_{K-} \, v(x_{K-}^+) = 0,
\end{equation*}
where $v$ is a test function and $\widehat{f}$ are numerical fluxes (e.g., Lax–Friedrichs) evaluated at the element interfaces.

General DGNN frameworks incorporate jump and boundary terms via additional loss functionals, enforcing interface continuity weakly and allowing for discontinuities between network blocks. For example, the Rankine–Hugoniot (RH) jump condition is imposed via
\begin{equation*}
\mathcal{L}_{\mathrm{RH}} = \frac{1}{|\mathcal{T} \times \Omega|} \iint \left| s \, [\![u_h]\!] - [\![f(u_h)]\!] \right|^2 \delta_\Gamma(x, t) \, dx \, dt
\end{equation*}
for shock surfaces $\Gamma$, where $s$ is the local shock speed and $[\![\cdot]\!]$ denotes the jump across $\Gamma$.
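For concreteness, the following is a hedged sketch of how one element's weak-form residual and interface terms might be evaluated for the 1D Burgers equation with a Lax–Friedrichs flux; the helper names, quadrature size, and calling convention are assumptions, not taken from the cited implementations.

```python
# Hedged sketch: weak-form DG residual on one element K = (xl, xr) for the
# 1D Burgers equation u_t + (u^2/2)_x = 0 with a Lax-Friedrichs numerical flux.
import numpy as np
import torch

def flux(u):
    return 0.5 * u ** 2                                   # Burgers flux f(u)

def lax_friedrichs(u_minus, u_plus):
    """Numerical flux f_hat(u^-, u^+) at an element interface."""
    lam = torch.maximum(u_minus.abs(), u_plus.abs())      # local wave speed
    return 0.5 * (flux(u_minus) + flux(u_plus)) - 0.5 * lam * (u_plus - u_minus)

def element_weak_residual(u_h, xl, xr, t, v, v_x, trace_left, trace_right, n_quad=8):
    """Approximate
        int_K (u_h)_t v dx - int_K f(u_h) v_x dx
          + f_hat_{K+} v(xr^-) - f_hat_{K-} v(xl^+)
    with Gauss-Legendre quadrature.  `trace_left`/`trace_right` are (u^-, u^+)
    pairs at the two interfaces, supplied by the caller from the element-local
    networks on either side."""
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    x = xl + 0.5 * (xr - xl) * (torch.tensor(nodes, dtype=torch.float32) + 1.0)
    w = 0.5 * (xr - xl) * torch.tensor(weights, dtype=torch.float32)

    tt = torch.full_like(x, t).requires_grad_(True)
    u = u_h(x, tt)
    u_t = torch.autograd.grad(u.sum(), tt, create_graph=True)[0]

    volume = torch.sum(w * (u_t * v(x) - flux(u) * v_x(x)))
    surface = (lax_friedrichs(*trace_right) * v(torch.tensor(xr))
               - lax_friedrichs(*trace_left) * v(torch.tensor(xl)))
    return volume + surface
```

The DG loss $\mathcal{L}_{\mathrm{DG}}$ would then sum the squares of such residuals over elements, test functions, and sampled times.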

2. Neural Network Architectures and Temporal Decomposition

DGNN approaches utilize element-local or block-local neural networks. In time-dependent problems, a temporal progressive learning strategy is adopted (Shen et al., 22 Aug 2025), decomposing $[0,T]$ into $M$ intervals, each with its own network block $u_{\theta^{(k)}}(x,t)$. Training proceeds sequentially over time blocks:

  • Previous weights $\theta^{(k-1)}$ are frozen.
  • Supervision of block $k$ employs pseudo-labels from block $k-1$ for causality and solution continuity (a minimal sketch follows this list):
\begin{equation*}
\mathcal{L}_{\mathrm{sup}}^{(k)} = \frac{1}{N_{\sup}} \sum_{i=1}^{N_{\sup}} \left| u_{\theta^{(k)}}(x_i, t_i) - u_{\theta^{(k-1)}}(x_i, t_i) \right|^2 + \left\| \mathcal{F}\!\left( u_{\theta^{(k)}} - u_{\theta^{(k-1)}} \right) \right\|_2^2
\end{equation*}
where $\mathcal{F}$ denotes the frequency-domain consistency penalty.
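The following sketch assumes a PyTorch setting and takes an FFT over the sampled points as one plausible realization of the frequency-domain term; function names and weighting are illustrative.

```python
# Hedged sketch: supervise block k against the frozen block k-1 on points
# sampled in the handoff region; torch.fft.rfft over the sample axis is an
# *assumed* realization of the frequency-domain consistency penalty.
import torch

def supervision_loss(u_k, u_km1, x, t, freq_weight=1.0):
    """L_sup^(k): pointwise pseudo-label mismatch plus a frequency-domain
    consistency term between block k and the frozen block k-1."""
    with torch.no_grad():
        target = u_km1(x, t)                  # pseudo-labels from block k-1
    pred = u_k(x, t)
    diff = pred - target
    pointwise = torch.mean(diff ** 2)
    freq = torch.mean(torch.abs(torch.fft.rfft(diff)) ** 2)
    return pointwise + freq_weight * freq
```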

Element-wise architectures in spatially adaptive schemes include variants such as single-layer randomized networks (LRNN-DG), discontinuous hybrid neural networks, and local plane-wave/Trefftz networks for oscillatory PDEs (Yuan et al., 11 Jun 2025, Yuan et al., 9 Nov 2025). Randomized hidden layers with fixed weights, combined with a least-squares solve for the output layer, constitute one particularly efficient subclass.
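As a hedged illustration of that randomized subclass, the sketch below draws fixed random tanh features and solves only for the output coefficients; for brevity it collocates a model Poisson problem rather than assembling the full DG weak form, and all names and parameter ranges are illustrative.

```python
# Hedged sketch of the randomized subclass: hidden weights are drawn once and
# frozen; only the output coefficients c are found by a linear least-squares
# solve. For brevity this collocates -u'' = f on (0,1) with u(0)=u(1)=0
# instead of assembling the full DG weak form.
import numpy as np

rng = np.random.default_rng(0)
M = 80                                    # number of random tanh features
W = rng.uniform(-10.0, 10.0, size=M)      # fixed hidden weights
b = rng.uniform(-10.0, 10.0, size=M)      # fixed hidden biases

def features(x):
    return np.tanh(np.outer(x, W) + b)            # phi_j(x)

def features_xx(x):
    s = np.tanh(np.outer(x, W) + b)
    return -2.0 * s * (1.0 - s ** 2) * W ** 2     # phi_j''(x)

# Manufactured problem: -u'' = pi^2 sin(pi x), exact solution u = sin(pi x).
x_int = np.linspace(0.0, 1.0, 200)[1:-1]
x_bdy = np.array([0.0, 1.0])
A = np.vstack([-features_xx(x_int),               # interior collocation rows
               features(x_bdy)])                  # boundary rows
rhs = np.concatenate([np.pi ** 2 * np.sin(np.pi * x_int),
                      np.zeros(2)])
c, *_ = np.linalg.lstsq(A, rhs, rcond=None)       # output-layer solve

u_approx = features(x_int) @ c
print(np.max(np.abs(u_approx - np.sin(np.pi * x_int))))   # should be small
```

The same structure carries over when the rows of the linear system come from element-wise DG quadratures and interface fluxes instead of pointwise collocation.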

3. Structure-Preserving DGNN Losses

The total DGNN loss functional typically includes several weighted terms:
\begin{equation*}
\mathcal{L}^{(k)} = \omega_{\mathrm{IC}} \mathcal{L}_{\mathrm{IC}} + \omega_{\mathrm{bdy}} \mathcal{L}_{\mathrm{bdy}} + \omega_{\mathrm{DG}} \mathcal{L}_{\mathrm{DG}} + \omega_{\mathrm{RH}} \mathcal{L}_{\mathrm{RH}} + \mathbf{1}_{\{k > 1\}} \, \omega_{\mathrm{sup}} \mathcal{L}_{\mathrm{sup}}^{(k)}
\end{equation*}
where $\mathcal{L}_{\mathrm{DG}}$ quantifies the DG residual, $\mathcal{L}_{\mathrm{RH}}$ enforces the RH condition on shocks, $\mathcal{L}_{\mathrm{IC}}$ and $\mathcal{L}_{\mathrm{bdy}}$ enforce initial/boundary data, and $\mathcal{L}_{\mathrm{sup}}$ is the progressive temporal supervision.
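In code, the weighting and the indicator on $k>1$ amount to little more than the following sketch (term and weight names are placeholders):

```python
# Hedged sketch: weighted combination of the DGNN loss terms for block k.
def total_loss(terms, weights, k):
    """`terms` and `weights` are dicts keyed by 'IC', 'bdy', 'DG', 'RH', 'sup'."""
    loss = (weights["IC"] * terms["IC"]
            + weights["bdy"] * terms["bdy"]
            + weights["DG"] * terms["DG"]
            + weights["RH"] * terms["RH"])
    if k > 1:   # indicator 1_{k>1}: supervision only applies after the first block
        loss = loss + weights["sup"] * terms["sup"]
    return loss
```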

DGNN implementations for elliptic, parabolic, hyperbolic, and wave equations adapt the loss to the specific variational weak form—using element-wise quadratures, numerical interface fluxes, explicit penalties, and sometimes collocation-based constraints or frequency-domain consistency terms.

4. Optimization, Training, and Implementation

Training strategies depend on the subclass:

  • For randomized basis methods (LRNN-DG, DGTNN), only output-layer coefficients are learned via global least-squares solvers; non-convex optimization is avoided (Sun et al., 30 Sep 2024).
  • In hybrid schemes (e.g., Wang et al., 15 May 2025), element-wise nonlinear parameters are updated by RMSprop or Adam, alternating with DG-based solves for the output weights.
  • Block-sequential temporal learning uses Adam followed by L-BFGS per block and stores interface data for forward training (a sketch of this schedule follows the list).
  • Adaptive domain decomposition employs a posteriori residual indicators to identify regions for mesh refinement, subdividing marked elements and reconstructing local networks as needed.
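A hedged sketch of the Adam-then-L-BFGS schedule for a single block; optimizer settings and the `compute_loss` closure are illustrative assumptions.

```python
# Hedged sketch: train one temporal block with Adam, then refine with L-BFGS.
import torch

def train_block(model, compute_loss, adam_steps=2000, lbfgs_steps=200):
    """`compute_loss()` is assumed to evaluate the block's total DGNN loss
    from freshly sampled collocation/interface points."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(adam_steps):
        opt.zero_grad()
        loss = compute_loss()
        loss.backward()
        opt.step()

    lbfgs = torch.optim.LBFGS(model.parameters(), max_iter=lbfgs_steps,
                              line_search_fn="strong_wolfe")

    def closure():
        lbfgs.zero_grad()
        loss = compute_loss()
        loss.backward()
        return loss

    lbfgs.step(closure)

    # Freeze the block before moving on; its outputs become the pseudo-labels
    # (interface data) that supervise the next block.
    for p in model.parameters():
        p.requires_grad_(False)
    return model
```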

Boundary, interface, and initial conditions are imposed weakly via numerical fluxes, penalized terms, or collocation at selected points.

5. Convergence Analysis and Theoretical Guarantees

DGNN paradigms inherit stability and convergence rates from their DG foundation, augmented by universal approximation arguments for neural networks. Representative results:

  • Global $L^2$ error for temporally progressive DGNNs (Shen et al., 22 Aug 2025):
\begin{equation*}
\| u_{\mathrm{NN}} - u_{\mathrm{exact}} \|_{L^2} \leq \sum_{k=1}^{M} \left( \prod_{j=k+1}^{M} S_j \right) \left[ C_{\mathrm{coer}}^{(k)} \sqrt{\epsilon_{\mathrm{opt}}^{(k)} + \delta_{\mathrm{approx}}^{(k)}} \right] + C_{\mathrm{smooth}} h^{q+1} + C_{\mathrm{disc}} h^{1/2} + \delta_{\mathrm{RH}}
\end{equation*}
where $S_j$ are stability constants, $h$ the mesh size, $q$ the polynomial degree, and $\epsilon_{\mathrm{opt}}^{(k)}, \delta_{\mathrm{approx}}^{(k)}$ the per-block optimization and approximation errors.
  • Plane wave neural networks (DGPWNN, DGTNN) admit geometric error reduction independent of parameter bounds (Yuan et al., 11 Jun 2025, Yuan et al., 9 Nov 2025):
\begin{equation*}
\| u - u_N \| \leq \| u \| \left( \frac{2^{3/2}\tau}{2-\tau} \right)^{N}
\end{equation*}
where $N$ is the iteration count and $\tau$ a tuning parameter (a brief numeric check of this bound follows the list).
  • For randomized local networks, algebraic convergence in $h$ and exponential decay in the network width $M$ are typical, with local errors matching or exceeding classic DG for comparable degrees of freedom (Sun et al., 2022).
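As a quick numeric check of the geometric bound above (the values of $\tau$ and $N$ are chosen purely for illustration; the factor contracts only when $\tau < 2/(1+2^{3/2}) \approx 0.52$):

```python
# Evaluate the bound (2**(3/2) * tau / (2 - tau))**N for illustrative values.
tau = 0.3                              # illustrative tuning parameter
rho = 2 ** 1.5 * tau / (2 - tau)       # contraction factor, about 0.499 here
for N in (1, 5, 10, 20):
    print(N, rho ** N)                 # relative error bound after N iterations
```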

6. Numerical Results and Practical Benchmarks

DGNN methods have demonstrated strong empirical performance across a range of PDEs:

  • For hyperbolic conservation laws (Burgers, Euler), DGNNs outperform PINNs and first-order DG in capturing shocks and steep gradients, with $L^2$ errors reduced by factors of 2–10 and solution profiles that sharply capture shock speeds and magnitudes (Shen et al., 22 Aug 2025).
  • Space-time LRNN-DG achieves $L^2$ errors down to $10^{-6}$ for KdV-type solitons and Burgers equations, with adaptive meshes further reducing computation (Sun et al., 30 Sep 2024).
  • For elliptic and Helmholtz problems, DGNN/DGTNN yields $L^2$ errors of $10^{-4}$–$10^{-3}$ with an order of magnitude better robustness relative to PINN variants, especially in high-frequency regimes (Chen et al., 13 Mar 2025, Yuan et al., 9 Nov 2025).
  • Plane-wave neural network DG methods achieve exponential convergence in $N$ and effective hp-refinement for large-wavenumber Helmholtz and Maxwell systems, covering 2D and 3D domains (Yuan et al., 11 Jun 2025).
  • DGNN-based convolutional networks can be trained in both supervised and unsupervised regimes, efficiently predicting DG degrees of freedom with mesh-independent inference costs (Celaya et al., 12 Feb 2025).

7. Extensions, Limitations, and Active Research Areas

DGNN frameworks are modular and parallel-friendly, naturally extending to:

  • Systems of PDEs, higher spatial dimensions, and complicated geometries via adaptive meshing, element-local subnetworks, and blockwise assembly.
  • Time-domain formulations with global-in-time solves that avoid error accumulation prevalent in time-marching schemes.
  • Characteristic-aligned and wavelet-activated bases, which yield improved stability and efficiency for advective or oscillatory problems.

Limitations pertain to problem-dependent tuning of random seeds, penalty parameters, and basis selection, as well as large global system sizes for high-order or 3D problems. Theoretical analysis of randomized bases, optimal penalty balancing between physics-informed terms and DG fluxes, and rigorous a priori error estimates remains an open field. Extensions to inhomogeneous and anisotropic media (e.g., via modulated plane-wave nets) are under exploration.

DGNN methodologies unify the stability, conservation, and adaptivity of classical DG with the approximation power and modularity of neural networks. This hybridization enables robust, accurate, and scalable solvers for a wide array of linear and nonlinear PDEs, especially for problems where traditional mesh-based or global neural solvers break down (Shen et al., 22 Aug 2025, Sun et al., 30 Sep 2024, Wang et al., 15 May 2025, Chen et al., 13 Mar 2025, Yuan et al., 11 Jun 2025, Yuan et al., 9 Nov 2025).
