DDNM+ Variant: Domain-Decomposed Nonlinear ROM
- DDNM+ Variant is a domain-decomposed nonlinear-manifold reduced order model that uses shallow sparse autoencoders and hyper-reduction to approximate high-fidelity solutions of large-scale nonlinear PDEs.
- It decomposes the full-order model into subdomains to allow parallel training and reduce parameter counts, achieving higher accuracy compared to traditional linear-subspace ROMs.
- The method integrates Gaussian-test weak coupling and adaptive sparsity masks to ensure numerical stability while significantly reducing computational costs.
The DDNM+ Variant refers to a domain-decomposed, nonlinear-manifold reduced order model (ROM) that leverages shallow, sparse autoencoders and hyper-reduction to efficiently approximate high-fidelity solutions of large-scale problems, particularly nonlinear parametric PDEs. Proposed within the context of scientific machine learning and model reduction, DDNM+ addresses scalability and parallel efficiency by splitting the full-order model (FOM) into subdomains, training nonlinear manifold autoencoders for each, and coupling them via algebraic constraints. The methodology enables significant reductions in parameter count, parallelizes training, and attains superior accuracy compared to linear-subspace ROMs under similar computational budgets (Diaz et al., 2023).
1. Algebraic Domain-Decomposed Full-Order Model
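The baseline for DDNM+ is the algebraic FOM residual equation $r(x) = 0$ with state $x \in \mathbb{R}^{N}$, which is decomposed into $n_\Omega$ subdomains, each with interior state $x_i^\Omega$ and interface state $x_i^\Gamma$. The FOM is equivalently represented as a block system with residual components $r_i(x_i^\Omega, x_i^\Gamma)$, $i = 1, \dots, n_\Omega$. Interface continuity constraints are imposed via signed-incidence matrices $A_i$, leading to the constrained least-squares system:
$$\min_{(x_i^\Omega,\, x_i^\Gamma)} \ \frac{1}{2} \sum_{i=1}^{n_\Omega} \left\| r_i(x_i^\Omega, x_i^\Gamma) \right\|_2^2 \quad \text{subject to} \quad \sum_{i=1}^{n_\Omega} A_i x_i^\Gamma = 0.$$
This formulation enforces continuity of shared degrees of freedom (DOFs) while enabling domain-decomposed solution strategies.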
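The role of the signed-incidence matrices can be illustrated with a minimal sketch. Everything here is hypothetical: a five-DOF 1D state, two subdomains sharing one DOF, and the names `A1`, `A2`, and `interface_jump` are chosen only for illustration.

```python
import numpy as np

# Hypothetical 1D example: 5 global DOFs split into two subdomains that
# share DOF 2. Each subdomain keeps its own copy of the shared DOF.
x_global = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

x1_interior = x_global[[0, 1]]   # interior DOFs of subdomain 1
x1_interface = x_global[[2]]     # subdomain 1 copy of the shared DOF
x2_interior = x_global[[3, 4]]   # interior DOFs of subdomain 2
x2_interface = x_global[[2]]     # subdomain 2 copy of the shared DOF

# Signed-incidence matrices: +1 on one side of the interface, -1 on the other,
# so that A1 @ x1_interface + A2 @ x2_interface = 0 means "copies agree".
A1 = np.array([[1.0]])
A2 = np.array([[-1.0]])

def interface_jump(x1_gamma, x2_gamma):
    """Constraint residual: zero exactly when the interface copies match."""
    return A1 @ x1_gamma + A2 @ x2_gamma

jump_consistent = interface_jump(x1_interface, x2_interface)
jump_broken = interface_jump(x1_interface, x2_interface + 0.5)
```

When the two copies of the shared DOF agree, the constraint residual vanishes; perturbing one copy produces a nonzero jump of the same size as the perturbation.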
2. Construction of Subdomain Nonlinear-Manifold ROMs
Within each subdomain, DDNM+ seeks a low-dimensional nonlinear approximation for both interior and interface variables. Shallow autoencoders are employed:
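- Encoder: $h_i^\Omega: \mathbb{R}^{N_i^\Omega} \to \mathbb{R}^{n_i^\Omega}$ and $h_i^\Gamma: \mathbb{R}^{N_i^\Gamma} \to \mathbb{R}^{n_i^\Gamma}$
- Decoder: $g_i^\Omega: \mathbb{R}^{n_i^\Omega} \to \mathbb{R}^{N_i^\Omega}$ and $g_i^\Gamma: \mathbb{R}^{n_i^\Gamma} \to \mathbb{R}^{N_i^\Gamma}$
This yields the manifold approximation
$$x_i^\Omega \approx g_i^\Omega(\hat{x}_i^\Omega), \qquad x_i^\Gamma \approx g_i^\Gamma(\hat{x}_i^\Gamma),$$
with latent variables $\hat{x}_i^\Omega \in \mathbb{R}^{n_i^\Omega}$ and $\hat{x}_i^\Gamma \in \mathbb{R}^{n_i^\Gamma}$, where $n_i^\Omega \ll N_i^\Omega$ and $n_i^\Gamma \ll N_i^\Gamma$. Insertion into the FOM with hyper-reduction (HR) yields the DD NM-ROM minimization:
$$\min_{(\hat{x}_i^\Omega,\, \hat{x}_i^\Gamma)} \ \frac{1}{2} \sum_{i=1}^{n_\Omega} \left\| B_i\, r_i\!\left(g_i^\Omega(\hat{x}_i^\Omega),\, g_i^\Gamma(\hat{x}_i^\Gamma)\right) \right\|_2^2 \quad \text{subject to} \quad \sum_{i=1}^{n_\Omega} C A_i\, g_i^\Gamma(\hat{x}_i^\Gamma) = 0,$$
where $r_i$ is the residual of subdomain $i$, $A_i$ are the signed-incidence matrices that express interface continuity, $B_i$ is a HR sampling matrix, and $C$ is a Gaussian test matrix that enforces weak coupling with reduced constraint dimensionality.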
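The effect of the Gaussian test matrix can be sketched as follows. This is an illustrative toy, not the reference implementation: the sizes `n_interface` and `n_weak`, the two-subdomain setup with identity incidence matrices, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_interface = 50   # hypothetical number of shared interface DOFs
n_weak = 8         # hypothetical number of weak constraints (n_weak << n_interface)

# Strong coupling for two subdomains: A1 @ x1 + A2 @ x2 = 0 with A1 = I, A2 = -I.
A1 = np.eye(n_interface)
A2 = -np.eye(n_interface)

# Gaussian test matrix: compresses n_interface constraints into n_weak constraints.
C = rng.standard_normal((n_weak, n_interface))

# Matching interface states satisfy the strong and hence the weak constraints.
x_shared = rng.standard_normal(n_interface)
strong_residual = A1 @ x_shared + A2 @ x_shared
weak_residual = C @ strong_residual

# A mismatched interface state is detected by the weak constraints as well.
x_mismatched = x_shared + 0.1 * rng.standard_normal(n_interface)
weak_residual_mismatch = C @ (A1 @ x_shared + A2 @ x_mismatched)
```

The weak system has only `n_weak` rows, so the constraint block in the reduced problem is much smaller than the number of interface DOFs, while any state that satisfies the strong constraints still satisfies the weak ones.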
3. Hyper-Reduction for Nonlinear Residuals
To avoid an evaluation cost that scales with the full-order dimension, hyper-reduction is applied via a gappy-POD (or DEIM-style) collocation approach. For each subdomain, a collateral basis $\Phi_i^r$ for subdomain residuals is constructed from snapshot POD; greedy algorithms select a sampling set of indices and construct the sampling matrix $B_i$. The hyper-reduced subdomain residual is approximated by
$$r_i \approx \Phi_i^r \left(B_i \Phi_i^r\right)^{\dagger} B_i\, r_i,$$
where $B_i$ extracts the sampled components of $r_i$ and $\dagger$ denotes the Moore–Penrose pseudoinverse, so only a small subset of residual rows is retained. This reduces the computational burden of evaluating and assembling the global nonlinear ROM.
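A minimal gappy-POD sketch is given below under stated assumptions: the snapshot data are synthetic, the sizes are hypothetical, and the sampling indices are drawn at random for brevity rather than by the greedy selection described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_full, n_basis, n_samples = 200, 5, 15   # hypothetical sizes

# Synthetic rank-5 residual snapshots and their POD (thin SVD) basis.
snapshots = rng.standard_normal((n_full, n_basis)) @ rng.standard_normal((n_basis, 40))
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
Phi = U[:, :n_basis]                       # collateral residual basis

# Sampled row indices (random here; a greedy algorithm would pick them in practice).
sample_idx = rng.choice(n_full, size=n_samples, replace=False)

def gappy_reconstruct(r_sampled):
    """Least-squares fit of basis coefficients to the sampled residual entries."""
    coeffs, *_ = np.linalg.lstsq(Phi[sample_idx, :], r_sampled, rcond=None)
    return Phi @ coeffs

# A residual in the span of the basis is recovered from 15 of its 200 entries.
r_true = snapshots @ rng.standard_normal(40)
r_approx = gappy_reconstruct(r_true[sample_idx])
rel_err = np.linalg.norm(r_approx - r_true) / np.linalg.norm(r_true)
```

Only the sampled rows of the residual are ever evaluated online; the least-squares solve plays the role of the pseudoinverse in the formula above.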
4. Global Assembly and Solution Procedure
The DDNM+ system is formulated as a small nonlinear least-squares problem with equality constraints:
- Lagrange multipliers $\lambda$ enforce the weak coupling.
- A symmetric indefinite saddle-point (KKT) system arises per Gauss–Newton iteration; its size is the total number of latent DOFs plus the number of weak-coupling constraints.
- Local residuals and Jacobians are formed in parallel for each subdomain; assembly is direct.
- The global coupling matrix for the interface constraints is assembled from the subdomain contributions.
- Convergence is typically achieved in 5–10 Gauss–Newton iterations.
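The structure of one constrained Gauss–Newton iteration can be sketched on a toy problem. The two scalar "subdomain" residuals, the starting point, and the single continuity constraint below are hypothetical and chosen only so that the solution (both unknowns equal to 1) is easy to verify; this is not the solver used in the reference study.

```python
import numpy as np

def residual(x):
    """Two decoupled toy residuals, each with root 1."""
    a, b = x
    return np.array([a**3 + a - 2.0, b**3 + b - 2.0])

def jacobian(x):
    a, b = x
    return np.diag([3.0 * a**2 + 1.0, 3.0 * b**2 + 1.0])

A = np.array([[1.0, -1.0]])   # continuity constraint: a - b = 0

x = np.array([2.0, 0.5])      # deliberately violates the constraint
for iteration in range(20):
    r, J = residual(x), jacobian(x)
    H = J.T @ J               # Gauss-Newton Hessian approximation
    # Saddle-point (KKT) system for: min 0.5*||r + J p||^2  s.t.  A (x + p) = 0
    kkt = np.block([[H, A.T], [A, np.zeros((1, 1))]])
    rhs = np.concatenate([-J.T @ r, -A @ x])
    step = np.linalg.solve(kkt, rhs)
    p, lam = step[:2], step[2:]   # primal step and Lagrange multiplier
    x = x + p
    if np.linalg.norm(p) < 1e-12:
        break

# The KKT matrix is symmetric but has eigenvalues of both signs.
kkt_eigs = np.linalg.eigvalsh(kkt)
```

Because the constraint is linear, it is satisfied exactly after the first full step, and the remaining iterations reduce the residual. The eigenvalues in `kkt_eigs` have mixed signs, which is why such systems call for solvers suited to symmetric indefinite matrices.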
5. Shallow, Sparse Autoencoder Architectures
Each subdomain autoencoder consists of a single hidden layer:
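- Encoder: $h(x) = W_2^h\,\sigma(W_1^h x + b_1^h)$
- Decoder: $g(\hat{x}) = W_2^g\,\sigma(W_1^g \hat{x} + b_1^g)$
- Activation: $\sigma(z) = z / (1 + e^{-z})$ (swish)
- Hidden layer width: a small multiple of the latent dimension, chosen to meet a target reconstruction error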
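Sparsity is imposed via tri-banded masks $M^h$, $M^g$ on the encoder input-layer weights $W_1^h$ and the decoder output-layer weights $W_2^g$, ensuring sparse, local connectivity. The offline training objective combines the reconstruction error with L1 sparsity penalties (weight $\lambda$) and is of the form: $$\min_{\theta}\ \frac{1}{N_s}\sum_{k=1}^{N_s}\left\| x^{(k)} - g\!\left(h(x^{(k)})\right) \right\|_2^2 + \lambda \left\|\theta\right\|_1,$$ where $\theta$ collects the network weights and $x^{(k)}$ are the $N_s$ training snapshots. Training uses the Adam optimizer for up to 2,000 epochs with batch size 32 and early stopping.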
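A forward pass through a masked shallow decoder might look like the sketch below. The mask construction `banded_mask`, the layer sizes, and the decoder form without an output bias are assumptions made for illustration; the exact mask pattern and layer layout of the reference study may differ.

```python
import numpy as np

def swish(z):
    """Swish activation: z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))

def banded_mask(n_rows, n_cols, half_bandwidth=1):
    """Hypothetical banded mask for a rectangular weight matrix.

    Row i is connected to the columns within half_bandwidth of the
    position obtained by rescaling i onto the column range.
    """
    rows = np.arange(n_rows)[:, None]
    cols = np.arange(n_cols)[None, :]
    centers = np.rint(rows * (n_cols - 1) / max(n_rows - 1, 1))
    return (np.abs(cols - centers) <= half_bandwidth).astype(float)

rng = np.random.default_rng(2)
n_state, n_hidden, n_latent = 40, 12, 3   # hypothetical sizes

W1 = rng.standard_normal((n_hidden, n_latent))   # dense latent-to-hidden layer
b1 = rng.standard_normal(n_hidden)
W2 = rng.standard_normal((n_state, n_hidden))    # hidden-to-state layer, masked below
mask = banded_mask(n_state, n_hidden)

def decoder(x_hat):
    """Shallow decoder with a sparsity mask on the output-layer weights."""
    return (mask * W2) @ swish(W1 @ x_hat + b1)

x_decoded = decoder(rng.standard_normal(n_latent))
nonzeros_per_row = mask.sum(axis=1)
```

With a half-bandwidth of one, each output entry depends on at most three hidden units, so evaluating a subset of output rows (as hyper-reduction requires) touches only a small part of the network.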
6. Parameter Scaling, Parallelism, and Cost
Let $P$ denote the parameter count of a global NM-ROM of latent dimension $n$. In the reference study (Diaz et al., 2023), decomposing the domain into $n_\Omega$ subdomains, each with its own interior and interface latent dimensions $n_i^\Omega$ and $n_i^\Gamma$, reduces the parameter count both per subdomain and in total relative to the global NM-ROM, because each subdomain autoencoder acts on only a fraction of the full state.
Subdomain training is fully parallelizable across subdomains.
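The way the parameter count shrinks under decomposition can be illustrated with a rough count. All numbers below are hypothetical and are not the values of the reference study; the count assumes dense single-hidden-layer autoencoders with a bias only on the hidden layer, ignores sparsity masks, and does not distinguish interior from interface networks.

```python
def dense_autoencoder_params(n_state, n_hidden, n_latent):
    """Parameter count of a dense shallow encoder/decoder pair.

    Assumes one hidden layer per network and a bias on the hidden layer only.
    """
    encoder = n_state * n_hidden + n_hidden + n_hidden * n_latent
    decoder = n_latent * n_hidden + n_hidden + n_hidden * n_state
    return encoder + decoder

# Hypothetical sizes, chosen only to show the trend.
n_global_state, n_global_latent = 10_000, 24
n_subdomains = 4
n_sub_state, n_sub_latent = 2_500, 6
width_factor = 4   # hidden width as a multiple of the latent dimension

global_params = dense_autoencoder_params(
    n_global_state, width_factor * n_global_latent, n_global_latent
)
sub_params = dense_autoencoder_params(
    n_sub_state, width_factor * n_sub_latent, n_sub_latent
)
total_dd_params = n_subdomains * sub_params

per_subdomain_reduction = global_params / sub_params
total_reduction = global_params / total_dd_params
```

In this toy setting both the state size and the latent dimension per subdomain drop by the number of subdomains, so the per-subdomain count falls roughly quadratically and the total roughly linearly in the number of subdomains.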
7. Numerical Performance and Key Features
Results for the 2D steady-state Burgers' equation (Diaz et al., 2023) demonstrate:
- DDNM+ (NM-ROM) achieves roughly one order of magnitude lower errors than DD LS-ROM at the same ROM size.
- Hyper-reduction yields speedups of $37.5\times$ to $44.7\times$ for DDNM+ in the configurations tabulated below; LS-ROM(HR) is faster but less accurate.
| ROM | Interior latent dim. $n^\Omega$ | Interface latent dim. $n^\Gamma$ | DoF | Error | Speedup | Error (HR) | Speedup (HR) |
|---|---|---|---|---|---|---|---|
| LS-ROM | 6 | 3 | 36 | … | 48.7 | … | 340.0 |
| LS-ROM | 16 | 8 | 96 | … | 18.3 | … | 280.4 |
| NM-ROM | 6 | 3 | 36 | … | 26.2 | … | 44.7 |
| NM-ROM | 16 | 8 | 96 | … | 13.9 | … | 37.5 |
Key distinguishing elements denoted by “+” in DDNM+:
- Gaussian-test weak coupling (test matrix $C$), reducing the constraint count without loss of accuracy.
- Input/output sparsity masks for efficient subnet evaluation.
- Adaptivity of autoencoder mask patterns to local features (e.g., shocks).
- Parallel snapshot and training processes (multi-GPU compatibility).
- Potential extensions: adaptive subdomain re-partitioning, hp-refinement, parareal DD in time.
Critical tradeoffs include accuracy versus speed: DDNM+ is slower than LS-ROM(HR) but attains higher accuracy at small ROM sizes. Domain decomposition permits smaller, faster-trained networks per subdomain but requires managing more models. Extensions under consideration include online adaptive hyper-reduction and enrichment for fully unsteady PDEs (Diaz et al., 2023).