DDNM+ Variant: Domain-Decomposed Nonlinear ROM

Updated 25 May 2026

DDNM+ Variant is a domain-decomposed nonlinear-manifold reduced order model that uses shallow sparse autoencoders and hyper-reduction to approximate high-fidelity solutions of large-scale nonlinear PDEs.
It decomposes the full-order model into subdomains to allow parallel training and reduce parameter counts, achieving higher accuracy than traditional linear-subspace ROMs.
The method integrates Gaussian-test weak coupling and adaptive sparsity masks to ensure numerical stability while significantly reducing computational costs.

The DDNM+ Variant refers to a domain-decomposed, nonlinear-manifold reduced order model (ROM) that leverages shallow, sparse autoencoders and hyper-reduction to efficiently approximate high-fidelity solutions of large-scale problems, particularly nonlinear parametric PDEs. Proposed within the context of scientific machine learning and model reduction, DDNM+ addresses scalability and parallel efficiency by splitting the full-order model (FOM) into subdomains, training nonlinear manifold autoencoders for each, and coupling them via algebraic constraints. The methodology enables significant reductions in parameter count, parallelizes training, and attains superior accuracy compared to linear-subspace ROMs under similar computational budgets (Diaz et al., 2023).

1. Algebraic Domain-Decomposed Full-Order Model

The baseline for DDNM+ is the algebraic FOM residual equation

$r(x;\mu) = 0, \quad x \in \mathbb{R}^{N_x},\;\mu \in \mathcal D \subset \mathbb{R}^{N_\mu}$

which is decomposed into $n_\Omega$ subdomains, each with interior state $x_i^\Omega$ and interface state $x_i^\Gamma$. The FOM is equivalently represented as a block system with global residual components $r_i(x_i^\Omega, x_i^\Gamma; \mu)$. Interface continuity constraints are imposed via signed-incidence matrices $A_i$, leading to the constrained least-squares system

$\min_{x_i^\Omega,\,x_i^\Gamma} \frac12\sum_{i=1}^{n_\Omega}\|r_i(x_i^\Omega, x_i^\Gamma; \mu)\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^{n_\Omega} A_i x_i^\Gamma = 0$

This formulation enforces continuity of shared degrees of freedom (DOFs) while enabling domain-decomposed solution strategies.
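The continuity constraint can be illustrated with a minimal NumPy sketch. It assumes a hypothetical configuration of two subdomains that each hold their own copy of the same three interface DOFs; the sizes, values, and the helper name `constraint` are illustrative and not taken from the reference study.

```python
import numpy as np

# Hypothetical setup: two subdomains share the same 3 interface DOFs.
n_gamma = 3
A1 = np.eye(n_gamma)    # +1 entries select subdomain 1's copy of each shared DOF
A2 = -np.eye(n_gamma)   # -1 entries select subdomain 2's copy

def constraint(x1_gamma, x2_gamma):
    """Evaluate A_1 x_1^Gamma + A_2 x_2^Gamma, which is zero when the copies agree."""
    return A1 @ x1_gamma + A2 @ x2_gamma

x_shared = np.array([0.3, -1.2, 2.0])

# Both subdomains hold identical interface values: the constraint is satisfied.
matching = constraint(x_shared, x_shared)

# Subdomain 2 deviates in the second DOF: the constraint reports the jump.
mismatched = constraint(x_shared, x_shared + np.array([0.0, 0.1, 0.0]))
```

With more than two subdomains, each row of the stacked constraint would still contain one +1 and one -1 entry pairing two copies of a shared DOF.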

2. Construction of Subdomain Nonlinear-Manifold ROMs

Within each subdomain, DDNM+ seeks a low-dimensional nonlinear approximation for both interior and interface variables. Shallow autoencoders are employed:

Encoder: $h_i^\Omega:\mathbb{R}^{N_i^\Omega}\rightarrow\mathbb{R}^{n_i^\Omega}$
Decoder: $g_i^\Omega:\mathbb{R}^{n_i^\Omega}\rightarrow\mathbb{R}^{N_i^\Omega}$

This yields the manifold approximation

$x_i^\Omega \approx g_i^\Omega(z_i^\Omega),\quad x_i^\Gamma \approx g_i^\Gamma(z_i^\Gamma)$

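with latent variables $z_i^\Omega \in \mathbb{R}^{n_i^\Omega}$ and $z_i^\Gamma \in \mathbb{R}^{n_i^\Gamma}$, where $n_i^\Omega \ll N_i^\Omega$ and $n_i^\Gamma \ll N_i^\Gamma$. Insertion into the FOM with hyper-reduction (HR) yields the DD NM-ROM minimization:

$\min_{z_i^\Omega,\,z_i^\Gamma} \frac12\sum_{i=1}^{n_\Omega}\|B_i\, r_i(g_i^\Omega(z_i^\Omega), g_i^\Gamma(z_i^\Gamma); \mu)\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^{n_\Omega} C A_i\, g_i^\Gamma(z_i^\Gamma) = 0$

where $B_i$ is an HR sampling matrix and $C$ is a Gaussian test matrix that enforces weak coupling with reduced constraint dimensionality.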
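The effect of the Gaussian test matrix on the constraint count can be sketched as follows. This is an illustration under assumed sizes (40 strong constraints compressed to 6), with the variable name `C` and the two-subdomain incidence matrices chosen for the example rather than taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 40 strong interface constraints compressed to 6 weak ones.
n_full, n_weak = 40, 6
A1, A2 = np.eye(n_full), -np.eye(n_full)    # signed incidence for two subdomains
C = rng.standard_normal((n_weak, n_full))   # Gaussian test matrix (assumed name)

def strong_constraint(x1, x2):
    """Full continuity residual with n_full rows."""
    return A1 @ x1 + A2 @ x2

def weak_constraint(x1, x2):
    """Compressed residual with only n_weak rows."""
    return C @ strong_constraint(x1, x2)

x = rng.standard_normal(n_full)
weak_matching = weak_constraint(x, x)         # zero when the interface copies agree
weak_mismatch = weak_constraint(x, x + 0.1)   # generically nonzero when they differ
```

A vanishing weak residual does not imply a vanishing strong residual, since `C` has a nontrivial null space; this is why the coupling is described as weak.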

3. Hyper-Reduction for Nonlinear Residuals

To avoid residual evaluation costs that scale with the full subdomain dimension, hyper-reduction is applied via a gappy-POD (or DEIM-style) collocation approach. For each subdomain, a collateral basis $\Phi_i^r$ for subdomain residuals is constructed from snapshot POD; greedy algorithms select a sampling set of indices and construct the sampling matrix $B_i$. The hyper-reduced subdomain residual is approximated by

$r_i \approx \Phi_i^r\,(B_i \Phi_i^r)^{\dagger}\, B_i r_i$

where $B_i$ extracts the sampled components and $(\cdot)^{\dagger}$ denotes the Moore-Penrose pseudoinverse; only a small subset of rows is typically retained. This reduces the computational burden of evaluating and assembling the global nonlinear ROM.
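A gappy-POD reconstruction with greedy index selection can be sketched as below. The sketch uses the standard DEIM greedy rule as a stand-in for whichever greedy algorithm the reference study uses, and the snapshot functions, basis size, and function names are assumptions made for illustration.

```python
import numpy as np

def pod_basis(snapshots, m):
    """Leading m left singular vectors of the snapshot matrix."""
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :m]

def deim_indices(Phi):
    """Standard DEIM greedy selection: one sample index per basis vector."""
    idx = [int(np.argmax(np.abs(Phi[:, 0])))]
    for j in range(1, Phi.shape[1]):
        # Interpolate column j with the previous columns at the chosen rows.
        c = np.linalg.solve(Phi[idx, :j], Phi[idx, j])
        gap = Phi[:, j] - Phi[:, :j] @ c
        idx.append(int(np.argmax(np.abs(gap))))
    return np.array(idx)

def gappy_reconstruct(Phi, idx, sampled_values):
    """Least-squares fit of the basis coefficients to the sampled rows only."""
    coeffs, *_ = np.linalg.lstsq(Phi[idx, :], sampled_values, rcond=None)
    return Phi @ coeffs

# Hypothetical residual snapshots on a 200-point grid.
grid = np.linspace(0.0, 1.0, 200)
mus = np.linspace(1.0, 5.0, 30)
snapshots = np.column_stack(
    [np.exp(-mu * grid) * np.sin(np.pi * mu * grid) for mu in mus]
)

Phi = pod_basis(snapshots, m=5)
idx = deim_indices(Phi)

# A vector in the span of the basis is recovered from 5 of its 200 entries.
r_true = Phi @ np.array([1.0, -0.5, 0.25, 2.0, -1.0])
r_approx = gappy_reconstruct(Phi, idx, r_true[idx])
```

In practice more sample rows than basis vectors are often retained (oversampling), in which case the least-squares fit in `gappy_reconstruct` is overdetermined.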

4. Global Assembly and Solution Procedure

The DDNM+ system is formulated as a small nonlinear least-squares problem with equality constraints:

Lagrange multipliers $\lambda$ enforce the weak coupling.
A symmetric indefinite saddle-point (KKT) system of size $\sum_{i=1}^{n_\Omega}(n_i^\Omega + n_i^\Gamma) + n_C$ arises per Gauss–Newton iteration, where $n_C$ is the number of weak constraints.
Local residuals and Jacobians are formed in parallel for each subdomain; assembly is direct.
The global coupling matrix for the interface constraints is constructed.
Convergence is typically achieved in 5–10 Gauss–Newton iterations.
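The structure of one such iteration can be sketched on a toy problem. The example below is not the DDNM+ solver: it assumes a two-variable residual with a known zero-residual solution at (1, 1) and a single linear constraint tying the two variables together, and all function names are hypothetical.

```python
import numpy as np

def residual(z):
    # Toy residual with a zero at z = (1, 1).
    return np.array([np.exp(z[0]) - np.e, z[1] ** 3 + z[1] - 2.0])

def jacobian(z):
    return np.diag([np.exp(z[0]), 3.0 * z[1] ** 2 + 1.0])

# Linear equality constraint A z = 0, i.e. z[0] == z[1] (stand-in for interface coupling).
A = np.array([[1.0, -1.0]])

def gauss_newton_kkt(z0, max_iter=20, tol=1e-10):
    """Gauss-Newton iteration where each step solves a saddle-point (KKT) system."""
    z = np.asarray(z0, dtype=float)
    n, m = z.size, A.shape[0]
    lam = np.zeros(m)
    for it in range(1, max_iter + 1):
        r, J = residual(z), jacobian(z)
        # Block system [[J^T J, A^T], [A, 0]] [dz; lam] = -[J^T r; A z].
        K = np.block([[J.T @ J, A.T], [A, np.zeros((m, m))]])
        rhs = -np.concatenate([J.T @ r, A @ z])
        step = np.linalg.solve(K, rhs)
        z = z + step[:n]
        lam = step[n:]
        if np.linalg.norm(step[:n]) < tol:
            break
    return z, lam, it

# The starting point violates the constraint; the first step restores it.
z_star, lam_star, n_iter = gauss_newton_kkt([0.2, 0.8])
```

In the domain-decomposed setting the blocks of `J.T @ J` would be assembled from per-subdomain Jacobians, and the constraint Jacobian would involve the interface decoders rather than a constant matrix.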

5. Shallow, Sparse Autoencoder Architectures

Each subdomain autoencoder consists of a single hidden layer:

Encoder: $h_i(x) = W_2^{h}\,\sigma(W_1^{h} x + b_1^{h})$
Decoder: $g_i(z) = W_2^{g}\,\sigma(W_1^{g} z + b_1^{g})$
Activation: $\sigma(x) = x/(1+e^{-x})$ (swish)
Hidden layer width: a multiple of the latent dimension, chosen to meet a target reconstruction error

Sparsity is imposed via tri-banded masks on the encoder input-layer and decoder output-layer weight matrices, ensuring sparse, banded connectivity. The offline training objective combines the autoencoder reconstruction error with L1 sparsity penalties. Training employs the Adam optimizer, 2,000 epochs, and batch size 32, with early stopping.
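A forward pass through a masked shallow decoder can be sketched in NumPy. The layer sizes, the random weights, the band placement rule in `banded_mask`, and the omission of an output bias are all assumptions made for illustration; the trained architecture in the reference study may differ.

```python
import numpy as np

def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def banded_mask(n_rows, n_cols, half_width=1):
    """0/1 mask keeping, for each output row, a band of nearby hidden units."""
    centers = np.round(np.linspace(0, n_cols - 1, n_rows)).astype(int)
    cols = np.arange(n_cols)
    return (np.abs(cols[None, :] - centers[:, None]) <= half_width).astype(float)

rng = np.random.default_rng(0)

# Hypothetical sizes: 50 subdomain DOFs, 12 hidden units, latent dimension 3.
N, H, n = 50, 12, 3
W1 = rng.standard_normal((H, n))   # dense latent-to-hidden weights
b1 = np.zeros(H)
W2 = rng.standard_normal((N, H))   # hidden-to-output weights, masked below
M = banded_mask(N, H)              # at most 3 nonzeros per output row

def decoder(z):
    """Shallow decoder g(z) = (M * W2) swish(W1 z + b1)."""
    return (M * W2) @ swish(W1 @ z + b1)

x_hat = decoder(np.array([0.1, -0.2, 0.3]))
```

Because each output row touches at most three hidden units, the number of active output-layer weights grows linearly with the number of DOFs, and a subset of output rows can be evaluated using only the hidden units inside their bands.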

6. Parameter Scaling, Parallelism, and Cost

In the reference study, decomposing the domain into subdomains, each with smaller interior and interface latent dimensions and fewer DOFs, reduces the autoencoder parameter count both per subdomain and in total relative to a global NM-ROM (Diaz et al., 2023).

Subdomain training is fully parallelizable, with per-subdomain time and memory costs that shrink as the domain is split into more subdomains.
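The direction of this scaling can be checked with simple arithmetic. The sketch below counts weights and hidden biases for a dense, unmasked shallow autoencoder and ignores separate interface autoencoders; the sizes (10,000 DOFs, latent dimension 16, four subdomains, width factor 4) are hypothetical and are not the values from the reference study.

```python
def shallow_autoencoder_params(N, n, width_factor):
    """Parameter count of a dense one-hidden-layer encoder plus decoder.

    Assumes hidden width H = width_factor * n, one hidden bias vector per
    network, and no output bias.
    """
    H = width_factor * n
    encoder = H * N + H + n * H   # input weights, hidden bias, output weights
    decoder = H * n + H + N * H
    return encoder + decoder

# Hypothetical global model: 10,000 DOFs, latent dimension 16.
global_params = shallow_autoencoder_params(N=10_000, n=16, width_factor=4)

# Hypothetical decomposition into 4 subdomains of 2,500 DOFs, latent dimension 4 each.
per_subdomain = shallow_autoencoder_params(N=2_500, n=4, width_factor=4)
total_decomposed = 4 * per_subdomain
```

Under these assumptions the hidden width shrinks along with the subdomain size, so the per-subdomain count drops faster than the number of subdomains grows; sparsity masks would reduce all of these counts further.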

7. Numerical Performance and Key Features

Results for the 2D steady-state Burgers' equation demonstrate:

DDNM+ (NM-ROM) achieves roughly one order of magnitude lower errors than DD LS-ROM at the same ROM size.
Hyper-reduction yields speedups of about 37.5 to 44.7 for DDNM+ in the tabulated cases; LS-ROM(HR) is faster but less accurate.

ROM	$n_i^\Omega$	$n_i^\Gamma$	DoF	Error	Speedup	Error (HR)	Speedup (HR)
LS-ROM	6	3	36	[missing]	48.7	[missing]	340.0
LS-ROM	16	8	96	[missing]	18.3	[missing]	280.4
NM-ROM	6	3	36	[missing]	26.2	[missing]	44.7
NM-ROM	16	8	96	[missing]	13.9	[missing]	37.5

Key distinguishing elements denoted by “+” in DDNM+:

Gaussian-test weak coupling, reducing constraint count without loss of accuracy.
Input/output sparsity masks for efficient subnet evaluation.
Adaptivity of autoencoder mask patterns to local features (e.g., shocks).
Parallel snapshot and training processes (multi-GPU compatibility).
Potential extensions: adaptive subdomain re-partitioning, hp-refinement, parareal DD in time.

Critical tradeoffs include accuracy versus speed: DDNM+ is slower than LS-ROM(HR) but attains higher accuracy at small ROM sizes. Parameter scaling enables shallower, faster-trained networks per subdomain but requires managing more models. Extensions under consideration include online adaptive hyper-reduction and enrichment for fully unsteady PDEs (Diaz et al., 2023).

References

Diaz et al. (2023). Nonlinear-manifold reduced order models with domain decomposition.
