DDNM+ Variant: Domain-Decomposed Nonlinear ROM

Updated 25 May 2026
  • DDNM+ Variant is a domain-decomposed nonlinear-manifold reduced order model that uses shallow sparse autoencoders and hyper-reduction to approximate high-fidelity solutions of large-scale nonlinear PDEs.
  • It decomposes the full-order model into subdomains to allow parallel training and reduce parameter counts, achieving higher accuracy compared to traditional linear-subspace ROMs.
  • The method integrates Gaussian-test weak coupling and adaptive sparsity masks to ensure numerical stability while significantly reducing computational costs.

The DDNM+ Variant refers to a domain-decomposed, nonlinear-manifold reduced order model (ROM) that leverages shallow, sparse autoencoders and hyper-reduction to efficiently approximate high-fidelity solutions of large-scale problems, particularly nonlinear parametric PDEs. Proposed within the context of scientific machine learning and model reduction, DDNM+ addresses scalability and parallel efficiency by splitting the full-order model (FOM) into subdomains, training nonlinear manifold autoencoders for each, and coupling them via algebraic constraints. The methodology enables significant reductions in parameter count, parallelizes training, and attains superior accuracy compared to linear-subspace ROMs under similar computational budgets (Diaz et al., 2023).

1. Algebraic Domain-Decomposed Full-Order Model

The baseline for DDNM+ is the algebraic FOM residual equation

$$r(x;\mu) = 0, \quad x \in \mathbb{R}^{N_x},\; \mu \in \mathcal{D} \subset \mathbb{R}^{N_\mu},$$

which is decomposed into $n_\Omega$ subdomains, each with interior state $x_i^\Omega$ and interface state $x_i^\Gamma$. The FOM is equivalently represented as a block system with residual components $r_i(x_i^\Omega, x_i^\Gamma; \mu)$. Interface continuity constraints are imposed via signed-incidence matrices $A_i$, leading to the constrained least-squares system

$$\min_{x_i^\Omega,\,x_i^\Gamma} \frac12\sum_{i=1}^{n_\Omega}\|r_i(x_i^\Omega, x_i^\Gamma; \mu)\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^{n_\Omega} A_i x_i^\Gamma = 0.$$

This formulation enforces continuity of shared degrees of freedom (DOFs) while enabling domain-decomposed solution strategies.
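To make the role of the signed-incidence matrices concrete, here is a minimal sketch under assumed toy sizes: two subdomains that each hold a copy of the same three interface DOFs. The names `interface_mismatch`, `A1`, and `A2` are illustrative and do not come from the reference study.

```python
import numpy as np

# Hypothetical toy setup: two subdomains that share three interface DOFs.
n_shared = 3
A1 = np.eye(n_shared)    # subdomain 1 enters the constraint with sign +1
A2 = -np.eye(n_shared)   # subdomain 2 enters the constraint with sign -1

def interface_mismatch(interface_states, incidence):
    """Evaluate sum_i A_i x_i^Gamma; a zero vector means the copies agree."""
    return sum(A @ x for A, x in zip(incidence, interface_states))

x_gamma = np.array([0.5, -1.0, 2.0])
matched = interface_mismatch([x_gamma, x_gamma.copy()], [A1, A2])
mismatched = interface_mismatch([x_gamma, x_gamma + 0.1], [A1, A2])
```

With opposite signs on the two copies, the constraint vanishes exactly when both subdomains agree on the shared DOFs, which is the continuity condition in the constrained least-squares problem.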

2. Construction of Subdomain Nonlinear-Manifold ROMs

Within each subdomain, DDNM+ seeks a low-dimensional nonlinear approximation for both interior and interface variables. Shallow autoencoders are employed:

  • Encoder: $h_i^\Omega:\mathbb{R}^{N_i^\Omega}\rightarrow\mathbb{R}^{n_i^\Omega}$
  • Decoder: $g_i^\Omega:\mathbb{R}^{n_i^\Omega}\rightarrow\mathbb{R}^{N_i^\Omega}$

This yields the manifold approximation

$$x_i^\Omega \approx g_i^\Omega(z_i^\Omega),\quad x_i^\Gamma \approx g_i^\Gamma(z_i^\Gamma)$$

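with latent variables $z_i^\Omega \in \mathbb{R}^{n_i^\Omega}$ and $z_i^\Gamma \in \mathbb{R}^{n_i^\Gamma}$, where $n_i^\Omega \ll N_i^\Omega$ and $n_i^\Gamma \ll N_i^\Gamma$. Insertion into the FOM with hyper-reduction (HR) yields the DD NM-ROM minimization

$$\min_{z_i^\Omega,\,z_i^\Gamma} \frac12\sum_{i=1}^{n_\Omega}\left\|B_i\, r_i\!\left(g_i^\Omega(z_i^\Omega),\, g_i^\Gamma(z_i^\Gamma); \mu\right)\right\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^{n_\Omega} C A_i\, g_i^\Gamma(z_i^\Gamma) = 0,$$

where $B_i$ is a HR sampling matrix, and $C$ is a Gaussian test matrix that enforces weak coupling with reduced constraint dimensionality.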
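The effect of a Gaussian test matrix on the constraint count can be sketched as follows. The sizes (50 strong constraints compressed to 8) and the names `C`, `strong_constraint`, and `weak_constraint` are assumptions chosen for illustration, not values from the reference study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_A, n_C = 50, 8   # hypothetical: 50 strong constraints compressed to 8

A1, A2 = np.eye(n_A), -np.eye(n_A)     # signed-incidence matrices
C = rng.standard_normal((n_C, n_A))    # Gaussian test matrix

def strong_constraint(x1, x2):
    """Full interface compatibility residual, one row per shared DOF."""
    return A1 @ x1 + A2 @ x2

def weak_constraint(x1, x2):
    """Compressed residual: the strong residual tested against C."""
    return C @ strong_constraint(x1, x2)

x = rng.standard_normal(n_A)
weak_ok = weak_constraint(x, x)            # interface copies agree
weak_bad = weak_constraint(x, x + 0.01)    # interface copies disagree
```

Any state that satisfies the strong constraint also satisfies the weak one, but the converse does not hold in general. The weak form trades exact interface continuity for a smaller constraint block.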

3. Hyper-Reduction for Nonlinear Residuals

To avoid evaluation cost that scales with the full-order dimension, hyper-reduction is applied via a gappy-POD (or DEIM-style) collocation approach. For each subdomain, a collateral basis $\Phi_i$ for subdomain residuals is constructed from snapshot POD; greedy algorithms select a sampling set of indices and construct the sampling matrix $B_i$. The hyper-reduced subdomain residual is approximated by

$$r_i \approx \Phi_i\,(B_i \Phi_i)^{+}\, B_i\, r_i,$$

where $B_i$ extracts the sampled components and $(\cdot)^{+}$ denotes the pseudoinverse; […] rows are typically retained. This reduces the computational burden of evaluating and assembling the global nonlinear ROM.
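A minimal sketch of this kind of sampling-based residual approximation, assuming a standard DEIM greedy index selection and a least-squares (gappy-POD) fit. The snapshot family, the basis size of 5, and the function names are hypothetical; the reference study's selection algorithm may differ.

```python
import numpy as np

def deim_indices(Phi):
    """Greedy DEIM selection: one sample index per basis column."""
    idx = [int(np.argmax(np.abs(Phi[:, 0])))]
    for j in range(1, Phi.shape[1]):
        # Interpolate column j with the previous columns at the chosen rows,
        # then sample where the interpolation error is largest.
        coeff = np.linalg.solve(Phi[idx, :j], Phi[idx, j])
        err = Phi[:, j] - Phi[:, :j] @ coeff
        idx.append(int(np.argmax(np.abs(err))))
    return np.array(idx)

def gappy_reconstruct(Phi, idx, sampled_values):
    """Least-squares fit of basis coefficients to the sampled entries."""
    coeff, *_ = np.linalg.lstsq(Phi[idx, :], sampled_values, rcond=None)
    return Phi @ coeff

# Hypothetical residual snapshots on a 1D grid, one column per parameter value.
s = np.linspace(0.0, 1.0, 200)
mus = np.linspace(1.0, 5.0, 30)
snapshots = np.stack([np.exp(-mu * s) * np.sin(np.pi * s) for mu in mus], axis=1)

U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
Phi = U[:, :5]               # collateral (residual) basis from snapshot POD
idx = deim_indices(Phi)      # rows kept by the sampling matrix

r_full = Phi @ np.array([1.0, -2.0, 0.5, 3.0, 1.0])   # a residual in span(Phi)
r_hr = gappy_reconstruct(Phi, idx, r_full[idx])       # uses 5 of 200 entries
```

Only the sampled rows of the residual need to be evaluated online. A residual that lies in the span of the collateral basis is recovered up to round-off, while residuals outside that span are only approximated.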

4. Global Assembly and Solution Procedure

The DDNM+ system is formulated as a small nonlinear least-squares problem with equality constraints:

  • Lagrange multipliers $\lambda$ enforce the weak coupling.
  • A symmetric indefinite saddle-point (KKT) system, whose size is the total latent dimension plus the number of coupling constraints, arises per Gauss–Newton iteration.
  • Local residuals and Jacobians are formed in parallel for each subdomain; assembly is direct.
  • The global coupling matrix is constructed for the interface.
  • Convergence is typically achieved in 5–10 Gauss–Newton iterations.
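The iteration outlined above can be illustrated with a small equality-constrained Gauss–Newton solver. This is a sketch under strong simplifying assumptions: two scalar latent variables, hypothetical residuals $r_i(z_i) = z_i^2 - 1$, and a linear coupling constraint $z_1 - z_2 = 0$, whereas in DDNM+ the constraint passes through the interface decoders.

```python
import numpy as np

def residual(z):
    # Hypothetical subdomain residuals r_i(z_i) = z_i^2 - 1, stacked.
    return z**2 - 1.0

def residual_jacobian(z):
    return np.diag(2.0 * z)

G = np.array([[1.0, -1.0]])   # linear coupling constraint: z_1 - z_2 = 0

def constrained_gauss_newton(z0, tol=1e-10, max_iter=20):
    z = np.asarray(z0, dtype=float)
    n, m = z.size, G.shape[0]
    for it in range(1, max_iter + 1):
        r, J = residual(z), residual_jacobian(z)
        # KKT (saddle-point) matrix: symmetric, but indefinite.
        K = np.block([[J.T @ J, G.T], [G, np.zeros((m, m))]])
        rhs = np.concatenate([-J.T @ r, -G @ z])
        step = np.linalg.solve(K, rhs)
        dz, lam = step[:n], step[n:]   # state update and Lagrange multipliers
        z = z + dz
        if np.linalg.norm(dz) < tol:
            break
    return z, lam, K, it

z_opt, lam_opt, K_last, n_iter = constrained_gauss_newton([2.0, 1.5])
```

Because the constraint is linear in this toy problem, it is satisfied exactly after the first step, and the remaining iterations reduce the residual along the constraint set. The KKT matrix has eigenvalues of both signs, so a solver for symmetric indefinite systems (or a general dense solve, as here) is the appropriate choice.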

5. Shallow, Sparse Autoencoder Architectures

Each subdomain autoencoder consists of a single hidden layer:

  • Encoder: $h_i(x) = W_2^{h}\,\sigma\!\left(W_1^{h} x + b_1^{h}\right) + b_2^{h}$
  • Decoder: $g_i(z) = W_2^{g}\,\sigma\!\left(W_1^{g} z + b_1^{g}\right) + b_2^{g}$
  • Activation: $\sigma(x) = x/(1+e^{-x})$ (swish)
  • Hidden layer width: […]–[…] times the latent dimension, targeting […] reconstruction error

Sparsity is imposed via tri-banded masks $M^{h}$, $M^{g}$ applied elementwise to $W_1^{h}$, $W_2^{g}$, ensuring […] connectivity. The offline training objective with L1 sparsity penalties ([…]) is: […] Training employs the Adam optimizer, 2,000 epochs, batch size 32, with early stopping.
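A forward pass through such a shallow, masked autoencoder might look like the sketch below. All dimensions, the mask construction for rectangular weights, the zero biases, and the choice to mask the encoder input layer and the decoder output layer are assumptions for illustration; no training is performed.

```python
import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))

def banded_mask(rows, cols, half_width=1):
    """0/1 mask keeping entries within half_width of the (scaled) diagonal."""
    r = np.arange(rows)[:, None]
    c = np.arange(cols)[None, :]
    center = np.round(r * (cols - 1) / max(rows - 1, 1))
    return (np.abs(c - center) <= half_width).astype(float)

rng = np.random.default_rng(0)
N, width, n = 12, 6, 2   # hypothetical full, hidden, and latent dimensions

# Assumed layout: masked encoder input layer, masked decoder output layer.
M_enc, M_dec = banded_mask(width, N), banded_mask(N, width)
W1_enc, b_enc = rng.standard_normal((width, N)) * M_enc, np.zeros(width)
W2_enc = rng.standard_normal((n, width))
W1_dec, b_dec = rng.standard_normal((width, n)), np.zeros(width)
W2_dec = rng.standard_normal((N, width)) * M_dec

def encode(x):
    return W2_enc @ swish(W1_enc @ x + b_enc)

def decode(z):
    return W2_dec @ swish(W1_dec @ z + b_dec)

x = rng.standard_normal(N)
x_rec = decode(encode(x))   # untrained, so this only checks shapes and sparsity
```

With a half-width of 1 the mask is tri-banded, so each decoder output depends on at most three hidden units. That locality is what lets a hyper-reduced solver evaluate only the part of the network feeding the sampled outputs.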

6. Parameter Scaling, Parallelism, and Cost

In the reference study, a global NM-ROM of latent dimension […] has a parameter count of […]. Domain decomposition into […] subdomains, each with […], […] ([…] DOF/subdomain), yields […] per subdomain (an […] reduction), with total […] ([…] reduction). General scaling with $n_\Omega$ subdomains is

[…]

Subdomain training is fully parallelizable and scales as […] in time and memory.

7. Numerical Performance and Key Features

Results for the 2D steady-state Burgers' equation over a […] mesh ([…] DOFs) demonstrate:

  • DDNM+ (NM-ROM) achieves about one order of magnitude lower […] errors than DD LS-ROM at the same ROM size.
  • Hyper-reduction yields speedups of 37.5 to 44.7 for DDNM+ in the tabulated cases; LS-ROM(HR) is faster (280.4 to 340.0) but less accurate.

| ROM | $n_i^\Omega$ | $n_i^\Gamma$ | DoF | Error | Speedup | Error (HR) | Speedup (HR) |
|---|---|---|---|---|---|---|---|
| LS-ROM | 6 | 3 | 36 | […] | 48.7 | […] | 340.0 |
| LS-ROM | 16 | 8 | 96 | […] | 18.3 | […] | 280.4 |
| NM-ROM | 6 | 3 | 36 | […] | 26.2 | […] | 44.7 |
| NM-ROM | 16 | 8 | 96 | […] | 13.9 | […] | 37.5 |

Key distinguishing elements denoted by “+” in DDNM+:

  • Gaussian-test weak coupling ([…]) reducing constraint count without loss of accuracy.
  • Input/output sparsity masks for efficient subnet evaluation.
  • Adaptivity of autoencoder mask patterns to local features (e.g., shocks).
  • Parallel snapshot and training processes (multi-GPU compatibility).
  • Potential extensions: adaptive subdomain re-partitioning, hp-refinement, parareal DD in time.

Critical tradeoffs include accuracy versus speed: DDNM+ is slower than LS-ROM(HR) but attains higher accuracy at small ROM sizes. Parameter scaling enables smaller, faster-trained networks per subdomain but requires managing more models. Extensions under consideration include online adaptive hyper-reduction and enrichment for fully unsteady PDEs (Diaz et al., 2023).

References (1)
