
ResUNet-Based DeepONet for Accelerated Simulation

Updated 19 December 2025
  • The paper demonstrates that ResUNet-based DeepONet achieves orders-of-magnitude speedup over conventional FEA by accurately predicting stress and temperature fields.
  • It leverages a novel fusion of a convolutional ResUNet trunk and a branching MLP to effectively couple spatial features with process parameters.
  • Empirical results show mean errors of 5.5%-9% and inference times under 0.01 seconds, supporting applications in topology optimization and additive manufacturing.

ResUNet-based DeepONet is a neural operator framework designed to map high-dimensional, variable, and complex geometry input fields, together with parametric process or load conditions, to full-field solution responses, such as stress or temperature, orders of magnitude faster than conventional finite element analysis (FEA). This approach combines the parametric flexibility and operator-learning capabilities of DeepONet with the spatial feature-encoding strength of convolutional residual U-Nets (ResUNet) in the trunk network, using Hadamard fusion in the latent space to couple process parameters and geometric context at an intermediate stage. As of 2023–2024, it is the first methodology to solve problems involving elasto-plastic material behavior, variable geometries from topology optimization, transient or parametric loading, and coupled multiphysics fields, with demonstrated application to structural mechanics and metal additive manufacturing (He et al., 2023, Kushwaha et al., 21 Mar 2024).

1. Mathematical Formulation and Operator Definition

The core objective is to learn a nonlinear solution operator $\mathcal{G}$ mapping an input field and process parameters to a target response field, typically written as
$$\mathcal{G}: (a, y) \mapsto u(y)$$
where $a$ encodes the input (e.g., geometry field $\rho(x)$ and physical parameters) and $y$ specifies the spatial location in the domain. DeepONet parameterizes $\mathcal{G}$ via a branch–trunk decomposition:
$$u(y) \approx \sum_{i=1}^{p} B_i(a)\, T_i(y)$$
where $B_i(a)$ are outputs of the branch network encoding the parameter/function input $a$, $T_i(y)$ are outputs of the trunk network encoding $y$, and $p$ is the latent dimension.

In ResUNet-based DeepONet, the trunk does not explicitly take $y$ as input; rather, convolutional layers recover spatial context over a grid, so $T_{\text{trunk}}: \rho(x) \to F(x) \in \mathbb{R}^{H \times W \times n}$ for grid size $H \times W$ and channel number $n$. Process parameters (loads, velocities, etc.) are encoded in the branch as a vector $b \in \mathbb{R}^n$.

For multi-physics, multi-field outputs at each node (e.g., stress, temperature), the generalized form includes a field index and bias:
$$G_{bnc} = \sum_{h=1}^{H} B_{bch}\, T_{bnh} + \beta_c$$
where $b$ indexes samples, $n$ spatial points, $c$ the output field, and $h$ the latent dimension (Kushwaha et al., 21 Mar 2024).
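
As a minimal illustration (not the authors' released code), this contraction maps directly onto a tensor einsum; the shapes below are assumptions chosen only to make the sketch concrete:

```python
import torch

# Hypothetical shapes: batch of 4 samples, 2 output fields (e.g., stress and
# temperature), 4096 spatial points, latent dimension H = 32.
B = torch.randn(4, 2, 32)      # branch output  B[b, c, h]
T = torch.randn(4, 4096, 32)   # trunk output   T[b, n, h]
beta = torch.randn(2)          # per-field bias beta[c]

# G[b, n, c] = sum_h B[b, c, h] * T[b, n, h] + beta[c]
G = torch.einsum("bch,bnh->bnc", B, T) + beta
print(G.shape)  # torch.Size([4, 4096, 2])
```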

2. Network Architecture

Trunk Network—Residual U-Net

The trunk network implements a deep ResUNet, integrating four down-sampling and four up-sampling levels connected by skip connections. At each level $\ell$, feature channels in the encoder double ($d_0 = 32$ up to $d_4 = 512$), using max-pooling for resolution reduction. The decoder reverses this progression, halving channel depth while up-sampling via nearest neighbor or transpose convolution and concatenating matching-resolution skip features.

Each residual block consists of two $3 \times 3$ convolutions with batch normalization and ReLU activations, finalized by a skip connection:
$$\text{Output} = \text{ReLU}\big(\text{BN}(\text{Conv}_{3 \times 3,\, d}(\text{ReLU}(\text{BN}(\text{Conv}_{3 \times 3,\, d}(l_{\text{in}}))))) + l_{\text{in}}\big)$$
Dropout (rate 0.02) is applied after each block to regularize training.
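
A minimal PyTorch sketch of such a block follows; the layer names and the 1×1 projection used when channel counts differ are assumptions for illustration, not the authors' exact implementation:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 conv + BN + ReLU layers with an additive skip connection and dropout."""
    def __init__(self, in_channels: int, out_channels: int, dropout: float = 0.02):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.dropout = nn.Dropout2d(dropout)
        # Assumed choice: 1x1 projection on the skip path when channel counts differ.
        self.skip = (nn.Identity() if in_channels == out_channels
                     else nn.Conv2d(in_channels, out_channels, kernel_size=1))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.relu(out + self.skip(x))   # skip connection, then final ReLU
        return self.dropout(out)              # dropout applied after the block
```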

The output is a spatial feature map $F(x) \in \mathbb{R}^{H \times W \times n}$ (commonly $128 \times 128 \times 32$ or $64 \times 64 \times H$), representing a latent encoding of the geometry (He et al., 2023, Kushwaha et al., 21 Mar 2024).

Branch Network—Fully Connected (FC) Layers

The branch network encodes low-dimensional parameters such as load magnitude and direction or process conditions. A standard architecture is a 2–4 layer MLP, e.g.:

  • Input: 2 (or 1) $\rightarrow$ 64
  • Hidden: 64 $\rightarrow$ 64 (ReLU)
  • Hidden: 64 $\rightarrow$ 32 (ReLU)
  • Output: 32 $\rightarrow$ 32 (linear)

This yields a vector $b \in \mathbb{R}^n$ matching the trunk channel dimension.
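
A minimal PyTorch sketch of such a branch network, with layer sizes following the list above (the activation after the input layer is an assumption):

```python
import torch.nn as nn

# Branch MLP: maps a low-dimensional parameter vector (e.g., load magnitude and
# direction) to a latent vector matching the trunk channel dimension n = 32.
branch = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),   # input layer (ReLU here is an assumed choice)
    nn.Linear(64, 64), nn.ReLU(),  # hidden layer
    nn.Linear(64, 32), nn.ReLU(),  # hidden layer
    nn.Linear(32, 32),             # linear output layer
)
```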

3. Fusion Mechanism and Output Construction

ResUNet-based DeepONet employs Hadamard (elementwise) product fusion at the latent stage. Given trunk features $F(i,j,k)$ and branch output $b_k$, the fusion is
$$F_{\text{fused}}(i,j,k) = F(i,j,k)\, b_k \quad \forall\, i, j, k.$$
The output is then collapsed along the channel dimension:
$$\sigma_{\text{pred}}(i,j) = \sum_{k=1}^{n} F_{\text{fused}}(i,j,k).$$
This intermediate fusion mechanism allows the process parameter vector to modulate geometry-sensitive features throughout the spatial domain and enhances expressivity relative to simple output-level concatenation or late fusion. Early fusion at the bottleneck (before up-sampling in the U-Net) can further boost parameter-conditioning of the decoder (He et al., 2023, Kushwaha et al., 21 Mar 2024).
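
A minimal sketch of this fusion and channel collapse in PyTorch; the tensor shapes (NCHW layout, a single output field) are assumptions for illustration:

```python
import torch

# Assumed shapes: batch of 4, n = 32 latent channels, 128x128 grid.
F_trunk = torch.randn(4, 32, 128, 128)   # trunk feature map F(i, j, k)
b = torch.randn(4, 32)                   # branch output vector b_k

# Hadamard fusion: broadcast b over the spatial dimensions.
F_fused = F_trunk * b[:, :, None, None]  # shape (4, 32, 128, 128)

# Collapse along the channel dimension to obtain the predicted field.
sigma_pred = F_fused.sum(dim=1)          # shape (4, 128, 128)
```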

4. Training Regimes and Data Generation

Training utilizes supervised loss (scaled or unscaled mean squared error, sometimes weighted) between the predicted and FEA-derived target fields. Data generation involves:

  • Geometry/topology: 2D density fields or masks generated via topology optimization (e.g., 1000–3000 structures on $64 \times 64$ or $128 \times 128$ grids).
  • FE simulations: Multiple parametric runs per geometry, e.g., varying displacements, velocities, or process parameters.
  • Material models: Elasto-plastic ($J_2$ plasticity, isotropic hardening) or temperature-dependent visco-plastic constitutive laws.

Training uses the Adam optimizer (typically a $5 \times 10^{-4}$ initial learning rate), batch sizes 8–16, 150k iterations, and hidden dimension $n = 32$ (or $H = 32$), with training times ranging from $\sim$20–24 hours down to less than 1500 s on Nvidia A100 GPUs (He et al., 2023, Kushwaha et al., 21 Mar 2024).
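
A minimal training-loop sketch under these settings; the `model` and `dataloader` interfaces are assumptions, and only the hyperparameters above come from the cited papers:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def train(model: nn.Module, dataloader: DataLoader, max_steps: int = 150_000) -> None:
    """Supervised training of the surrogate against FEA-derived target fields (MSE loss)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # initial LR from the papers
    loss_fn = nn.MSELoss()
    step = 0
    while step < max_steps:
        for rho, params, target in dataloader:   # geometry field, process params, FEA field
            pred = model(rho, params)            # assumed model signature
            loss = loss_fn(pred, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= max_steps:
                return
```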

5. Empirical Performance and Benchmarking

Quantitative performance evaluation demonstrates that ResUNet-based DeepONet achieves:

  • Mean relative $L_2$ error (MRL2E) for stress prediction: $\sim$8–9% ($128 \times 128$ grid, elasto-plastic) and $\sim$5.5% ($64 \times 64$ grid, AM residual stress).
  • MRL2E for temperature in AM: $\sim$3.14%.
  • Inference time per sample: $< 10^{-2}$ s (GPU), representing a $\sim 10^{3}$–$10^{4}\times$ speedup over FEA ($\sim$43–631 s per case).
  • Parameter count: $\sim$3.6M (elasto-plastic) to 2.65M (AM), comparable to standalone ResUNet but significantly outperforming vanilla FC-based DeepONet (He et al., 2023, Kushwaha et al., 21 Mar 2024).
| Model | Params (M) | Mean MRL2E | Inference Time (s) |
|---|---|---|---|
| ResUNet (CNN) | 3.57 | 0.0818 | 0.00090 |
| Vanilla DeepONet (FC+FC) | 3.51 | 0.2749 | 0.00003 |
| ResUNet-DeepONet | 3.58 | 0.0853 | 0.00079 |
| Abaqus FEA | – | – | $\sim$43–631 |

The vanilla DeepONet model (FC+FC trunk and branch) is unable to encode variable geometries, resulting in $\sim$27% MRL2E. The ResUNet-based DeepONet matches the best standalone CNNs in accuracy while offering improved memory efficiency and modularity.
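
For reference, the per-sample relative $L_2$ error underlying the MRL2E values above can be computed as in the following sketch; this is a common definition, and the papers' exact normalization may differ slightly:

```python
import torch

def mean_relative_l2_error(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean over samples of ||pred - target||_2 / ||target||_2, fields flattened per sample."""
    diff = (pred - target).flatten(start_dim=1).norm(dim=1)
    ref = target.flatten(start_dim=1).norm(dim=1)
    return (diff / ref).mean()
```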

6. Applicability, Strengths, and Limitations

ResUNet-based DeepONet is effective in scenarios requiring forward evaluation under broadly variable geometries and process conditions, such as:

  • Topology optimization loops
  • Sensitivity analysis
  • Uncertainty quantification
  • Real-time design feedback in additive manufacturing
  • Surrogate modeling for nonlinear elasto-plasticity and coupled multiphysics (He et al., 2023, Kushwaha et al., 21 Mar 2024)

Advantages:

  • Unified operator representation admits unseen geometry–parameter pairs at test time.
  • Convolutional trunk enables spatial invariance and accurate encoding of complex input domains.
  • Hadamard fusion delivers robust parameter–geometry coupling.
  • Inference efficiency enables orders-of-magnitude speedup over FEA, supporting high-throughput design tasks.

Limitations:

  • Prediction is restricted to the fixed grid resolution of the training set; extension to unstructured or higher-dimensional domains is nontrivial.
  • Only final-state fields are predicted; transient output requires RNN- or sequence-based branches.
  • Extension to full 3D or finer-scale modeling incurs significant computational and memory costs.

Potential extensions include incorporating recurrent branches (LSTM/GRU) for sequential or load-history input, enlarging the parameter set for more complex manufacturing processes, infusing physics-informed terms in loss functions, and scaling to graph-based or volumetric U-Nets for mesh-agnostic generalization.

The ResUNet-based DeepONet extends the classical DeepONet framework of Lu et al. (PNAS 2021) and leverages architecture concepts from ResUNet++ (Diakogiannis et al., IGARSS 2020) and fusion strategies inspired by Wang et al. (NeurIPS 2022). Empirically, it is the first reported operator-learning approach for full-field stress prediction on topology-optimized, variable geometries with nonlinear elasto-plastic constitutive laws (He et al., 2023). Subsequent work expands this paradigm to coupled thermo-mechanical fields in metal additive manufacturing and multiphysics workflows (Kushwaha et al., 21 Mar 2024). Published results use Abaqus-generated fields as ground truth, report metrics such as relative $L_2$ error and MAE, and are validated across thousands of unseen test cases.

A plausible implication is that operator-learning surrogates using ResUNet trunks will enable new paradigms in simulation-based design, rapid digital twin updates, and real-time optimization for high-dimensional, parametrically rich engineering applications.
