
CT-MsgModMGN Neural Surrogate Model

Updated 20 January 2026
  • CT-MsgModMGN is a neural surrogate model architecture that integrates MeshGraphNet, a Control Transformer, and message modulation for cross-subject knee joint stress prediction.
  • The model significantly reduces prediction error and mitigates peak-shaving by employing short-horizon history encoding to recover implicit phase information.
  • By decoupling temporal encoding from spatial propagation via FiLM conditioning and state-conditioned modulation, it enhances localization of high-risk stress regions.

CT-MsgModMGN is a neural surrogate modeling architecture designed for cross-subject prediction of knee joint contact mechanics from finite element (FE) simulations. Integrating a shared MeshGraphNet (MGN) backbone with a Control Transformer (CT) for short-horizon history encoding and a message-modulation pathway (MsgMod) for adaptive spatial propagation, the model aims to disentangle and evaluate the contributions of temporal history and spatial propagation dependencies in surrogate prediction. Empirical findings demonstrate that only short-horizon history encoding significantly reduces prediction error, mitigates peak-shaving, and enhances localization of high-risk stress regions across unseen biomechanical subjects (Pan et al., 13 Jan 2026).

1. MeshGraphNet Backbone Design

The backbone represents each FE mesh as a static, undirected graph $G=(V,E)$, where nodes $i \in V$ correspond to mesh tetrahedra and edges $(i,j)\in E$ connect spatially adjacent elements. Each node feature vector $x_i \in \mathbb{R}^{10}$ comprises centroid coordinates (3D) and the joint driver state (global pose in sine/cosine encoding and joint reaction forces). Edge features are defined by relative displacements $p_{ij}=p_i - p_j$.

Node and edge embedding is performed by two-layer MLPs mapping inputs to a latent space of dimension $D_{\mathrm{hidden}}=48$:

  • $h_i^{(0)} = f_{\mathrm{enc}}^{v}(x_i)$
  • $e_{ij}^{(0)} = f_{\mathrm{enc}}^{e}(p_{ij})$

The processor operates for $K=3$ message-passing steps. At each step $k$:

  • Edge update: $m_{ij}^{(k)} = \phi_e(h_i^{(k)}, h_j^{(k)}, e_{ij}^{(k)})$
  • Message aggregation: $\bar m_i^{(k)} = \frac{1}{|N(i)|} \sum_{j\in N(i)} m_{ij}^{(k)}$
  • Node update: $h_i^{(k+1)} = h_i^{(k)} + \phi_v(h_i^{(k)}, \bar m_i^{(k)})$

Here $\phi_e$ and $\phi_v$ are two-layer MLPs whose weights are shared across steps $k$.
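The processor step above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the ReLU activation, the weight initialization, the helper names (`mlp2`, `init_mlp2`, `processor_step`), the toy 4-node path graph, and the choice to keep edge features fixed across steps are all assumptions. Each undirected edge is stored as two directed edges so that every node receives messages from all of its neighbors.

```python
import numpy as np

def mlp2(params, x):
    """Two-layer MLP with ReLU (activation is an assumption)."""
    W1, b1, W2, b2 = params
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def init_mlp2(rng, d_in, d_hidden, d_out):
    """Small random weights, zero biases (hypothetical initialization)."""
    return (rng.normal(0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0, 0.1, (d_hidden, d_out)), np.zeros(d_out))

def processor_step(h, e, senders, receivers, phi_e, phi_v):
    """Edge update, mean aggregation over N(i), residual node update."""
    n_nodes = h.shape[0]
    # m_ij = phi_e(h_i, h_j, e_ij), with i the receiver and j the sender
    m = mlp2(phi_e, np.concatenate([h[receivers], h[senders], e], axis=1))
    # Mean of incoming messages per node
    summed = np.zeros((n_nodes, m.shape[1]))
    np.add.at(summed, receivers, m)
    degree = np.bincount(receivers, minlength=n_nodes).reshape(-1, 1)
    m_bar = summed / np.maximum(degree, 1)
    # Residual update: h_i <- h_i + phi_v(h_i, m_bar_i)
    return h + mlp2(phi_v, np.concatenate([h, m_bar], axis=1))

D = 48                      # latent width stated in the text
rng = np.random.default_rng(0)
pairs = [(0, 1), (1, 2), (2, 3)]            # toy path graph 0-1-2-3
senders = np.array([j for i, j in pairs] + [i for i, j in pairs])
receivers = np.array([i for i, j in pairs] + [j for i, j in pairs])
h = rng.normal(size=(4, D))                 # stand-in for encoded nodes
e = rng.normal(size=(len(senders), D))      # stand-in for encoded edges
phi_e = init_mlp2(rng, 3 * D, D, D)
phi_v = init_mlp2(rng, 2 * D, D, D)
for _ in range(3):                          # K = 3 steps, shared weights
    h = processor_step(h, e, senders, receivers, phi_e, phi_v)
```

Reusing `phi_e` and `phi_v` inside the loop reflects the weight sharing across steps described in the text.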

The decoder predicts von Mises stress at each node using a two-layer MLP: $y_i = f_{\mathrm{dec}}(h_i^{(K)})$. Stress targets are log-transformed and Z-score normalized.
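The target transform and its inverse can be sketched as below. The source states only that targets are log-transformed and Z-score normalized; the small offset `eps`, the use of training-set statistics, the function names, and the stress values are assumptions made for illustration.

```python
import numpy as np

def fit_normalizer(stress, eps=1e-8):
    """Mean and std of log-stress, computed on training data (assumed)."""
    log_s = np.log(stress + eps)
    return log_s.mean(), log_s.std()

def normalize(stress, mean, std, eps=1e-8):
    """Log-transform, then Z-score."""
    return (np.log(stress + eps) - mean) / std

def denormalize(z, mean, std, eps=1e-8):
    """Inverse transform, applied before computing metrics."""
    return np.exp(z * std + mean) - eps

train_stress = np.array([0.5, 1.2, 3.4, 8.0, 0.9])  # hypothetical values
mean, std = fit_normalizer(train_stress)
z = normalize(train_stress, mean, std)
recovered = denormalize(z, mean, std)
```

Errors reported after inverse normalization are in the original stress units, which is why the inverse map is applied before evaluation.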

2. Control Transformer for Temporal History

Single-frame driver inputs lack phase information, which is crucial for stress prediction in dynamic tasks. The CT module addresses this by encoding a short-horizon sequence $D_t = [d_{t-L+1}, \ldots, d_t]$ of drivers ($L=8$). Each driver vector $d \in \mathbb{R}^{D_d}$ is linearly embedded and combined with positional encoding.

The CT consists of a 2-layer Transformer encoder (4 heads) producing output $Z^{(T)}$ over the sequence. The context vector $C_t \in \mathbb{R}^{48}$ for the current graph is obtained by mean-pooling:

$$C_t = \frac{1}{L}\sum_{k=1}^{L} Z_k^{(T)}$$

The CT module recovers implicit phase progression absent from instantaneous pose/load descriptions, enabling improved stress phase localization and correction of systematic underestimation of peak stress ("peak-shaving").
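The windowing and pooling around the Transformer can be sketched as follows. This is only a partial illustration: the Transformer encoder itself is replaced by an identity placeholder, and the sinusoidal positional encoding, the padding of early frames by repeating the first driver, the driver width of 7, and all function names are assumptions not taken from the source.

```python
import numpy as np

def history_windows(drivers, L=8):
    """Return (T, L, D_d) windows; window t holds d_{t-L+1}, ..., d_t.
    Early frames are padded by repeating the first driver (assumption)."""
    T = drivers.shape[0]
    pad = np.repeat(drivers[:1], L - 1, axis=0)
    padded = np.concatenate([pad, drivers], axis=0)
    return np.stack([padded[t:t + L] for t in range(T)], axis=0)

def sinusoidal_positions(L, d_model):
    """Standard sinusoidal encoding (the encoding type is an assumption)."""
    pos = np.arange(L)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def context_vector(window, W_embed, encoder=lambda z: z):
    """Linear embedding + positions, encoder, then mean-pool over L."""
    z = window @ W_embed + sinusoidal_positions(window.shape[0],
                                                W_embed.shape[1])
    z = encoder(z)          # placeholder for the 2-layer, 4-head encoder
    return z.mean(axis=0)   # C_t = (1/L) * sum_k Z_k

rng = np.random.default_rng(0)
drivers = np.arange(20 * 7, dtype=float).reshape(20, 7)  # toy sequence
windows = history_windows(drivers)
W_embed = rng.normal(size=(7, 48))
C_t = context_vector(windows[-1], W_embed)
```

Any callable that maps an `(L, 48)` array to an `(L, 48)` array can be passed as `encoder` to stand in for the real Transformer layers.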

3. State-Conditioned Message Modulation (MsgMod)

MsgMod enables adaptive message passing by modulating edge-wise propagation based on the instantaneous encoded state. The gating signal is generated by a two-layer MLP with sigmoid output:

$$s_t = \psi(C_t) \in \mathbb{R}^{48}$$

For each edge, messages are modulated as:

$$\tilde m_{ij}^{(k)} = s_t \odot m_{ij}^{(k)}$$

The modulated messages are then aggregated and processed as in standard message passing. In CT-MsgModMGN, $C_t$ is the CT output; in the ablation without CT (MsgModMGN), it is the driver embedding.
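The gating described in this section can be sketched as below. The hidden width, ReLU activation, random weights, and function names are assumptions; only the two-layer MLP with sigmoid output and the elementwise product come from the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_from_context(C_t, W1, b1, W2, b2):
    """s_t = psi(C_t): two-layer MLP with sigmoid output."""
    hidden = np.maximum(C_t @ W1 + b1, 0.0)  # ReLU is an assumption
    return sigmoid(hidden @ W2 + b2)

def modulate_messages(m, s_t):
    """m_tilde_ij = s_t * m_ij; the same gate is broadcast to all edges."""
    return m * s_t

rng = np.random.default_rng(0)
C_t = rng.normal(size=48)                    # stand-in context vector
W1, b1 = rng.normal(0, 0.1, (48, 48)), np.zeros(48)
W2, b2 = rng.normal(0, 0.1, (48, 48)), np.zeros(48)
m = rng.normal(size=(6, 48))                 # six edge messages
s_t = gate_from_context(C_t, W1, b1, W2, b2)
m_tilde = modulate_messages(m, s_t)
```

Because the sigmoid output lies in (0, 1), the gate can only attenuate individual message channels; it never amplifies or flips their sign.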

4. CT-MsgModMGN Model Pipeline

The full CT-MsgModMGN workflow for each time step comprises:

  • Input: Current driver $d_t$ and short-horizon sequence $D_t$
  • History encoding: $C_t$ from CT module
  • FiLM conditioning on node states: Node latent updates modulated as $h_i^{(k)} \leftarrow \gamma_i \odot h_i^{(k)} + \beta_i$, with $(\gamma_i, \beta_i) = \phi_{\mathrm{FiLM}}(C_t)$
  • MsgMod gating on edge messages: $s_t = \psi(C_t)$ modulates edge messages
  • Shared encoder, processor, and decoder as described previously.

The two conditioning mechanisms operate in parallel, with FiLM acting at the node level and MsgMod modulating inter-node communication, both based on the temporal context $C_t$.
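The FiLM branch of the pipeline can be sketched as follows. Modeling $\phi_{\mathrm{FiLM}}$ as a single linear layer whose output is split into scale and shift is an assumption, as are the function names and random weights; the source specifies only the affine form $\gamma \odot h + \beta$ with parameters derived from $C_t$.

```python
import numpy as np

def film_params(C_t, W, b):
    """Map C_t to (gamma, beta) with one linear layer (assumed form)."""
    out = C_t @ W + b
    d = out.shape[0] // 2
    return out[:d], out[d:]

def film(h, gamma, beta):
    """h_i <- gamma * h_i + beta, with the same pair applied to every node."""
    return gamma * h + beta

rng = np.random.default_rng(0)
C_t = rng.normal(size=48)                 # stand-in context vector
W, b = rng.normal(0, 0.1, (48, 96)), np.zeros(96)
h = rng.normal(size=(5, 48))              # five node latents
gamma, beta = film_params(C_t, W, b)
h_film = film(h, gamma, beta)
```

Since both parameters depend only on $C_t$, every node in the graph at time $t$ receives the same scale and shift, while the MsgMod gate acts separately on edge messages.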

5. Experimental Data and Cross-Validation

The dataset comprises nine healthy male runners, each with three stance-phase trials, processed via OpenSim-FEBio and quasi-static analysis to yield $\sim$27 stance-phase FE simulations, sampled at 0.01 s intervals (each mesh has $\sim$12,000 nodes). Grouped 3-fold cross-validation is performed at the subject level:

  • Fold 1: Train P4–P9, Test P1–P3
  • Fold 2: Train P1–P3, P7–P9, Test P4–P6
  • Fold 3: Train P1–P6, Test P7–P9

This strictly prevents subject leakage, isolating generalization to unseen subjects.
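The fold assignment listed above can be reproduced with a few lines of plain Python. The function name `grouped_folds` and the assumption that subjects are split into contiguous blocks in ID order are illustrative; the subject labels P1 to P9 follow the text.

```python
subjects = [f"P{i}" for i in range(1, 10)]   # P1 ... P9

def grouped_folds(subjects, n_folds=3):
    """Split subjects into contiguous test blocks; train on the rest."""
    size = len(subjects) // n_folds
    folds = []
    for k in range(n_folds):
        test = subjects[k * size:(k + 1) * size]
        train = [s for s in subjects if s not in test]
        folds.append((train, test))
    return folds

folds = grouped_folds(subjects)
for train, test in folds:
    # No subject may appear on both sides of a split
    assert not set(train) & set(test)
```

Splitting by subject rather than by trial or frame means all three trials of a test subject are unseen during training.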

6. Quantitative Performance and Metrics

The models are evaluated on full-field error and hotspot localization using several metrics (all after inverse normalization):

| Model | RMSE | MAE | Pearson $r$ | nRMSE | RE$_{\max}$ ($\times 10^{-2}$) | RE$_{95}$ ($\times 10^{-2}$) | Dice | IoU |
|---|---|---|---|---|---|---|---|---|
| MGN | 0.60±0.15 | 0.25±0.06 | 0.68±0.11 | 0.65±0.12 | 0.95±0.20 | 0.85±0.18 | 0.48±0.06 | 0.33±0.05 |
| MsgModMGN | 0.56±0.11 | 0.23±0.05 | 0.71±0.09 | 0.62±0.10 | 0.88±0.18 | 0.79±0.15 | 0.50±0.05 | 0.35±0.04 |
| CT-MGN | 0.37±0.08* | 0.12±0.03* | 0.88±0.06* | 0.38±0.08* | 0.45±0.12* | 0.37±0.10* | 0.71±0.04* | 0.56±0.03* |
| CT-MsgModMGN | 0.42±0.10 | 0.15±0.04 | 0.85±0.07 | 0.44±0.09 | 0.48±0.13* | 0.40±0.11* | 0.69±0.05* | 0.53±0.04* |

An asterisk (*) indicates $p<0.05$ vs. MGN and MsgModMGN.

Key observations:

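  • Relative to MGN, CT-MGN roughly halves MAE (0.25 to 0.12) and cuts RMSE by about 38% (0.60 to 0.37); CT-MsgModMGN reduces MAE by about 40% (to 0.15) and RMSE by about 30% (to 0.42).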
  • MsgMod confers no significant benefit alone.
  • Effect size for peak error (RE$_{\max}$, RE$_{95}$) and spatial overlap (Dice, IoU) is largest when CT is present.
  • Pearson $r$ improves from $\sim 0.7$ (MGN) to between 0.85 and 0.88 (CT variants).
  • Non-CT models exhibit pronounced mid-stance error fluctuations; CT models maintain stable accuracy over stance.
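The spatial overlap metrics in the table can be computed from binary hotspot masks as sketched below. The Dice and IoU formulas are standard; the 95th-percentile threshold used in `hotspot_mask` to define a hotspot is an assumption, since the source does not state how hotspot regions are selected, and the toy masks are hypothetical.

```python
import numpy as np

def hotspot_mask(stress, percentile=95.0):
    """Mark nodes at or above a stress percentile (threshold assumed)."""
    return stress >= np.percentile(stress, percentile)

def dice_iou(pred_mask, true_mask):
    """Dice = 2|A∩B| / (|A|+|B|), IoU = |A∩B| / |A∪B|."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    total = pred_mask.sum() + true_mask.sum()
    dice = 2.0 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return float(dice), float(iou)

# Toy masks over six nodes: two of three hotspot nodes are matched
true_mask = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
pred_mask = np.array([0, 1, 1, 1, 0, 0], dtype=bool)
dice, iou = dice_iou(pred_mask, true_mask)   # dice = 2/3, iou = 0.5
```

For a single pair of masks the two scores are related by IoU = Dice / (2 - Dice); averages over many samples, as in the table, need not satisfy this identity exactly.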

7. Interpretations and Implications

History encoding via the CT module, rather than spatial propagation modulation, is the principal determinant of surrogate accuracy. Encoding $D_t$ (the recent short-horizon driver history) restores implicit phase information critical for precise peak-stress and hotspot localization, directly addressing the “peak-shaving” defect observed in prior deep surrogate models. MsgMod augmentation yields no additional improvement over CT-MGN, suggesting that the fixed-topology MGN already captures spatial propagation patterns in this regime; thus, temporal history is the dominant source of uncertainty.

A plausible implication is that for cross-subject generalization in biomechanics driven by restricted pose/load spaces, temporal context is essential, while adaptive spatial gating confers limited benefit once robust history representation is available.

CT-MsgModMGN establishes a rigorous framework for decoupling and analyzing temporal versus spatial dependencies in graph-based surrogates under grouped subject-level generalization, with data-driven conclusions that inform neural surrogate design for dynamic biomechanical systems (Pan et al., 13 Jan 2026).
