CT-MsgModMGN Neural Surrogate Model
- CT-MsgModMGN is a neural surrogate model architecture that integrates MeshGraphNet, a Control Transformer, and message modulation for cross-subject knee joint stress prediction.
- The model significantly reduces prediction error and mitigates peak-shaving by employing short-horizon history encoding to recover implicit phase information.
- By decoupling temporal encoding from spatial propagation via FiLM conditioning and state-conditioned modulation, it enhances localization of high-risk stress regions.
CT-MsgModMGN is a neural surrogate modeling architecture designed for cross-subject prediction of knee joint contact mechanics from finite element (FE) simulations. Integrating a shared MeshGraphNet (MGN) backbone with a Control Transformer (CT) for short-horizon history encoding and a message-modulation pathway (MsgMod) for adaptive spatial propagation, the model aims to disentangle and evaluate the contributions of temporal history and spatial propagation dependencies in surrogate prediction. Empirical findings demonstrate that only short-horizon history encoding significantly reduces prediction error, mitigates peak-shaving, and enhances localization of high-risk stress regions across unseen biomechanical subjects (Pan et al., 13 Jan 2026).
1. MeshGraphNet Backbone Design
The backbone represents each FE mesh as a static, undirected graph $G = (V, E)$, where nodes correspond to mesh tetrahedra and edges connect spatially adjacent elements. Each node feature vector $x_i$ comprises centroid coordinates (3D) and the joint driver state (global pose in sine/cosine encoding and joint reaction forces). Edge features are defined by the relative displacements between the centroids of connected elements.
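As a minimal sketch of how such node and edge features could be assembled, the snippet below concatenates each element centroid with a sine/cosine pose encoding and the joint reaction forces, and computes relative displacements along edges. The function names, the number of pose angles, the feature ordering, and the sign convention of the displacement are illustrative assumptions, not details taken from the paper.

```python
import math

def encode_driver(pose_angles_rad, reaction_forces):
    """Driver vector: sin/cos of each pose angle, followed by the forces.

    Assumption: pose is supplied as angles in radians; the source only
    states that the pose uses a sine/cosine encoding.
    """
    feats = []
    for angle in pose_angles_rad:
        feats.extend([math.sin(angle), math.cos(angle)])
    feats.extend(reaction_forces)
    return feats

def build_node_features(centroids, driver):
    """Concatenate each centroid (x, y, z) with the shared driver vector."""
    return [list(c) + list(driver) for c in centroids]

def relative_displacements(centroids, edges):
    """Edge feature for (i, j): centroid_j - centroid_i (sign is assumed)."""
    return [
        [cj - ci for ci, cj in zip(centroids[i], centroids[j])]
        for i, j in edges
    ]

# Toy mesh: three elements, undirected edges stored in both directions.
centroids = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
edges = [(0, 1), (1, 0), (0, 2), (2, 0)]
driver = encode_driver([0.0, math.pi / 2], [100.0, -20.0, 5.0])
node_feats = build_node_features(centroids, driver)
edge_feats = relative_displacements(centroids, edges)
```

Every node in a frame carries the same driver vector here, so only the centroid part distinguishes one node from another.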
Node and edge embedding is performed by two-layer MLPs that map the input features to a latent space of dimension $d$, yielding initial node latents $h_i^{(0)}$ and edge latents $e_{ij}^{(0)}$.
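The processor runs for $L$ message-passing steps. At each step $l$:
- Edge update: each edge latent $e_{ij}^{(l)}$ is updated by an MLP $f_e$ from its current value and the latents $h_i^{(l)}$ and $h_j^{(l)}$ of its two endpoint nodes.
- Message aggregation: each node $i$ aggregates the updated latents of its incident edges into a message $m_i^{(l)}$.
- Node update: each node latent $h_i^{(l)}$ is updated by an MLP $f_v$ from its current value and $m_i^{(l)}$, where $f_e$ and $f_v$ are two-layer MLPs (shared across steps).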
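The three processor operations can be sketched as a single function. This is an illustration only: `phi_e` and `phi_v` are hypothetical stand-ins for the two-layer MLPs, sum aggregation at the receiving node is assumed (as in the standard MeshGraphNet formulation), and residual connections are omitted because the source does not specify them.

```python
def message_passing_step(h, e, edges, phi_e, phi_v):
    """One processor step on node latents h and edge latents e.

    edges[k] = (i, j) is a directed edge from sender i to receiver j.
    phi_e and phi_v are callables standing in for the two-layer MLPs.
    """
    # Edge update: each edge sees its own latent and both endpoint latents
    # (the + here is list concatenation, not arithmetic addition).
    e_new = [phi_e(e[k] + h[i] + h[j]) for k, (i, j) in enumerate(edges)]

    # Message aggregation: sum the updated edge latents at each receiver.
    dim = len(e_new[0])
    m = [[0.0] * dim for _ in h]
    for k, (_, j) in enumerate(edges):
        m[j] = [a + b for a, b in zip(m[j], e_new[k])]

    # Node update: each node sees its own latent and its aggregated message.
    h_new = [phi_v(h[i] + m[i]) for i in range(len(h))]
    return h_new, e_new

# Toy example with 1-D latents and "MLPs" that just sum their inputs.
toy_update = lambda z: [sum(z)]
h0 = [[1.0], [2.0], [3.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
e0 = [[0.0] for _ in edges]
h1, e1 = message_passing_step(h0, e0, edges, toy_update, toy_update)
```

In the toy run, node 1 receives messages from both neighbors, so its updated latent combines its own value with two edge latents.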
The decoder predicts von Mises stress at each node from the final node latent $h_i^{(L)}$ using a two-layer MLP. Stress targets are log-transformed and Z-score normalized.
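The target transform and its inverse might look like the following sketch. The small offset `eps` inside the logarithm and the use of the population standard deviation are assumptions made for illustration; the source states only that targets are log-transformed and Z-score normalized.

```python
import math
import statistics

def fit_normalizer(stresses, eps=1e-6):
    """Return mean and std of log-stress over the training targets."""
    logs = [math.log(s + eps) for s in stresses]
    return statistics.fmean(logs), statistics.pstdev(logs)

def normalize(stress, mean, std, eps=1e-6):
    """Log-transform, then Z-score."""
    return (math.log(stress + eps) - mean) / std

def denormalize(z, mean, std, eps=1e-6):
    """Inverse transform, applied before computing error metrics."""
    return math.exp(z * std + mean) - eps

train_stress = [0.5, 1.0, 2.0, 8.0]  # toy values, not data from the study
mean, std = fit_normalizer(train_stress)
z_scores = [normalize(s, mean, std) for s in train_stress]
recovered = [denormalize(z, mean, std) for z in z_scores]
```

Fitting the statistics on training subjects only would keep the normalization consistent with a subject-level split.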
2. Control Transformer for Temporal History
Single-frame driver inputs lack phase information, which is crucial for stress prediction in dynamic tasks. The CT module addresses this by encoding a short-horizon sequence of $K$ recent driver vectors. Each driver vector is linearly embedded and combined with a positional encoding.
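The CT consists of a 2-layer Transformer encoder (4 heads) that produces one output vector $z_k$ per sequence position. The context vector $c_t$ for the current graph is obtained by mean-pooling these outputs over the $K$ sequence positions: $c_t = \frac{1}{K} \sum_{k=1}^{K} z_k$.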
The CT module recovers implicit phase progression absent from instantaneous pose/load descriptions, enabling improved stress phase localization and correction of systematic underestimation of peak stress ("peak-shaving").
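The input and output handling of such a history encoder can be sketched as follows. Only the positional encoding and the mean-pooling are implemented; the `encoder` argument is a hypothetical placeholder for the 2-layer, 4-head Transformer encoder, and the sinusoidal form of the positional encoding is an assumption (the source does not say whether it is sinusoidal or learned).

```python
import math

def sinusoidal_position(k, dim):
    """Standard sinusoidal positional encoding for position k (dim even)."""
    pe = []
    for i in range(0, dim, 2):
        angle = k / (10000 ** (i / dim))
        pe.extend([math.sin(angle), math.cos(angle)])
    return pe

def history_context(driver_embeddings, encoder=lambda seq: seq):
    """Add positions, run the encoder, then mean-pool over the sequence.

    The identity default for `encoder` only keeps this sketch
    self-contained; it does not model attention.
    """
    dim = len(driver_embeddings[0])
    seq = [
        [x + p for x, p in zip(emb, sinusoidal_position(k, dim))]
        for k, emb in enumerate(driver_embeddings)
    ]
    out = encoder(seq)
    # Mean over sequence positions, one value per latent channel.
    return [sum(col) / len(out) for col in zip(*out)]

# Three already-embedded driver vectors of width 4 (all zeros for clarity).
history = [[0.0] * 4 for _ in range(3)]
context = history_context(history)
```

Because the pooled vector has a fixed width regardless of the window length, it can condition the graph network without changing the backbone.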
3. State-Conditioned Message Modulation (MsgMod)
MsgMod enables adaptive message passing by modulating edge-wise propagation based on an encoded conditioning state $s$. The gating signal $g$ is generated from $s$ by a two-layer MLP with sigmoid output, so each gate value lies in $(0, 1)$.
For each edge, the message is scaled by the gate $g$ before aggregation.
The modulated messages are then aggregated and processed as in standard message passing. In CT-MsgModMGN, $s$ is defined by the CT output; in the ablation (MsgModMGN), by the driver embedding.
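A minimal sketch of sigmoid gating on edge messages is given below. It assumes the gate is a vector computed from the conditioning state alone and applied element-wise to every edge message; the ReLU hidden activation, the helper names, and the toy weights are likewise assumptions, since the source does not give these details.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(W, b, x):
    """Dense layer: W is a list of rows, b a list of biases."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def gate_from_state(state, W1, b1, W2, b2):
    """Two-layer MLP with sigmoid output (hidden ReLU is assumed)."""
    hidden = [max(0.0, v) for v in linear(W1, b1, state)]
    return [sigmoid(v) for v in linear(W2, b2, hidden)]

def modulate_messages(messages, gate):
    """Scale every edge message element-wise by the shared gate."""
    return [[g * m for g, m in zip(gate, msg)] for msg in messages]

# Toy weights chosen so the gate is easy to read off.
state = [1.0, -1.0]
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[0.0, 0.0], [100.0, 0.0]], [0.0, 0.0]
gate = gate_from_state(state, W1, b1, W2, b2)
gated = modulate_messages([[2.0, 4.0]], gate)
```

With these toy weights the first channel is halved and the second passes almost unchanged, which shows how a gate can suppress or preserve individual message channels.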
4. CT-MsgModMGN Model Pipeline
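The full CT-MsgModMGN workflow for each time step $t$ comprises:
- Input: the current driver and the short-horizon driver sequence
- History encoding: context vector $c_t$ from the CT module
- FiLM conditioning on node states: node latent updates are modulated as $\gamma \odot h_i + \beta$, with $\gamma$ and $\beta$ predicted from $c_t$
- MsgMod gating on edge messages: a gate computed from $c_t$ modulates edge messages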
- Shared encoder, processor, and decoder as described previously.
The two conditioning mechanisms operate in parallel, with FiLM acting at the node level and MsgMod modulating inter-node communication, both based on the temporal context $c_t$.
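The node-level FiLM step can be illustrated with a short sketch. It uses the standard FiLM form (scale, then shift) and assumes a single linear layer that maps the context to concatenated scale and shift vectors; the function names and toy weights are hypothetical.

```python
def film_params(context, W, b):
    """Map the context to (gamma, beta) with one linear layer (assumed)."""
    out = [sum(w * c for w, c in zip(row, context)) + bi for row, bi in zip(W, b)]
    half = len(out) // 2
    return out[:half], out[half:]

def film(h, gamma, beta):
    """Feature-wise linear modulation: gamma * h + beta, per channel."""
    return [g * x + s for g, x, s in zip(gamma, h, beta)]

# Toy example: 2-D context, 2-D node latent.
context = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.5, -0.5]
gamma, beta = film_params(context, W, b)
h_mod = film([3.0, 4.0], gamma, beta)
```

The same `gamma` and `beta` apply to every node in a frame, whereas the MsgMod gate acts on edge messages, so the two mechanisms touch different parts of the processor.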
5. Experimental Data and Cross-Validation
The dataset comprises nine healthy male runners, each with three stance-phase trials, processed via OpenSim-FEBio and quasi-static analysis to yield 27 stance-phase FE simulations sampled at 0.01 s intervals (each mesh with 12,000 nodes). Grouped 3-fold cross-validation is performed at the subject level:
- Fold 1: Train P4–P9, Test P1–P3
- Fold 2: Train P1–P3, P7–P9, Test P4–P6
- Fold 3: Train P1–P6, Test P7–P9
This strictly prevents subject leakage, isolating generalization to unseen subjects.
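The fold assignment above can be reproduced with a few lines. This sketch assumes subjects are split into contiguous, equally sized groups, which matches the three folds listed; the function name is illustrative.

```python
def grouped_folds(subjects, n_folds=3):
    """Split subjects into contiguous test groups; train on the rest."""
    size = len(subjects) // n_folds
    folds = []
    for k in range(n_folds):
        test = subjects[k * size:(k + 1) * size]
        train = [s for s in subjects if s not in test]
        folds.append((train, test))
    return folds

subjects = [f"P{i}" for i in range(1, 10)]
folds = grouped_folds(subjects)
for train, test in folds:
    # No subject may appear on both sides of a split.
    assert not set(train) & set(test)
```

All three trials of a subject follow that subject into the same side of the split, which is what prevents leakage between training and test sets.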
6. Quantitative Performance and Metrics
The models are evaluated on full-field error and hotspot localization using several metrics (all after inverse normalization):
| Model | RMSE | MAE | Pearson r | nRMSE | RE_max () | RE () | Dice | IoU |
|---|---|---|---|---|---|---|---|---|
| MGN | 0.60±0.15 | 0.25±0.06 | 0.68±0.11 | 0.65±0.12 | 0.95±0.20 | 0.85±0.18 | 0.48±0.06 | 0.33±0.05 |
| MsgModMGN | 0.56±0.11 | 0.23±0.05 | 0.71±0.09 | 0.62±0.10 | 0.88±0.18 | 0.79±0.15 | 0.50±0.05 | 0.35±0.04 |
| CT-MGN | 0.37±0.08* | 0.12±0.03* | 0.88±0.06* | 0.38±0.08* | 0.45±0.12* | 0.37±0.10* | 0.71±0.04* | 0.56±0.03* |
| CT-MsgModMGN | 0.42±0.10 | 0.15±0.04 | 0.85±0.07 | 0.44±0.09 | 0.48±0.13* | 0.40±0.11* | 0.69±0.05* | 0.53±0.04* |
- An asterisk (*) marks a statistically significant difference vs. MGN and MsgModMGN.
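For orientation, the full-field and overlap metrics can be computed as in the sketch below. RMSE, MAE, Dice, and IoU follow their usual definitions; defining the hotspot as the `k` highest-stress nodes is an assumption made for illustration, since the source does not state the hotspot threshold.

```python
import math

def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def hotspot_nodes(values, k):
    """Indices of the k highest-stress nodes (hotspot rule is assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(order[:k])

def dice(a, b):
    """Dice overlap of two index sets: 2|A and B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def iou(a, b):
    """Intersection over union of two index sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Toy stress fields on five nodes (not data from the study).
true_stress = [1.0, 2.0, 3.0, 10.0, 9.0]
pred_stress = [1.0, 2.0, 8.0, 9.0, 3.0]
true_hot = hotspot_nodes(true_stress, k=2)
pred_hot = hotspot_nodes(pred_stress, k=2)
scores = {
    "rmse": rmse(pred_stress, true_stress),
    "mae": mae(pred_stress, true_stress),
    "dice": dice(pred_hot, true_hot),
    "iou": iou(pred_hot, true_hot),
}
```

In the toy case the prediction finds one of the two true hotspot nodes, so overlap is partial even though two of the five nodal values are exact.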
Key observations:
- Relative to MGN, CT-MGN cuts MAE by about half (0.25 to 0.12) and RMSE by about 38% (0.60 to 0.37); CT-MsgModMGN achieves reductions of about 40% in MAE and 30% in RMSE.
- MsgMod confers no significant benefit alone.
- Effect size for peak error (RE_max, RE) and spatial overlap (Dice, IoU) is largest when CT is present.
- Pearson r improves from 0.68 (MGN) to between 0.85 and 0.88 (CT variants).
- Non-CT models exhibit pronounced mid-stance error fluctuations; CT models maintain stable accuracy over stance.
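The relative error reductions implied by the table means can be checked with a short calculation. The values below are the reported means only; the standard deviations are ignored, so this is a rough comparison rather than a statistical test.

```python
def relative_reduction(baseline, value):
    """Fractional reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline

# Mean RMSE and MAE copied from the results table.
means = {
    "MGN": {"rmse": 0.60, "mae": 0.25},
    "CT-MGN": {"rmse": 0.37, "mae": 0.12},
    "CT-MsgModMGN": {"rmse": 0.42, "mae": 0.15},
}

reductions = {
    model: {
        metric: round(100 * relative_reduction(means["MGN"][metric], value))
        for metric, value in vals.items()
    }
    for model, vals in means.items()
    if model != "MGN"
}
```

Here `reductions` holds the percentage drop in each metric for the two CT variants, rounded to the nearest percent.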
7. Interpretations and Implications
History encoding via the CT module, not spatial propagation modulation, is the principal determinant of surrogate accuracy. Encoding the recent short-horizon driver history restores implicit phase information critical for precise peak-stress and hotspot localization, directly addressing the “peak-shaving” defect observed in prior deep surrogate models. MsgMod augmentation yields no additional improvement over CT-MGN, suggesting the fixed-topology MGN already captures spatial propagation patterns in this regime; thus, temporal history is the dominant source of uncertainty.
A plausible implication is that for cross-subject generalization in biomechanics driven by restricted pose/load spaces, temporal context is essential, while adaptive spatial gating confers limited benefit once robust history representation is available.
CT-MsgModMGN establishes a rigorous framework for decoupling and analyzing temporal versus spatial dependencies in graph-based surrogates under grouped subject-level generalization, with data-driven conclusions that inform neural surrogate design for dynamic biomechanical systems (Pan et al., 13 Jan 2026).