- The paper demonstrates that hybrid local-global deep surrogate models achieve top accuracy with full inputs and best preserve risk-relevant biomechanical features.
- It compares five distinct architectures (MGN, CT, Hi, GI, Hy) across ideal and degraded input conditions using metrics like RMSE, Dice coefficient, and hotspot localization error.
- The study highlights that noisy pose data notably degrades predictions, underscoring the need for context-specific surrogate model selection in clinical applications.
Problem Context and Motivation
The reconstruction of internal stress fields in the knee joint is essential for understanding mechanical risk factors in meniscal and cartilage injury, as well as the progression of osteoarthritis. Finite element analysis (FEA) provides high-fidelity solutions but remains impractical for large-scale or rapid clinical scenarios due to its computational and modeling burdens. Deep surrogate models, especially those leveraging graph-based representations on FEA meshes, have demonstrated high predictive power in mapping input kinematics and loading to internal stress fields. However, previous research has focused predominantly on conditions with high-quality, comprehensive input data, an unrealistic assumption in real-world or clinical settings where measurement noise and missing data are prevalent. Consequently, there remains a critical gap in understanding model robustness and the preservation of risk-relevant biomechanical information under input-limited constraints.
Experimental Design
The study systematically evaluates five deep surrogate architectures across four input conditions reflecting both ideal and practical, degraded scenarios. Data were obtained from nine male soccer players performing 90-degree change-of-direction maneuvers and processed with a pipeline combining motion capture, musculoskeletal modeling in OpenSim, and quasi-static FEA using a generic OpenKnee(s) model. Each finite element simulation time step was converted into a graph where nodes represent element centroids with attached local and global features. The architectures represent the following distinct paradigm classes:
- MGN: Local topology diffusion via standard MeshGraphNet-like architecture (local message passing),
- CT: History-context enhancement (incorporating short-term past input context),
- Hi: Hierarchical multi-scale modeling (coarse-to-fine information pooling and propagation),
- GI: Explicit global interaction (direct cross-graph global communication),
- Hy: Hybrid local-global fusion (simultaneous local diffusion and explicit long-range coupling).
All models employed a common encoder-propagation-decoder macroarchitecture, identical backbone dimensions, and propagation depth to isolate the effects of inductive bias and propagation mechanism.
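To make the difference between local diffusion and explicit long-range coupling concrete, the following is a minimal sketch rather than the paper's implementation. It assumes unweighted mean aggregation over mesh neighbors for the local path and a mean-pooled graph summary for the global path, with no learned encoders or decoders. The function names and the fixed mixing weight `alpha` are hypothetical.

```python
import numpy as np

def local_step(h, edges):
    """Local message passing: each node averages its neighbors' features."""
    agg = np.zeros_like(h)
    deg = np.zeros(h.shape[0])
    for src, dst in edges:
        agg[dst] += h[src]
        deg[dst] += 1
    deg = np.maximum(deg, 1)  # isolated nodes keep a zero message
    return agg / deg[:, None]

def global_step(h):
    """Global coupling: every node receives the mean over the whole graph."""
    return np.broadcast_to(h.mean(axis=0), h.shape)

def hybrid_step(h, edges, alpha=0.5):
    """Residual update mixing local and global messages (alpha is an assumed weight)."""
    return h + alpha * local_step(h, edges) + (1.0 - alpha) * global_step(h)

# Toy chain graph 0-1-2-3 with bidirectional edges; only node 0 carries signal.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
h = np.array([[1.0], [0.0], [0.0], [0.0]])

local_only = h + local_step(h, edges)
hybrid = hybrid_step(h, edges)
print(local_only.ravel())  # node 3 is still zero after one local step
print(hybrid.ravel())      # node 3 already receives signal through the global path
```

With a fixed propagation depth, purely local diffusion can only move information a limited number of hops, whereas the global path reaches every node in one step. This is one plausible reading of why combining the two could help, not a claim about the authors' layers.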
Input regimes included:
- Full: Complete, noise-free kinematics and loading,
- Pose-corrupted: Additive Gaussian noise to pose inputs,
- Load-corrupted: Additive Gaussian noise to loading inputs,
- Minimal: Only pose input retained; all load input channels set to zero.
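The four regimes above can be sketched as a single transformation applied to the input channels. This is an illustrative sketch only: the noise standard deviation `sigma`, the array shapes, and the function name `apply_regime` are assumptions, since the summary does not state the noise level or how it was scaled.

```python
import numpy as np

def apply_regime(pose, load, regime, sigma=0.05, rng=None):
    """Return (pose, load) copies modified according to the input regime.

    sigma is an assumed noise level; the source does not specify it.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    pose = np.array(pose, dtype=float)  # copies, so the caller's arrays stay intact
    load = np.array(load, dtype=float)
    if regime == "full":
        pass                                          # complete, noise-free inputs
    elif regime == "pose_corrupted":
        pose += rng.normal(0.0, sigma, size=pose.shape)  # additive Gaussian noise on pose
    elif regime == "load_corrupted":
        load += rng.normal(0.0, sigma, size=load.shape)  # additive Gaussian noise on load
    elif regime == "minimal":
        load[:] = 0.0                                 # pose only; load channels zeroed
    else:
        raise ValueError(f"unknown regime: {regime}")
    return pose, load

# Hypothetical shapes: 4 time steps, 6 pose channels, 3 load channels.
pose = np.zeros((4, 6))
load = np.ones((4, 3))
regimes = {
    name: apply_regime(pose, load, name)
    for name in ("full", "pose_corrupted", "load_corrupted", "minimal")
}
```

Note that in the Minimal regime the load channels are kept but zeroed, so the model input dimension is unchanged across regimes, which matches the description of a shared architecture evaluated under different input conditions.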
Key evaluation metrics were full-field RMSE and MAE, Pearson r, relative high-stress errors (RE_max, RE_P95), Dice coefficient for high-risk region overlap, and hotspot localization error (distance between centers of true and predicted high-risk regions). Statistical rigor was ensured through three-fold cross-subject validation, within-fold temporal validation splits, and nonparametric significance testing with Holm-Bonferroni adjustment.
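The metrics listed above could be computed along the following lines. This is a hedged sketch, not the paper's definitions: it assumes positive scalar stress per node, defines the high-risk region as nodes at or above the 95th percentile of the reference field, and takes RE_max and RE_P95 as relative errors in the field maximum and 95th percentile. The function name `field_metrics` is hypothetical.

```python
import numpy as np

def field_metrics(true, pred, coords, q=95.0):
    """Compare a predicted nodal stress field against its FEA reference.

    true, pred : (n_nodes,) stress values, assumed positive
    coords     : (n_nodes, 3) node positions
    q          : percentile defining the high-risk region (assumed value)
    """
    err = pred - true
    thr = np.percentile(true, q)        # threshold taken from the reference field
    mask_true = true >= thr
    mask_pred = pred >= thr
    overlap = np.sum(mask_true & mask_pred)
    dice = 2.0 * overlap / (mask_true.sum() + mask_pred.sum())
    if mask_pred.any():
        # Distance between the centers of the true and predicted high-risk regions
        hotspot = np.linalg.norm(
            coords[mask_true].mean(axis=0) - coords[mask_pred].mean(axis=0)
        )
    else:
        hotspot = np.nan                # no predicted high-risk nodes at all
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "pearson_r": float(np.corrcoef(true, pred)[0, 1]),
        "re_max": float(abs(pred.max() - true.max()) / true.max()),
        "re_p95": float(abs(np.percentile(pred, q) - thr) / thr),
        "dice": float(dice),
        "hotspot_error": float(hotspot),
    }

# Toy example: five nodes on a line; the prediction has the right peak value
# but places it one node away from the true hotspot.
coords = np.column_stack([np.arange(5.0), np.zeros(5), np.zeros(5)])
true = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
shifted = np.array([1.0, 2.0, 3.0, 10.0, 4.0])

m = field_metrics(true, shifted, coords)
perfect = field_metrics(true, true, coords)
```

In the toy example the shifted prediction has zero RE_max yet no high-risk overlap, which illustrates why magnitude metrics and spatial risk metrics can rank models differently.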
Main Results
With complete pose and load input, the Hy (local-global hybrid) model yielded the best numerical performance across almost all metrics:
- Highest accuracy: Lowest RMSE and MAE,
- Superior risk-area reconstruction: Highest Dice coefficients, lowest RE_max and RE_P95,
- Top spatial correlation: Largest Pearson r,
- Most reliable hotspot localization: Lowest mean localization error.
The GI (explicit global interaction) model consistently underperformed, particularly in absolute and high-stress error metrics. The MGN (MeshGraphNet baseline), CT (history-context), and Hi (hierarchical) models occupied an intermediate band and did not surpass Hy.
Under Pose-corrupted and Load-corrupted scenarios, degradation in performance was evident across all architectures, but Hy maintained best-in-class robustness:
- Hy outperformed all other paradigms in RMSE, high-stress error, and risk-region Dice (p < 0.05).
- Deterioration from pose corruption was universally more severe than from load perturbation, suggesting surrogate model predictions are more sensitive to kinematic input quality.
- For most models, full-field error and risk-area mislocalization increased more acutely with pose noise, while load noise primarily affected stress magnitude metrics.
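Significance statements such as the p < 0.05 result above depend on the multiple-comparison correction. The evaluation protocol names Holm-Bonferroni adjustment; the sketch below shows the standard step-down procedure on made-up p-values. The underlying nonparametric test and the actual p-values are not given in the summary, so the numbers here are purely illustrative.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order.

    The smallest raw p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Illustrative raw p-values for three pairwise model comparisons (not from the paper).
raw = [0.01, 0.04, 0.03]
adj = holm_bonferroni(raw)
significant = [p < 0.05 for p in adj]
print(adj)          # approximately [0.03, 0.06, 0.06]
print(significant)  # only the first comparison survives the correction
```

Compared with a plain Bonferroni correction, the step-down scheme is never more conservative while still controlling the family-wise error rate.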
In the Minimal input regime (pose only), model rankings became task-dependent:
- CT (history-context) minimized overall and high-stress error,
- Hy (local-global hybrid) best preserved the spatial pattern of high-risk regions (Dice),
- Hi (hierarchical) showed lower hotspot localization error,
- No single model was dominant across all evaluation dimensions, which indicates that model selection should be context- and target-dependent under severely constrained input scenarios.
Temporal Error Patterns
Phase-specific analysis across the stance phase revealed:
- Minimal-input and pose-corrupted conditions induced particularly severe error increases and loss of high-risk region overlap, especially during late stance, the period most relevant to injury risk.
- Under load corruption, performance decline was relatively mild and concentrated toward late stance, suggesting pose input fidelity is the key limiting factor for temporal prediction stability.
Theoretical and Practical Implications
This work provides quantitative evidence for shifting surrogate model evaluation in musculoskeletal mechanics from a singular focus on aggregate error under ideal conditions to a multidimensional robustness framework targeting preservation of risk-relevant biomechanical features under input constraints. The demonstration that hybrid local-global propagation is synergistic, offering enhanced robustness and risk information fidelity, aligns with recent findings in mechanics-informed GNNs [see Patrignani and Pinho, 2026].
Selection of surrogate model architecture should therefore be guided by the anticipated application scenario: if overall stress field fidelity is essential, local-global hybridization is preferable; if application emphasis is on magnitude metrics or spatial risk localization under heavily degraded or missing data, history-context or hierarchical architectures may be more suitable. Purely global architectures (GI) are systematically inferior within this context.
From a translational perspective, these insights are directly relevant for real-world deployment in clinical screening, in-field risk monitoring, or epidemiological studies where only partial kinematic data may be available and load estimation is often unreliable. Importantly, the task-dependent divergence of model performance underlines the risk of erroneous extrapolations from conventional benchmarking on full-data conditions.
Limitations and Future Directions
The main study limitations are a restricted, homogeneous sample (young male athletes, single task) and the use of a generic, non-subject-specific finite element geometry. The models were trained under unified settings and not specifically optimized for each input regime, meaning that further architectural adaptation or fine-tuning could provide additional robustness. Clinical-level outcome linkage and generalization to more diverse biomechanical tasks/environments remain open challenges for future work.
Conclusion
Comprehensive surrogate model evaluation in knee joint contact mechanics must integrate robustness and risk feature preservation under practical input constraints. Local-global hybridization provides superior all-around robustness, but under severely minimized inputs, paradigm suitability is application- and metric-dependent. Model assessment and selection frameworks should thus be multidimensional and target-aligned for real-world deployment in biomechanical risk stratification.
For a detailed account and full dataset, refer to "Evaluating Deep Surrogate Models for Knee Joint Contact Mechanics Under Input-Limited Conditions" (2604.01990).