Evaluating Deep Surrogate Models for Knee Joint Contact Mechanics Under Input-Limited Conditions

Published 2 Apr 2026 in q-bio.QM | (2604.01990v1)

Abstract: Background and Objective: Accurate surrogate modeling of knee joint contact mechanics is important for reconstructing stress distributions and identifying risk-relevant regions, yet the relative suitability of different modeling paradigms under practically relevant input-limited conditions remains unclear. Methods: Nine male soccer players performed 90° change-of-direction trials. Finite element simulations driven by subject-specific joint posture and reaction forces were converted into graph-structured samples. Five surrogate architectures representing local diffusion, history-context enhancement, hierarchical multi-scale modeling, explicit global interaction, and local-global hybridization were compared using three-fold cross-subject validation under full, pose-corrupted, load-corrupted, and minimal-input conditions. Performance was evaluated using full-field error, high-stress error, high-risk region overlap, and hotspot localization metrics. Results: The hybrid model achieved the best overall performance under full inputs and remained the most robust under pose- and load-corrupted conditions. Under minimal inputs, no single model dominated all metrics: the history-context model yielded lower overall and high-stress errors, the hybrid model better preserved high-risk region reconstruction, and the hierarchical model showed an advantage in hotspot localization. Conclusion: Evaluation of surrogate models for knee joint contact mechanics should shift from accuracy comparisons under ideal inputs to a comprehensive assessment of the preservation of risk-relevant information under realistic input constraints. Although the local-global hybrid model showed the best overall robustness, the optimal model under minimal-input conditions remained task-dependent.


Authors (3)

Summary

The paper demonstrates that hybrid local-global deep surrogate models achieve top accuracy with full inputs and best preserve risk-relevant biomechanical features.
It compares five distinct architectures (MGN, CT, Hi, GI, Hy) across ideal and degraded input conditions using metrics like RMSE, Dice coefficient, and hotspot localization error.
The study finds that noisy pose data degrade predictions more severely than noisy load data, underscoring the need for context-specific surrogate model selection in clinical applications.

Evaluation of Deep Surrogate Models for Knee Joint Contact Mechanics Under Input-Limited Conditions

Problem Context and Motivation

The reconstruction of internal stress fields in the knee joint is essential for understanding mechanical risk factors in meniscal and cartilage injury, as well as the progression of osteoarthritis. Finite element analysis (FEA) provides high-fidelity solutions but remains impractical for large-scale or rapid clinical scenarios due to its computational and modeling burdens. Deep surrogate models, especially those leveraging graph-based representations on FEA meshes, have demonstrated high predictive power in mapping input kinematics and loading to internal stress fields. However, previous research has focused predominantly on conditions with high-quality, comprehensive input data—an unrealistic assumption in real-world or clinical settings where measurement noise and missing data are prevalent. Consequently, there remains a critical gap in understanding model robustness and the preservation of risk-relevant biomechanical information under input-limited constraints.

Experimental Design

The study systematically evaluates five deep surrogate architectures across four input conditions reflecting both ideal and practical, degraded scenarios. Data were obtained from nine male soccer players performing 90-degree change-of-direction maneuvers and processed with a pipeline combining motion capture, musculoskeletal modeling in OpenSim, and quasi-static FEA using a generic OpenKnee(s) model. Each finite element simulation time step was converted into a graph where nodes represent element centroids with attached local and global features. The architectures represent the following distinct paradigm classes:
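The graph conversion described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_graph_sample`, the feature layout, and the rule that two elements are connected when they share a mesh vertex are all assumptions, since the summary does not state how edges were defined.

```python
import numpy as np

def build_graph_sample(vertices, elements, local_feats, global_feats):
    """Turn one FE time step into node features and an edge list.

    vertices:     (V, 3) mesh vertex coordinates
    elements:     (E, K) vertex indices per element
    local_feats:  (E, F_local) per-element features (hypothetical layout)
    global_feats: (F_global,) pose/load inputs shared by every node
    """
    vertices = np.asarray(vertices, dtype=float)
    elements = np.asarray(elements, dtype=int)
    local_feats = np.asarray(local_feats, dtype=float)
    global_feats = np.asarray(global_feats, dtype=float)

    # Nodes are element centroids: the mean of each element's vertices.
    centroids = vertices[elements].mean(axis=1)  # (E, 3)

    # Attach the same global features to every node.
    tiled = np.broadcast_to(global_feats, (len(elements), len(global_feats)))
    node_x = np.concatenate([centroids, local_feats, tiled], axis=1)

    # ASSUMPTION: elements are neighbours if they share at least one vertex.
    vertex_to_elems = {}
    for e, verts in enumerate(elements):
        for v in verts:
            vertex_to_elems.setdefault(int(v), []).append(e)
    edges = set()
    for elems in vertex_to_elems.values():
        for i in elems:
            for j in elems:
                if i != j:
                    edges.add((i, j))  # directed; both directions get added
    edge_index = np.array(sorted(edges), dtype=int).reshape(-1, 2).T
    return node_x, edge_index
```

The returned `edge_index` has shape (2, n_edges), with source nodes in the first row and destination nodes in the second, which is a common convention for message-passing code.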

MGN: Local topology diffusion via standard MeshGraphNet-like architecture (local message passing),
CT: History-context enhancement (incorporating short-term past input context),
Hi: Hierarchical multi-scale modeling (coarse-to-fine information pooling and propagation),
GI: Explicit global interaction (direct cross-graph global communication),
Hy: Hybrid local-global fusion (simultaneous local diffusion and explicit long-range coupling).

All models employed a common encoder-propagation-decoder macroarchitecture, identical backbone dimensions, and propagation depth to isolate the effects of inductive bias and propagation mechanism.
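To make the idea of combining local diffusion with explicit long-range coupling more concrete, here is a deliberately simplified propagation step. It is only a sketch of the general pattern: the mean-neighbour aggregation, the mean-pooled global context, the `tanh` residual fusion, and the names `hybrid_step`, `w_local`, and `w_global` are assumptions and should not be read as the Hy architecture from the paper.

```python
import numpy as np

def hybrid_step(h, edge_index, w_local, w_global):
    """One illustrative local-global update of node states h with shape (N, D)."""
    n = h.shape[0]
    src, dst = edge_index

    # Local branch: average the states of incoming neighbours (message passing).
    agg = np.zeros_like(h)
    np.add.at(agg, dst, h[src])
    deg = np.bincount(dst, minlength=n).astype(float)
    local = agg / np.maximum(deg, 1.0)[:, None]

    # Global branch: a graph-level summary that every node can read directly.
    global_ctx = h.mean(axis=0, keepdims=True)  # (1, D)

    # Fusion: both branches contribute to a residual update.
    return h + np.tanh(local @ w_local + global_ctx @ w_global)
```

Setting `w_global` to zeros reduces this step to purely local diffusion, while setting `w_local` to zeros leaves only the global term, which is one way to see how the paradigms compared in the study differ in their propagation mechanism.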

Input regimes included:

Full: Complete, noise-free kinematics and loading,
Pose-corrupted: Additive Gaussian noise to pose inputs,
Load-corrupted: Additive Gaussian noise to loading inputs,
Minimal: Only pose input retained; all load input channels set to zero.
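The four regimes above amount to simple transformations of the input channels, sketched below. The noise level `noise_std` is a placeholder assumption (the summary does not report the noise magnitude or whether inputs were normalized first), and the function and regime names are hypothetical.

```python
import numpy as np

def apply_input_regime(pose, load, regime, noise_std=0.1, rng=None):
    """Return (pose, load) arrays transformed according to the input regime."""
    rng = np.random.default_rng() if rng is None else rng
    pose = np.array(pose, dtype=float)  # copies, so inputs are not modified
    load = np.array(load, dtype=float)

    if regime == "full":
        pass  # complete, noise-free inputs
    elif regime == "pose_corrupted":
        pose = pose + rng.normal(0.0, noise_std, size=pose.shape)
    elif regime == "load_corrupted":
        load = load + rng.normal(0.0, noise_std, size=load.shape)
    elif regime == "minimal":
        load = np.zeros_like(load)  # pose only; load channels zeroed
    else:
        raise ValueError(f"unknown regime: {regime!r}")
    return pose, load
```

Zeroing the load channels rather than dropping them keeps the input dimensionality fixed, so the same trained network can be evaluated under every regime.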

Key evaluation metrics were full-field RMSE and MAE, Pearson r, relative high-stress errors ($\text{RE}_{\text{max}}$, $\text{RE}_{\text{P95}}$), Dice coefficient for high-risk region overlap, and hotspot localization error (distance between centers of true and predicted high-risk regions). Statistical rigor was ensured through three-fold cross-subject validation, within-fold temporal validation splits, and nonparametric significance testing with Holm-Bonferroni adjustment.
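The metrics listed above could be computed per frame roughly as follows. Several definitions here are assumptions, because the summary does not spell them out: the high-risk region is taken as the nodes at or above the 95th percentile of each field, the relative errors compare the peak and 95th-percentile stress values, and the region "center" is the unweighted mean of node coordinates. The function name `evaluate_field` is hypothetical.

```python
import numpy as np

def evaluate_field(true, pred, coords, risk_pct=95.0):
    """Compare a predicted stress field with the FE reference on N nodes."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    coords = np.asarray(coords, dtype=float)  # (N, 3) node positions

    err = pred - true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    pearson_r = float(np.corrcoef(true, pred)[0, 1])

    # Relative errors on high-stress summaries (assumed definitions).
    re_max = float(abs(pred.max() - true.max()) / true.max())
    p95_true = np.percentile(true, 95)
    p95_pred = np.percentile(pred, 95)
    re_p95 = float(abs(p95_pred - p95_true) / p95_true)

    # High-risk regions: top-percentile nodes of each field (assumed threshold).
    mask_true = true >= np.percentile(true, risk_pct)
    mask_pred = pred >= np.percentile(pred, risk_pct)
    overlap = np.logical_and(mask_true, mask_pred).sum()
    dice = float(2.0 * overlap / (mask_true.sum() + mask_pred.sum()))

    # Hotspot localization error: distance between region centers.
    center_true = coords[mask_true].mean(axis=0)
    center_pred = coords[mask_pred].mean(axis=0)
    hotspot_error = float(np.linalg.norm(center_true - center_pred))

    return {"rmse": rmse, "mae": mae, "pearson_r": pearson_r,
            "re_max": re_max, "re_p95": re_p95,
            "dice": dice, "hotspot_error": hotspot_error}
```

Note that a prediction can have a low RMSE while still scoring poorly on Dice or hotspot error, which is why the study reports these metric families separately.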

Main Results

Performance with Ideal Inputs

With complete pose and load input, the Hy (local-global hybrid) model yielded the best numerical performance across almost all metrics:

Highest accuracy: Lowest RMSE and MAE,
Superior risk-area reconstruction: Highest Dice coefficients, lowest $\text{RE}_{\text{max}}$ and $\text{RE}_{\text{P95}}$,
Top spatial correlation: Largest Pearson r,
Most reliable hotspot localization: Lowest mean localization error.

The GI (explicit global interaction) model consistently underperformed, particularly in absolute and high-stress error metrics. The MGN (MeshGraphNet baseline), CT (history-context), and Hi (hierarchical) models occupied an intermediate band and did not surpass Hy.

Robustness Under Input Degradation

Under the Pose-corrupted and Load-corrupted conditions, performance degraded for every architecture, but Hy remained the most robust:

Hy outperformed all other paradigms in RMSE, high-stress error, and risk-region Dice (p < 0.05).
Deterioration from pose corruption was universally more severe than from load perturbation, suggesting surrogate model predictions are more sensitive to kinematic input quality.
For most models, full-field error and risk-area mislocalization increased more acutely with pose noise, while load noise primarily affected stress magnitude metrics.
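The significance level quoted above is reported after Holm-Bonferroni adjustment, as stated in the experimental design. The adjustment itself is a standard step-down procedure and can be sketched as below. The summary does not name the underlying nonparametric test, so this sketch covers only the adjustment of already-computed p-values, and the function name is hypothetical.

```python
import numpy as np

def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)  # ascending raw p-values

    # The k-th smallest p-value (k = 0..m-1) is multiplied by (m - k).
    scaled = (m - np.arange(m)) * p[order]

    # Enforce monotonicity down the sorted list and cap at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)

    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted
```

A comparison is then declared significant when its adjusted p-value falls below the chosen level, for example 0.05.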

Task-Conditional Model Advantage in Minimal-Input Setting

In the Minimal input regime (pose only), model rankings become task-dependent:

CT (history-context) minimized overall and high-stress error,
Hy (local-global hybrid) best preserved the spatial pattern of high-risk regions (Dice),
Hi (hierarchical) showed lower hotspot localization error,
No single model dominated every evaluation dimension, so under severely constrained inputs model selection should depend on the context and the target metric.

Temporal Error Patterns

Phase-specific analysis across the stance phase revealed:

Minimal-input and pose-corrupted conditions induced particularly severe error increases and loss of high-risk region overlap, especially during late stance, the period most relevant to injury risk.
Under load corruption, performance decline was relatively mild and concentrated toward late stance, suggesting pose input fidelity is the key limiting factor for temporal prediction stability.
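A phase-specific analysis of this kind can be approximated by grouping per-frame errors by normalized stance time. The sketch below assumes three equal-width phases (early, mid, late) and a stance fraction in [0, 1]. The actual phase boundaries used in the study are not given in this summary, and the function name is hypothetical.

```python
import numpy as np

def error_by_phase(stance_fraction, frame_error,
                   edges=(0.0, 1 / 3, 2 / 3, 1.0),
                   names=("early", "mid", "late")):
    """Average a per-frame error within each stance phase."""
    t = np.asarray(stance_fraction, dtype=float)
    e = np.asarray(frame_error, dtype=float)

    # Interior edges split [0, 1] into len(names) bins; clip guards the ends.
    idx = np.clip(np.digitize(t, edges[1:-1]), 0, len(names) - 1)

    result = {}
    for k, name in enumerate(names):
        in_phase = idx == k
        result[name] = float(e[in_phase].mean()) if in_phase.any() else float("nan")
    return result
```

Running this separately for each input regime and model would yield a table of phase-wise errors from which late-stance degradation can be read off directly.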

Theoretical and Practical Implications

This work provides quantitative evidence for shifting surrogate model evaluation in musculoskeletal mechanics from aggregate error under ideal conditions toward a multidimensional robustness framework that targets the preservation of risk-relevant biomechanical features under input constraints. The finding that hybrid local-global propagation improves both robustness and the fidelity of risk information aligns with recent findings in mechanics-informed GNNs [see Patrignani and Pinho, 2026].

Selection of surrogate model architecture should therefore be guided by the anticipated application scenario. If overall stress field fidelity under full or noisy inputs is essential, local-global hybridization is preferable. If load inputs are missing altogether, a history-context architecture may be more suitable when error magnitude matters most, and a hierarchical architecture when hotspot localization matters most. Purely global architectures (GI) are systematically inferior within this context.

From a translational perspective, these insights are directly relevant for real-world deployment in clinical screening, in-field risk monitoring, or epidemiological studies where only partial kinematic data may be available and load estimation is often unreliable. Importantly, the task-dependent divergence of model performance underlines the risk of erroneous extrapolations from conventional benchmarking on full-data conditions.

Limitations and Future Directions

The main study limitations are a restricted, homogeneous sample (young male athletes, single task) and the use of a generic, non-subject-specific finite element geometry. The models were trained under unified settings and not specifically optimized for each input regime, meaning that further architectural adaptation or fine-tuning could provide additional robustness. Clinical-level outcome linkage and generalization to more diverse biomechanical tasks/environments remain open challenges for future work.

Conclusion

Comprehensive surrogate model evaluation in knee joint contact mechanics must integrate robustness and risk feature preservation under practical input constraints. Local-global hybridization provides superior all-around robustness, but under severely minimized inputs, paradigm suitability is application- and metric-dependent. Model assessment and selection frameworks should thus be multidimensional and target-aligned for real-world deployment in biomechanical risk stratification.

For a detailed account and full dataset, refer to "Evaluating Deep Surrogate Models for Knee Joint Contact Mechanics Under Input-Limited Conditions" (2604.01990).
