Unclear causes of performance decline when replacing CNN blocks with ViT in FoundationGait
Determine the causes of the observed performance decline when substituting the final two convolutional backbone blocks of FoundationGait with a 12-layer Vision Transformer module, in the self-supervised pretraining and evaluation setting described for FoundationGait-0.03B.
References
The reasons for this decline remain unclear and are not specific to FoundationGait.
— Silhouette-based Gait Foundation Model
(2512.00691 - Ye et al., 30 Nov 2025) in Ablation Study, ViT Replacement