ViT-based gait foundation model remains largely unexplored

Develop and evaluate a Vision Transformer (ViT)-based foundation model for gait representation learning, and assess how effectively it bridges gait-specific foundation models with transformer-based models from other domains, particularly large language models (LLMs).

Background

In the Conclusion and Limitations section, Ye et al. identify the need for a ViT-based gait foundation model to better connect the gait community's models with transformer-dominant ecosystems, including LLMs. They explicitly state that this area remains largely unexplored.

Progress on this front would clarify whether transformer architectures can replace or complement CNN-based backbones in large-scale gait representation learning. It could also enable broader interoperability with transformer-based models and open the way to multi-modal or cross-domain applications.

References

This issue is crucial for bridging gait-specific foundation models with those from other domains, particularly LLMs, yet it remains largely unexplored within the gait community.

Silhouette-based Gait Foundation Model (2512.00691 - Ye et al., 30 Nov 2025) in Conclusion and Limitations