- The paper critiques shallow networks and introduces novel deep architectures for accurately capturing gait variations in uncontrolled environments.
- It details the design of DeepGaitV2, built on 3D CNNs, and of SwinGait, which leverages Transformer attention to enhance temporal modeling.
- Experimental results demonstrate state-of-the-art performance on diverse benchmarks, highlighting the practical impact of deep learning in gait recognition.
Deep Models for Practical Gait Recognition: An Expert Overview
This paper addresses the challenges associated with gait recognition in real-world settings, emphasizing the limitations of existing shallow models. The authors propose a shift towards deep learning architectures to improve the performance and applicability of gait recognition technologies. Their investigation encompasses both classical CNNs and novel Transformer-based models.
Core Contributions
- Critique of Shallow Models: The paper highlights the inadequacy of shallow networks in handling the complexities of real-world gait datasets. Unlike data collected in controlled environments, real-world gait datasets present diverse and biased distributions that demand more sophisticated modeling.
- Proposal of Deep Architectures: Two novel model series are introduced, DeepGaitV2 (CNN-based) and SwinGait (Transformer-based). Both substantially increase network depth over prior gait models and deliver notable improvements on real-world datasets such as Gait3D and GREW.
- Temporal Modeling Advantages: The research stresses the importance of explicitly modeling temporal changes, showing that sequence-based methods, which preserve frame order, outperform set-based approaches that discard it (see the sketch after this list).
- Innovation with Transformers: SwinGait models explore the potential of Transformers in gait recognition. By addressing the challenge of 'dumb' patches (non-informative regions in gait silhouettes), these models significantly outperform previous efforts, especially in real-world applications.
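The distinction between set-based and sequence-based aggregation can be made concrete with a minimal PyTorch sketch. The tensor shapes and layer choices below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Toy batch of per-frame silhouette features: (batch, frames, channels).
feats = torch.randn(4, 30, 256)

# Set-based aggregation: order-agnostic max pooling over frames.
# Permuting the frame axis leaves this output unchanged.
set_pooled = feats.max(dim=1).values               # (4, 256)

# Sequence-based aggregation: a temporal convolution sees frame order
# before pooling, so motion dynamics can shape the representation.
temporal_conv = nn.Conv1d(256, 256, kernel_size=3, padding=1)
seq_feats = temporal_conv(feats.transpose(1, 2))   # (4, 256, 30)
seq_pooled = seq_feats.max(dim=2).values           # (4, 256)
```

Shuffling the frames generally changes `seq_pooled` but never `set_pooled`; that ordering information is exactly what sequence-based methods exploit.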
Methodological Insights
- Architectural Design: The DeepGaitV2 series employs a layered architecture built from 3D convolutional units, designed to capture the inherently temporal nature of gait sequences. The Transformer-based SwinGait models take a hybrid approach, first applying convolutional layers to mitigate the 'dumb'-patch issue before handing the resulting features to Transformer attention (see the sketch after this list).
- Experimental Setups: The research is comprehensive, employing various benchmark datasets from both controlled (CASIA-B, OU-MVLP) and real-world (Gait3D, GREW) scenarios. This dual-faceted approach ensures that the proposed models are robust across different conditions.
- Numerical Results: The DeepGaitV2 models achieve new state-of-the-art results on constrained datasets, while SwinGait models excel in real-world settings, demonstrating their versatility and effectiveness.
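To make the hybrid idea concrete, the following is a hypothetical PyTorch sketch of a convolutional stem followed by standard Transformer attention. The class names, layer sizes, and use of nn.TransformerEncoder are assumptions for illustration; the published DeepGaitV2 and SwinGait architectures differ in their exact blocks (SwinGait, for instance, builds on windowed attention rather than full attention over all tokens).

```python
import torch
import torch.nn as nn

class Conv3DBlock(nn.Module):
    """Residual 3D-convolution block, in the spirit of DeepGaitV2's
    spatiotemporal units (sizes here are illustrative, not the paper's)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm3d(channels), nn.BatchNorm3d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                          # x: (B, C, T, H, W)
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + x)

class HybridGaitBackbone(nn.Module):
    """Convolutional stem followed by Transformer attention, echoing the
    idea of letting convolutions absorb uninformative ('dumb') silhouette
    patches before tokenization. A sketch, not the authors' code."""
    def __init__(self, channels=64, depth=2, heads=4):
        super().__init__()
        self.stem = nn.Conv3d(1, channels, kernel_size=3, padding=1)
        self.conv_stage = nn.Sequential(*[Conv3DBlock(channels) for _ in range(depth)])
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads, batch_first=True)
        self.attn_stage = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                          # x: (B, 1, T, H, W) silhouettes
        x = self.conv_stage(self.stem(x))          # local spatiotemporal features
        B, C, T, H, W = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, T*H*W, C) tokens
        return self.attn_stage(tokens)

sils = torch.rand(2, 1, 8, 16, 16)                 # tiny batch of silhouette clips
out = HybridGaitBackbone()(sils)                   # (2, 8*16*16, 64)
```

The convolutional stage gives every token non-trivial local context before attention is applied, which mirrors the motivation the paper gives for placing convolutions ahead of the Transformer blocks.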
Implications and Future Directions
The findings highlight the importance of adopting deeper network architectures for gait recognition, particularly in unstructured, real-world environments. The proposed models not only improve accuracy but also provide a general framework for future researchers to build upon.
The implications of this work are significant for security and surveillance applications, where reliable and non-intrusive identification is critical. The use of Transformers suggests a promising direction for the evolution of gait recognition systems, with the potential for broader applications in dynamic and complex scenarios.
Future research may focus on further optimizing the architecture, exploring unsupervised learning to reduce the reliance on labeled data, and evaluating these models in real-time systems. Additionally, addressing the computational efficiency of such deep networks remains an important challenge for practical deployment.