- The paper presents a novel set-based framework that treats gait as an unordered collection of silhouettes to enhance cross-view recognition.
- It employs CNNs with permutation invariant Set Pooling, Horizontal Pyramid Mapping, and a Multilayer Global Pipeline to capture multi-scale spatial-temporal features.
- Experiments on CASIA-B and OU-MVLP benchmarks show up to 95.0% rank-1 accuracy and robust performance under challenging conditions like bag-carrying and coat-wearing.
Overview of GaitSet: Gait Recognition as a Set-Based Problem
The paper "GaitSet: Regarding Gait as a Set for Cross-View Gait Recognition" introduces a novel approach to gait recognition, leveraging the notion of treating gait sequences as sets. This reformulation addresses key limitations found in traditional template-based and sequence-based methods by discarding unnecessary sequential constraints and harnessing permutation invariance.
Methodology
The authors present GaitSet, a method that considers gait as an unordered set of silhouettes rather than a fixed sequence. This approach allows flexibility and robustness across variations in viewpoints and walking conditions. The primary innovation lies in utilizing a deep learning architecture that independently processes frame-level features using a CNN, followed by a Set Pooling operation. This pooling operation aggregates features into a set-level representation, preserving both spatial and temporal information effectively.
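The core idea — process frames independently, then aggregate them with a permutation-invariant operation — can be illustrated with a minimal numpy sketch. The shapes and the use of a max reduction here are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def set_pooling(frame_features: np.ndarray) -> np.ndarray:
    """Permutation-invariant aggregation: max over the frame (set) axis.

    frame_features: (n_frames, channels, height, width) per-frame CNN outputs.
    Returns one set-level feature map of shape (channels, height, width).
    """
    return frame_features.max(axis=0)

# Invariance check: shuffling the frames leaves the result unchanged,
# which is exactly why the input can be treated as an unordered set.
feats = np.random.rand(30, 64, 16, 11)          # hypothetical shapes
shuffled = feats[np.random.permutation(30)]
assert np.allclose(set_pooling(feats), set_pooling(shuffled))
```

Because the reduction ignores frame order and count, the same model can consume sequences of arbitrary length, or even silhouettes gathered from different clips.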
Key components of the GaitSet framework include:
- Set Pooling (SP): Performs permutation-invariant aggregation of frame-level feature maps, using either simple statistical functions (e.g., max, mean, median) or an attention mechanism that refines features before aggregation.
- Horizontal Pyramid Mapping (HPM): Maps set-level features into a discriminative space, applying a pyramid pooling strategy over feature maps to capture multi-scale information.
- Multilayer Global Pipeline (MGP): Runs in parallel with the main convolutional pipeline, applying Set Pooling to intermediate feature maps so that set-level information is extracted at multiple depths of the network, enriching the spatial-temporal representation.
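The HPM component above can be sketched as follows. This is a simplified illustration under assumed shapes and pyramid scales; the paper additionally maps each strip through its own fully connected layer, which is omitted here for brevity:

```python
import numpy as np

def horizontal_pyramid_mapping(feature_map: np.ndarray, scales=(1, 2, 4, 8)):
    """Split a set-level feature map into horizontal strips at several scales
    and pool each strip into a single vector (global max + mean pooling).

    feature_map: (channels, height, width); height must be divisible by each scale.
    Returns a list of (channels,) strip descriptors: 1 + 2 + 4 + 8 = 15 in total.
    """
    c, h, w = feature_map.shape
    descriptors = []
    for s in scales:
        strip_h = h // s
        for i in range(s):
            strip = feature_map[:, i * strip_h:(i + 1) * strip_h, :]
            # Combine max and mean pooling over the strip's spatial positions.
            descriptors.append(strip.max(axis=(1, 2)) + strip.mean(axis=(1, 2)))
    return descriptors
```

Coarse strips (scale 1) capture the whole body, while fine strips (scale 8) isolate local regions such as legs or arms, giving the multi-scale discrimination the pyramid is designed for.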
Experimental Results
The performance of GaitSet is evaluated on two prominent benchmarks, the CASIA-B and OU-MVLP datasets, demonstrating superior accuracy over existing methods. Notably, the model achieved a rank-1 accuracy of 95.0% on CASIA-B under normal walking conditions and 87.1% on the large-scale OU-MVLP dataset. The model also remains robust under challenging conditions, reaching 87.2% with bag-carrying and 70.4% with coat-wearing.
Ablation Studies
The paper includes comprehensive ablation studies on CASIA-B, confirming the effectiveness of the design choices. For instance, the set perspective significantly outperformed traditional GEI-based templates, and using independent weights for each strip in HPM consistently improved accuracy. The attention mechanism in SP, while slightly more complex, yields a modest but consistent gain in aggregating discriminative features.
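To make the attention variant of Set Pooling concrete, the sketch below replaces plain max pooling with a softmax-weighted sum over frames. In the paper the attention weights are learned; the per-frame score used here (mean activation) is purely a hypothetical stand-in to show the aggregation pattern:

```python
import numpy as np

def attention_set_pooling(frame_features: np.ndarray) -> np.ndarray:
    """Hypothetical attention-style set pooling: each frame contributes in
    proportion to a softmax over a per-frame score. The result is still
    permutation-invariant, since the weights travel with their frames.

    frame_features: (n_frames, channels, height, width)
    Returns a set-level feature map of shape (channels, height, width).
    """
    scores = frame_features.mean(axis=(1, 2, 3))          # (n_frames,) scores
    weights = np.exp(scores - scores.max())               # stable softmax
    weights /= weights.sum()
    # Contract the frame axis: weighted sum of per-frame feature maps.
    return np.tensordot(weights, frame_features, axes=1)
```

Unlike uniform mean pooling, such a scheme can emphasize informative frames (e.g., clear mid-stride silhouettes) and down-weight noisy ones, which is the intuition behind the reported accuracy gain.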
Practical Implications
The proposed method exhibits considerable potential for real-world applications. Its ability to handle diverse sequences, even with incomplete or cross-view data, enhances its utility in non-cooperative scenarios, such as surveillance and security contexts. The flexibility of input configurations presents opportunities for integration into systems requiring robust biometric verification under varying external conditions.
Future Directions
Future research may explore optimized Set Pooling strategies and extend the applicability to more complex and variable environments. The promising results on large-scale datasets suggest a trajectory towards scalable and adaptable gait recognition systems, potentially integrating with broader multi-modal biometric solutions.
In summary, GaitSet marks a significant advancement in gait recognition, providing a robust, flexible, and computationally efficient framework that broadens the scope of biometric identification technologies.