- The paper introduces a new framework employing GLFE and LTA to effectively combine global context and local details for gait recognition.
- It utilizes GLConv layers and GeM pooling to capture high-resolution spatial and temporal features for improved accuracy in challenging scenarios.
- The methodology achieves significant Rank-1 accuracy gains on CASIA-B and OUMVLP datasets, with robust performance under clothing and carrying variations.
Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation
The paper "Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation" presents a comprehensive study of improving gait recognition through innovative feature extraction strategies. The authors address the limitations of purely global and purely local feature extraction methods in gait recognition, aiming for more discriminative feature representations. With the Global and Local Feature Extractor (GLFE) and Local Temporal Aggregation (LTA), they formulate a novel framework that combines global context with local detail, significantly enhancing recognition capability.
Methodology and Contributions
The proposed methodology leverages convolutional neural networks (CNNs) to construct gait features from two key elements: GLFE and LTA. The GLFE module integrates global and local convolutional layers, designated GLConv layers, which capture both whole-frame appearance and fine-grained local regions. This design surpasses traditional gait feature extraction methods that focus solely on either global or local aspects, and it maintains spatial integrity during feature fusion, improving discriminative power.
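To make the GLConv idea concrete, here is a minimal sketch (not the authors' code): assuming shape-preserving convolution operators for the two branches, the global branch processes the whole feature map while the local branch convolves horizontal strips independently, and the outputs are fused elementwise. The function name, the strip count, and the toy scaling "convs" are illustrative assumptions.

```python
import numpy as np

def glconv(x, conv_global, conv_local, n_parts=4):
    """GLConv-style fusion (additive variant, as a sketch): one branch
    convolves the whole (C, H, W) map, the other convolves horizontal
    strips independently; the two outputs are summed elementwise."""
    # Global branch: operate on the full feature map.
    g = conv_global(x)
    # Local branch: split along height into n_parts strips, convolve
    # each strip separately, then stitch them back together.
    strips = np.array_split(x, n_parts, axis=1)
    l = np.concatenate([conv_local(s) for s in strips], axis=1)
    return g + l

# Toy stand-ins for real shape-preserving conv layers (illustration only).
conv_g = lambda t: 2.0 * t
conv_l = lambda t: 3.0 * t

x = np.ones((2, 8, 4))          # (channels, height, width)
y = glconv(x, conv_g, conv_l)
print(y.shape)                  # (2, 8, 4); every value is 2 + 3 = 5
```

Because both branches preserve spatial shape, the fusion never discards positional information, which is the property the paper emphasizes.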
Additionally, the LTA operation replaces a conventional spatial pooling layer: instead of downsampling spatially, it aggregates temporal information over short clips of adjacent frames, so feature maps retain higher spatial resolution. This preserved spatial detail yields more robust feature maps, which is vital in challenging scenarios where subjects wear different clothes or carry objects.
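A simplified sketch of the aggregation step (the paper realizes LTA with a strided temporal convolution; a windowed max over adjacent frames is used here only to show the shape arithmetic, and the function name is hypothetical):

```python
import numpy as np

def local_temporal_aggregation(seq, window=3):
    """Collapse each non-overlapping window of `window` adjacent frames
    into one frame: temporal length shrinks by `window`, while spatial
    resolution (H, W) is left untouched."""
    t_out = seq.shape[0] // window               # drop a ragged tail, if any
    clips = seq[:t_out * window].reshape(t_out, window, *seq.shape[1:])
    return clips.max(axis=1)                     # aggregate each clip

# 12 frames of a (1, 4, 4) feature map, with values increasing over time.
frames = np.arange(12 * 1 * 4 * 4, dtype=float).reshape(12, 1, 4, 4)
out = local_temporal_aggregation(frames, window=3)
print(out.shape)   # (4, 1, 4, 4): 12 frames -> 4 clip-level frames
```

The point of the design is visible in the shapes: only the temporal axis is reduced, so later layers still see full-resolution spatial maps.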
Another significant contribution is the Generalized-Mean (GeM) pooling layer, which aggregates spatial features adaptively: its exponent lets the layer interpolate between average and max pooling, rather than committing to either fixed scheme in advance.
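GeM pooling has a standard closed form, f = (mean(x^p))^(1/p) over the spatial dimensions. A minimal sketch follows; note that in the paper the exponent is learned per layer, whereas here it is a fixed argument, and the default value is an arbitrary assumption.

```python
import numpy as np

def gem_pool(x, p=6.5, eps=1e-6):
    """Generalized-Mean pooling over the spatial dims of a (C, H, W) map.
    p = 1 recovers average pooling; p -> infinity approaches max pooling."""
    x = np.clip(x, eps, None)                    # GeM assumes positive inputs
    return np.power(np.power(x, p).mean(axis=(1, 2)), 1.0 / p)

fmap = np.array([[[1.0, 3.0], [2.0, 4.0]]])     # one channel, 2x2 map
print(gem_pool(fmap, p=1.0))     # [2.5]  -> plain average
print(gem_pool(fmap, p=100.0))   # ~[3.95] -> close to the max, 4.0
```

Sweeping p between the two extremes shows why a single learnable scalar suffices to tune the pooling behavior per feature map.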
Results and Evaluation
The authors present extensive evaluations on two widely used datasets, CASIA-B and OUMVLP, where the proposed approach outperforms state-of-the-art gait recognition methods. In particular, it achieves significant accuracy improvements across diverse conditions and viewing angles, underscoring its suitability for real-world deployment.
Detailed results indicate substantial gains in Rank-1 accuracy, especially under challenging conditions such as clothing changes (CL) and bag-carrying (BG). The framework remains robust across different view angles and training-set scales.
Implications and Future Research Directions
The findings from this research contribute valuable insights into biometric technology advancements, particularly within the domain of gait recognition. The framework's ability to integrate both global context and localized detail positions it as a versatile solution for applications in surveillance, security, and intelligent transportation systems.
Future work may explore further optimization of temporal aggregation schemes or integration with other biometric modalities for enhanced multi-feature recognition systems. Moreover, scalable solutions that can handle larger datasets efficiently, while maintaining high accuracy, are promising directions for continued research.
In summary, this paper introduces a substantial advancement in gait recognition methodologies, driving further exploration into hybrid feature representation systems that balance global and local attributes effectively.