Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting (1904.10424v4)

Published 23 Apr 2019 in cs.CV

Abstract: For person re-identification, existing deep networks often focus on representation learning. However, without transfer learning, the learned model is fixed as is, which is not adaptable for handling various unseen scenarios. In this paper, beyond representation learning, we consider how to formulate person image matching directly in deep feature maps. We treat image matching as finding local correspondences in feature maps, and construct query-adaptive convolution kernels on the fly to achieve local matching. In this way, the matching process and results are interpretable, and this explicit matching is more generalizable than representation features to unseen scenarios, such as unknown misalignments, pose or viewpoint changes. To facilitate end-to-end training of this architecture, we further build a class memory module to cache feature maps of the most recent samples of each class, so as to compute image matching losses for metric learning. Through direct cross-dataset evaluation, the proposed Query-Adaptive Convolution (QAConv) method gains large improvements over popular learning methods (about 10%+ mAP), and achieves comparable results to many transfer learning methods. Besides, a model-free temporal cooccurrence based score weighting method called TLift is proposed, which improves the performance to a further extent, achieving state-of-the-art results in cross-dataset person re-identification. Code is available at https://github.com/ShengcaiLiao/QAConv.

Summary

The paper introduces QAConv and TLift, which directly match local features for robust, interpretable person re-identification.
It uses query-adaptive convolution to handle pose variations and misalignments without transfer learning.
In direct cross-dataset evaluation, QAConv improves mAP by about 10% or more over popular learning methods, and adding TLift yields state-of-the-art cross-dataset results.

Interpretable and Generalizable Person Re-Identification with QAConv and TLift

The paper presents a new approach for person re-identification that focuses on improving generalization and interpretability across different datasets without transfer learning. The research, undertaken by Shengcai Liao and Ling Shao, introduces the Query-Adaptive Convolution (QAConv) method for person image matching and the Temporal Lifting method (TLift) for post-processing.

Core Contributions

The core idea is to formulate person image matching directly in deep feature maps. Rather than relying solely on fixed representation vectors, QAConv finds local correspondences between the feature maps of two images. It constructs convolution kernels on the fly from the query image's feature map, convolves them over the gallery image's feature map, and max-pools the responses to keep the best local matches. Because the matching is explicit, it tolerates pose or viewpoint changes and misalignment, and the matched locations can be inspected, which makes the process and its results more interpretable than a comparison of feature vectors.
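The matching idea can be illustrated with a minimal sketch, assuming the simplest case of 1x1 kernels: every location of the query feature map acts as a kernel, its response at every gallery location is a dot product, and max pooling keeps the best response in each direction. The function names and the final averaging are illustrative assumptions, not the authors' implementation; any learned layers applied after pooling are not reproduced here.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def qaconv_score(query_map, gallery_map):
    """Toy query-adaptive matching between two feature maps.

    Each map is a list of per-location feature vectors (a flattened h*w grid).
    Averaging the pooled responses is a simplification chosen for this sketch.
    """
    q = [l2_normalize(v) for v in query_map]
    g = [l2_normalize(v) for v in gallery_map]
    # Every query location is used as a 1x1 kernel over all gallery locations.
    sim = [[sum(a * b for a, b in zip(qi, gj)) for gj in g] for qi in q]
    best_for_query = [max(row) for row in sim]          # max over gallery locations
    best_for_gallery = [max(col) for col in zip(*sim)]  # max over query locations
    responses = best_for_query + best_for_gallery
    return sum(responses) / len(responses)

# The same two local patterns in swapped positions still match perfectly,
# which is the sense in which local matching tolerates misalignment.
query = [[1.0, 0.0], [0.0, 1.0]]
shifted = [[0.0, 1.0], [1.0, 0.0]]
different = [[-1.0, 0.0], [0.0, -1.0]]
same_person_score = qaconv_score(query, shifted)
other_person_score = qaconv_score(query, different)
```

In this toy case the swapped map scores higher than the unrelated one even though no location lines up position by position.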

Another contribution is the class memory module, which caches feature maps of the most recent samples of each class. This makes end-to-end training feasible, because image matching losses for metric learning can be computed between each training sample and the cached feature maps. The trained model is then evaluated directly on other datasets without transfer learning, where it shows substantial improvements over popular representation learning methods.
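A rough sketch of how such a memory could drive a matching loss is given below. It assumes one cached feature map per class, a toy dot-product matcher, and a binary cross-entropy loss over sigmoid scores; the class name `ClassMemory`, the helper `match_score`, and these choices are hypothetical and are not taken from the released code.

```python
import math

def match_score(map_a, map_b):
    """Toy stand-in for a learned matcher: mean of the best dot product per location."""
    sim = [[sum(x * y for x, y in zip(a, b)) for b in map_b] for a in map_a]
    return sum(max(row) for row in sim) / len(sim)

class ClassMemory:
    """Keeps the most recent feature map seen for each class label."""

    def __init__(self):
        self.slots = {}

    def update(self, label, feature_map):
        # Overwrite the slot so it always holds the latest sample of this class.
        self.slots[label] = [list(v) for v in feature_map]

    def matching_loss(self, feature_map, label, eps=1e-7):
        """Binary cross-entropy between match scores and same-class targets."""
        if not self.slots:
            return 0.0
        loss = 0.0
        for cls, cached in self.slots.items():
            prob = 1.0 / (1.0 + math.exp(-match_score(feature_map, cached)))
            prob = min(max(prob, eps), 1.0 - eps)  # keep log() finite
            target = 1.0 if cls == label else 0.0
            loss -= target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob)
        return loss / len(self.slots)

memory = ClassMemory()
memory.update("person_a", [[1.0, 0.0]])
memory.update("person_b", [[0.0, 1.0]])
loss_right = memory.matching_loss([[1.0, 0.0]], "person_a")
loss_wrong = memory.matching_loss([[1.0, 0.0]], "person_b")
```

The loss is lower when the sample is labelled with the class whose cached map it actually matches, which is the signal a metric-learning objective needs.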

The paper also introduces TLift, a model-free score weighting method based on temporal cooccurrence, which further improves performance by using the time at which gallery images were captured. TLift requires no statistical model training and can be applied on the fly as a post-processing step on the matching scores.
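The flavour of temporal score weighting can be sketched as follows. This is a simplified illustration rather than the paper's exact procedure: within each gallery camera the top-scoring images serve as pivots, and every gallery image is reweighted by how close in time it is to a pivot. The function name and the parameters `top_k`, `tau`, and `alpha` are assumptions made for this example.

```python
import math

def temporal_lift(scores, cameras, times, top_k=2, tau=100.0, alpha=0.2):
    """Reweight gallery scores for one query using within-camera time proximity.

    scores[i], cameras[i], times[i] describe gallery image i.
    Returns a new list; the input scores are left untouched.
    """
    lifted = list(scores)
    for cam in set(cameras):
        idx = [i for i, c in enumerate(cameras) if c == cam]
        # Highest-scoring images in this camera act as temporal pivots.
        pivots = sorted(idx, key=lambda i: scores[i], reverse=True)[:top_k]
        for i in idx:
            # Gaussian-shaped weight on the time gap to the nearest pivot.
            weight = max(
                math.exp(-((times[i] - times[p]) / tau) ** 2) for p in pivots
            )
            # alpha keeps far-away images from being zeroed out entirely.
            lifted[i] = (weight + alpha) * scores[i]
    return lifted

# Two gallery images tie at 0.5, but only one was captured near the best match.
scores = [0.9, 0.5, 0.5]
cameras = [0, 0, 0]
times = [0.0, 10.0, 1000.0]
lifted = temporal_lift(scores, cameras, times, top_k=1)
```

After lifting, the image captured 10 time units from the pivot outranks the one captured 1000 units away, although their original scores were equal.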

Numerical Results

In direct cross-dataset evaluation, QAConv improves mAP by about 10% or more over popular learning methods. When coupled with TLift, it achieves state-of-the-art cross-dataset results, with around 62.8% Rank-1 accuracy and 31.6% mAP when trained on DukeMTMC-reID and tested on Market-1501.

The paper validates these gains through experiments across major person re-identification datasets, including Market-1501, DukeMTMC-reID, CUHK03, and MSMT17. The approach is particularly successful when using larger, more diverse datasets like MSMT17, which showcases QAConv's robustness and generalization capabilities.
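For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute Rank-1 accuracy and mAP from a query-by-gallery score matrix. It is a simplified version: standard re-identification protocols also filter out gallery images from the query's own camera, which is omitted here, and the function name is an assumption.

```python
def rank1_and_map(score_rows, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mean average precision.

    score_rows[q][g] is the matching score of query q against gallery image g
    (higher means more similar). Queries with no true match are skipped.
    """
    hits, average_precisions = 0, []
    for scores, qid in zip(score_rows, query_ids):
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        matches = [gallery_ids[j] == qid for j in order]
        if not any(matches):
            continue
        hits += int(matches[0])  # Rank-1: is the top result a true match?
        found, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)  # precision at each true match
        average_precisions.append(sum(precisions) / len(precisions))
    n = len(average_precisions)
    if n == 0:
        return 0.0, 0.0
    return hits / n, sum(average_precisions) / n

gallery_ids = [1, 2, 1]
query_ids = [1, 2]
score_rows = [
    [0.9, 0.1, 0.8],  # both images of identity 1 are ranked first
    [0.7, 0.2, 0.6],  # the only image of identity 2 is ranked last
]
rank1, mean_ap = rank1_and_map(score_rows, query_ids, gallery_ids)
```

Here the first query is retrieved perfectly and the second finds its match only at rank 3, so Rank-1 is 0.5 and mAP is (1 + 1/3) / 2.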

Implications and Future Directions

The research points to several theoretical and practical implications. Theoretically, it challenges the reliance on fixed representation vectors for image matching, suggesting local correspondence as a robust alternative. Practically, because the model can be evaluated directly on unseen datasets without transfer learning, person re-identification systems could be deployed more realistically in settings where collecting target-domain data or retraining is not feasible.

The work also opens avenues for further exploration, such as integrating QAConv with domain adaptation techniques to refine transfer learning models, or developing more advanced techniques for obtaining local correspondences. Additionally, given the model-free nature of TLift, testing it in other tasks could further validate its effectiveness.

In summary, the paper combines query-adaptive matching in feature maps with model-free temporal score weighting to improve the accuracy and generalization of person re-identification on unseen datasets. It also establishes a baseline model that existing domain adaptation approaches could build on.

GitHub

GitHub - ShengcaiLiao/QAConv: [ECCV 2020] QAConv: Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting, and [CVPR 2022] GS: Graph Sampling Based Deep Metric Learning