- The paper introduces DuATM, a framework that uses dual attention mechanisms to refine and align context-aware feature sequences for improved person re-identification.
- DuATM uses a DenseNet-121 backbone to extract spatial context-aware feature sequences from images, and adds a bidirectional recurrent layer to encode temporal context and motion cues for video inputs.
- Experimental results on benchmarks like Market-1501 show that DuATM outperforms state-of-the-art methods, achieving a rank-1 accuracy of over 91%.
Dual Attention Matching Network for Context-Aware Feature Sequence-based Person Re-Identification
The paper introduces a novel framework named Dual ATtention Matching network (DuATM) aimed at enhancing person Re-Identification (ReID). Traditional ReID methods rely on representing each pedestrian with a single feature vector, which often proves inadequate due to visual ambiguities that arise from similar appearances or drastic appearance changes. The DuATM framework addresses these limitations by leveraging context-aware feature sequences and dual attention mechanisms.
Methodology
DuATM consists of two main components:
- Feature Sequence Extraction Module: Built on DenseNet-121, this module extracts feature sequences for both image and video inputs. For images, spatial context-aware feature sequences are produced from the convolutional feature maps; for videos, a bidirectional recurrent layer additionally encodes temporal context and motion cues (see the extraction sketch after this list).
- Sequence Matching Module: The core innovation of DuATM is its dual attention mechanism. Intra-sequence attention refines corrupted feature vectors by exploiting contextual information within the same sequence, while inter-sequence attention aligns feature pairs by selecting semantically consistent counterparts across sequences. Together they enable effective comparison between potentially unaligned and corrupted feature sequences (a matching sketch follows below).
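Below is a minimal sketch of the feature sequence extraction module, assuming a torchvision DenseNet-121 backbone; the feature dimension, the per-frame pooling, and the choice of a bidirectional GRU are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models


class FeatureSequenceExtractor(nn.Module):
    """Sketch: DenseNet-121 conv maps -> spatial feature sequences (images)
    or temporally contextualized frame features (videos)."""

    def __init__(self, feat_dim=512):
        super().__init__()
        densenet = models.densenet121(weights=None)   # backbone only; no classifier
        self.backbone = densenet.features             # conv feature maps: (B, 1024, h, w)
        self.reduce = nn.Conv2d(1024, feat_dim, kernel_size=1)
        # Bidirectional GRU for video inputs (hypothetical choice of recurrent cell).
        self.bigru = nn.GRU(feat_dim, feat_dim // 2, batch_first=True, bidirectional=True)

    def forward_image(self, img):
        """Image (B, C, H, W) -> spatial feature sequence of length h*w."""
        fmap = self.reduce(self.backbone(img))         # (B, D, h, w)
        return fmap.flatten(2).transpose(1, 2)         # (B, h*w, D): one feature per location

    def forward_video(self, clip):
        """Video clip (B, T, C, H, W) -> frame feature sequence with temporal context."""
        B, T = clip.shape[:2]
        frames = clip.flatten(0, 1)                    # (B*T, C, H, W)
        fmap = self.reduce(self.backbone(frames))      # (B*T, D, h, w)
        frame_feat = fmap.mean(dim=(2, 3)).view(B, T, -1)  # global-pooled per-frame features
        seq, _ = self.bigru(frame_feat)                # (B, T, D), bidirectional temporal context
        return seq
```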
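The following is a simplified sketch of the dual attention matching idea: intra-sequence attention refines each feature with context from its own sequence, inter-sequence attention selects a semantically consistent counterpart from the other sequence, and the refined and aligned pairs are compared. The scaled dot-product attention and the symmetric distance aggregation are illustrative assumptions, not the paper's exact transformation layers.

```python
import torch
import torch.nn.functional as F


def attend(queries, keys, values):
    """Soft attention: each query gathers a weighted sum of the values."""
    scores = queries @ keys.transpose(-2, -1) / queries.shape[-1] ** 0.5  # (L_q, L_k)
    return F.softmax(scores, dim=-1) @ values                              # (L_q, D)


def sequence_distance(seq_a, seq_b):
    """Distance between two feature sequences of shape (L_a, D) and (L_b, D)."""
    # Intra-sequence attention: refine possibly corrupted features with their own context.
    refined_a = attend(seq_a, seq_a, seq_a)
    refined_b = attend(seq_b, seq_b, seq_b)
    # Inter-sequence attention: align each refined feature of one sequence with a
    # semantically consistent aggregate of the other.
    aligned_b_for_a = attend(refined_a, refined_b, refined_b)
    aligned_a_for_b = attend(refined_b, refined_a, refined_a)
    # Average element-wise distances over both alignment directions.
    d_ab = F.pairwise_distance(refined_a, aligned_b_for_a).mean()
    d_ba = F.pairwise_distance(refined_b, aligned_a_for_b).mean()
    return 0.5 * (d_ab + d_ba)
```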
Training Strategy
DuATM is trained as a siamese network with a triplet loss, augmented by a de-correlation loss and a cross-entropy loss to improve the compactness and discriminative power of the learned features. The triplet loss drives the distance of each positive pair to be smaller than the distance of the corresponding negative pair by at least a margin, as sketched below.
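A minimal sketch of this objective, assuming a standard margin-based triplet term and a common off-diagonal covariance penalty for the de-correlation term; the margin value, loss weights, and helper names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def triplet_loss(d_anchor_pos, d_anchor_neg, margin=0.3):
    """Push positive-pair distances below negative-pair distances by a margin."""
    return F.relu(d_anchor_pos - d_anchor_neg + margin)


def decorrelation_loss(features):
    """Penalize correlation between feature dimensions to encourage compact,
    non-redundant representations (a common formulation; the paper's exact term may differ)."""
    feats = features - features.mean(dim=0, keepdim=True)      # (N, D), zero-mean per dimension
    cov = feats.t() @ feats / max(feats.shape[0] - 1, 1)        # (D, D) covariance estimate
    off_diag = cov - torch.diag(torch.diag(cov))                # keep only cross-dimension terms
    return off_diag.pow(2).sum()


# Combined objective for one triplet; `logits` and `labels` come from an identity
# classification head (hypothetical names):
# loss = triplet_loss(d_ap, d_an) + 0.1 * decorrelation_loss(feats) \
#        + F.cross_entropy(logits, labels)
```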
Experimental Results
Extensive experiments on Market-1501, DukeMTMC-reID, and MARS demonstrate the effectiveness of DuATM: it outperforms baseline and state-of-the-art methods, notably reaching a rank-1 accuracy of 91.42% on Market-1501.
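For reference, rank-1 accuracy on these benchmarks is typically computed by ranking gallery entries by distance to each query and checking whether the closest match shares the query identity. The sketch below omits the same-camera and junk-image filtering used by the official Market-1501 protocol.

```python
import numpy as np


def rank1_accuracy(dist_matrix, query_ids, gallery_ids):
    """dist_matrix: (num_query, num_gallery) pairwise distances between query and gallery features."""
    top1 = np.argmin(dist_matrix, axis=1)        # index of the closest gallery item per query
    hits = gallery_ids[top1] == query_ids        # correct identity at rank 1?
    return hits.mean()


# Example usage with toy data:
# dist = np.random.rand(4, 10)
# q_ids = np.array([0, 1, 2, 3])
# g_ids = np.random.randint(0, 4, size=10)
# print(rank1_accuracy(dist, q_ids, g_ids))
```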
Technical Insights
The dual attention mechanism is a significant aspect of DuATM, allowing for precise refinement and alignment of feature sequences. This methodology addresses issues related to misalignment and feature corruption, common in previous approaches relying on heuristic correspondence structures. By focusing on both local and spatial-temporal patterns, DuATM offers robust and reliable ReID capabilities.
Implications and Future Directions
The introduction of DuATM offers both practical and theoretical advancements in ReID. Practically, it provides a more reliable method for surveillance and retrieval tasks. Theoretically, it opens avenues for further exploration into multi-modal attention mechanisms and their applications across various computer vision challenges. Future developments may explore extending this framework to incorporate additional data modalities or refining the dual attention mechanisms to further enhance alignment and refinement capabilities.
In conclusion, DuATM represents a substantial contribution to the area of person ReID by overcoming limitations of existing vector-based approaches and introducing an innovative dual attention system for improved sequence-based matching.