Learning Context Graph for Person Search (1904.01830v1)

Published 3 Apr 2019 in cs.CV

Abstract: Person re-identification has achieved great progress with deep convolutional neural networks. However, most previous methods focus on learning individual appearance feature embedding, and it is hard for the models to handle difficult situations with different illumination, large pose variance and occlusion. In this work, we take a step further and consider employing context information for person search. For a probe-gallery pair, we first propose a contextual instance expansion module, which employs a relative attention module to search and filter useful context information in the scene. We also build a graph learning framework to effectively employ context pairs to update target similarity. These two modules are built on top of a joint detection and instance feature learning framework, which improves the discriminativeness of the learned features. The proposed framework achieves state-of-the-art performance on two widely used person search datasets.

Summary

The paper presents a novel framework that fuses contextual instance expansion with relative attention to overcome re-id challenges.
The methodology employs graph convolution networks to embed context and update similarity, yielding state-of-the-art results on CUHK-SYSU and PRW datasets.
The findings suggest that combining global similarity with local context improves person search in complex surveillance scenes.

Learning Context Graph for Person Search

Person re-identification (re-id) is a long-standing research topic in computer vision because of its applications in surveillance, public security, and safety. Deep convolutional neural networks (CNNs) have brought marked progress, but varying illumination, large pose variation, and occlusion remain difficult. This paper introduces a framework that uses context information to improve person search, going beyond embeddings that describe only an individual's appearance.

The proposed framework is built on a joint detection and feature learning foundation, as in other methods that integrate pedestrian detection and re-identification. On top of this base it adds two modules: contextual instance expansion, which uses a relative attention mechanism to favor useful context cues over noisy ones, and a graph learning module that uses the selected context to update the similarity of the target pair.

Contextual Instance Expansion

The contextual instance expansion module uses a relative attention mechanism to search for and filter useful context information from the appearance of co-travelers in the scene. It addresses the limits of individual features by adding contextual cues. For a probe-gallery pair, the module scores candidate context pairs with relative attention and keeps the top matches as informative context.
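The selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: plain cosine similarity stands in for the relative attention scoring, and the function name `select_context_pairs`, the parameter `k`, and the greedy one-to-one matching are all assumptions made for the example.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def select_context_pairs(probe_ctx, gallery_ctx, k=3):
    """Greedily pick up to k one-to-one (probe_idx, gallery_idx, similarity) pairs.

    probe_ctx:   (m, d) features of other people in the probe scene
    gallery_ctx: (n, d) features of other people in the gallery scene
    Cosine similarity is a stand-in (assumption) for relative attention.
    """
    sim = l2_normalize(probe_ctx) @ l2_normalize(gallery_ctx).T
    pairs = []
    for _ in range(min(k, *sim.shape)):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        pairs.append((int(i), int(j), float(sim[i, j])))
        # Each context instance may be used in at most one pair.
        sim[i, :] = -np.inf
        sim[:, j] = -np.inf
    return pairs

# Toy example: two co-travelers in the probe scene, three in the gallery scene.
probe_ctx = np.array([[1.0, 0.0], [0.0, 1.0]])
gallery_ctx = np.array([[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]])
pairs = select_context_pairs(probe_ctx, gallery_ctx, k=2)
```

In the toy example, probe instance 0 pairs with gallery instance 1 and probe instance 1 with gallery instance 0, since those are the directions that align exactly.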

Contextual Graph Representation Learning

To embed context and update the target similarity, the framework builds a graph for each probe-gallery pair. The nodes represent the target pair and the selected context pairs, and graph convolutional networks (GCNs) learn and update the visual relations captured in this structure. The resulting context graph combines global similarity with local contextual detail to improve person search.
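A single graph convolution over such a graph can be sketched as below. The star topology (target pair at node 0, each context pair linked to it), the symmetric normalization, the feature sizes, and the random weights are assumptions chosen for illustration; the paper's exact graph construction and layer design may differ.

```python
import numpy as np

def normalized_adjacency(num_context):
    """Star graph with self-loops: node 0 is the target pair, nodes 1..K are
    context pairs. Returns D^{-1/2} A D^{-1/2}."""
    n = num_context + 1
    a = np.eye(n)
    a[0, 1:] = 1.0
    a[1:, 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat, h, w):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    return np.maximum(a_hat @ h @ w, 0.0)

# Toy example: one target pair plus two context pairs, 8-dim node features.
rng = np.random.default_rng(0)
a_hat = normalized_adjacency(num_context=2)
h = rng.normal(size=(3, 8))   # per-node pair features (hypothetical)
w = rng.normal(size=(8, 4))   # layer weights, learned in a real model
out = gcn_layer(a_hat, h, w)
target_embedding = out[0]     # updated representation of the target pair
```

After propagation, the target node's row mixes its own features with those of the context pairs, which is the mechanism by which context can adjust the final similarity score.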

Evaluation and Results

The framework is evaluated on two widely used person search datasets, CUHK-SYSU and PRW. On CUHK-SYSU it improves both rank-1 matching accuracy and mean Average Precision (mAP) over existing methods, and it shows similar gains on PRW. The authors report state-of-the-art performance on both datasets, which supports the claim that contextual cues make person search more discriminative and robust.
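For readers unfamiliar with the two metrics, the sketch below computes them for a single query. It is a simplified illustration: it assumes every true match appears in the ranked list, whereas person search benchmarks also account for ground-truth instances the detector missed. The function names are hypothetical. mAP is the mean of the per-query average precision, and rank-1 accuracy is the fraction of queries whose top result is correct.

```python
import numpy as np

def rank1_hit(scores, labels):
    """True if the highest-scoring gallery item is a correct match."""
    return bool(labels[int(np.argmax(scores))])

def average_precision(scores, labels):
    """Average of the precision values at the rank of each correct match."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(labels, dtype=bool)[order]
    if not hits.any():
        return 0.0
    ranks = np.arange(1, len(hits) + 1)
    precision_at_hits = np.cumsum(hits)[hits] / ranks[hits]
    return float(precision_at_hits.mean())

# Toy query: correct matches sit at ranks 1 and 3 of four gallery items.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
ap = average_precision(scores, labels)   # (1/1 + 2/3) / 2 = 0.8333...
top1 = rank1_hit(scores, labels)         # True
```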

Implications and Future Directions

Context-based learning has both theoretical and practical implications. Theoretically, it enriches the feature representation with co-traveler information from the scene, which offers a way to handle high intra-class variation and appearance change. Practically, because the framework performs detection and re-identification end to end, it reduces reliance on offline pedestrian detectors and is a candidate for deployment in varied surveillance scenarios.

Future work might explore more sophisticated GCN architectures or alternative graph-based frameworks to capture finer-grained relationships within scene context. Further investigation could also cover temporal dynamics, such as pedestrians who appear only briefly, to strengthen time-sensitive person re-id systems. Given the promising results, this contextual approach may encourage treating identification as a holistic scene understanding task rather than an isolated feature matching task.
