Adaptive Graph Representation Learning for Video Person Re-identification
The paper "Adaptive Graph Representation Learning for Video Person Re-identification" proposes an approach that uses graph neural networks (GNNs) to improve video-based person re-identification (Re-ID). Video person Re-ID aims to match the same individual across video sequences, a task made difficult by occlusion, background clutter, viewpoint variation, illumination changes, and pose changes.
Recent advances in deep learning, particularly spatial and temporal attention mechanisms, have significantly improved video-based person Re-ID. However, current methods often overlook the intrinsic relationships between body parts, which could provide robustness in complex scenarios such as occlusion or pose variation. This paper proposes an adaptive graph representation learning technique that exploits these relationships for richer video feature extraction.
Methodology
The core of the approach is an adaptive, structure-aware adjacency graph that encodes spatial relationships among body parts and their contextual interactions across frames. Two types of connections are used to build this graph:
- Pose Alignment Connection: Regions containing the same human part are connected across frames, helping align spatial regions using human pose estimation techniques.
- Feature Affinity Connection: Edges are weighted by the affinity between extracted regional features, allowing the graph to capture visual semantic relationships dynamically.
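The two connection types above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the frame-major node ordering, the softmax normalization of affinities, and the merge weight `alpha` are all assumptions introduced here.

```python
import numpy as np

def build_adjacency(features, num_frames, num_parts, alpha=0.5):
    """Combine a fixed pose-alignment graph with a data-driven
    feature-affinity graph into a single adjacency matrix.

    features: (num_frames * num_parts, d) regional features,
              ordered frame-major (frame 0 parts, frame 1 parts, ...).
    """
    n = num_frames * num_parts
    assert features.shape[0] == n

    # Pose alignment: connect nodes corresponding to the same body
    # part in different frames (indices congruent modulo num_parts).
    pose_adj = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and i % num_parts == j % num_parts:
                pose_adj[i, j] = 1.0
    pose_adj /= np.maximum(pose_adj.sum(1, keepdims=True), 1e-8)

    # Feature affinity: softmax-normalized pairwise dot products,
    # so each row is a distribution over potential neighbours.
    logits = features @ features.T
    logits -= logits.max(1, keepdims=True)  # numerical stability
    affinity_adj = np.exp(logits)
    affinity_adj /= affinity_adj.sum(1, keepdims=True)

    # Weighted merge of the two connection types.
    return alpha * pose_adj + (1 - alpha) * affinity_adj
```

Because both component graphs are row-normalized, each row of the merged adjacency also sums to one, which keeps later feature propagation well scaled.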
By merging these connections, the method constructs a single graph that models regional interactions. Through graph feature propagation, each region's feature is iteratively refined with information from adjacent nodes, improving its discriminative power. In addition, a temporal resolution-aware regularization term enforces consistency between video representations extracted at different temporal resolutions, yielding more compact and discriminative video features.
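Graph feature propagation can be illustrated with a GCN-style update rule; this is a generic sketch under assumed shapes (the weight matrix, ReLU nonlinearity, and number of steps are choices made here, not details taken from the paper):

```python
import numpy as np

def propagate(features, adjacency, weight, num_steps=2):
    """Iteratively refine node features: each step replaces every
    node's feature with an adjacency-weighted average of its
    neighbours' features, followed by a linear map and ReLU,
    i.e. H <- ReLU(A @ H @ W)."""
    h = features
    for _ in range(num_steps):
        h = np.maximum(adjacency @ h @ weight, 0.0)
    return h
```

With a row-normalized adjacency, `adjacency @ h` is a convex combination of neighbour features, so repeated steps spread contextual information across body-part regions without blowing up feature magnitudes.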
Experimental Evaluation
The approach is evaluated on four prominent benchmarks: iLIDS-VID, PRID2011, MARS, and DukeMTMC-VideoReID. It achieves competitive CMC and mAP scores across all four datasets, with notable Rank-1 gains on the larger and more challenging MARS and DukeMTMC-VideoReID benchmarks, demonstrating robust identity retrieval under demanding conditions.
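For readers unfamiliar with the reported metrics, the standard CMC and mAP computation for Re-ID can be sketched as below. This is the conventional protocol, not code from the paper; the function name and arguments are illustrative.

```python
import numpy as np

def evaluate(dist, query_ids, gallery_ids, max_rank=5):
    """Compute Re-ID metrics from a query-gallery distance matrix:
    the CMC curve (probability that a correct match appears within
    rank k) and mean average precision (mAP)."""
    num_q = dist.shape[0]
    cmc = np.zeros(max_rank)
    aps = []
    for q in range(num_q):
        order = np.argsort(dist[q])  # gallery sorted by ascending distance
        matches = (gallery_ids[order] == query_ids[q]).astype(float)
        if matches.sum() == 0:
            continue  # query has no true match in the gallery
        first_hit = np.argmax(matches)  # rank of the first correct match
        if first_hit < max_rank:
            cmc[first_hit:] += 1  # counts as a hit at this rank and beyond
        # Average precision over the ranked list for this query.
        precision = np.cumsum(matches) / (np.arange(len(matches)) + 1)
        aps.append((precision * matches).sum() / matches.sum())
    return cmc / num_q, float(np.mean(aps))
```

Rank-1 accuracy is simply the first entry of the CMC curve: the fraction of queries whose top-ranked gallery item has the correct identity.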
Implications and Future Directions
Practically, the framework promises more reliable identity retrieval in video surveillance systems and automated video-analysis tools. Theoretically, it points toward a broader paradigm of graph-based representation learning, encouraging further work on adaptive graph construction and contextual relation modeling in other computer vision tasks.
Future research could explore the extension of this framework to incorporate more complex graph learning paradigms or hybrid models integrating traditional deep learning architectures with graph-based approaches. Moreover, addressing computational efficiency for real-time applications remains an essential area for development.
This paper sets a foundation for exploring graph-based methods in video person Re-ID, providing promising results that could inspire further innovations in the domain of intelligent surveillance and video analysis.