Deep Learning for Person Re-identification: A Survey and Outlook (2001.04193v2)

Published 13 Jan 2020 in cs.CV

Abstract: Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and increasing demand of intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the involved components in developing a person Re-ID system, we categorize it into the closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis for closed-world person Re-ID from three different perspectives, including deep feature representation learning, deep metric learning and ranking optimization. With the performance saturation under closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, facing more challenging issues. This setting is closer to practical applications under specific scenarios. We summarize the open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost for finding all the correct matches, which provides an additional criterion to evaluate the Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed.

Authors

Mang Ye
Jianbing Shen
Gaojie Lin
Tao Xiang
Ling Shao
Steven C. H. Hoi

Summary

The paper "Deep Learning for Person Re-identification: A Survey and Outlook" by Mang Ye et al. extensively reviews the progression and current state of person re-identification (Re-ID), categorizing major research efforts into both closed-world and open-world settings. The advent of deep learning techniques has considerably fostered advancements in Re-ID, driven partly by the increasing demand for intelligent surveillance technologies.

Closed-World Person Re-Identification

Closed-world Re-ID rests on several assumptions: the data come from a single visual modality, person bounding boxes have already been generated by a separate step, annotated training data are plentiful, the annotations are correct and cover the full body, and the query person is guaranteed to appear in the gallery. Under these controlled conditions, Re-ID research focuses on three main aspects: feature representation learning, deep metric learning, and ranking optimization.

Feature Representation Learning

Feature learning strategies fall into four groups: global features, local part-based features, auxiliary features (such as semantic attributes or GAN-generated images), and video features for video-based Re-ID. Methods such as the Part-based Convolutional Baseline (PCB) and Multi-Scale Context-Aware Feature Learning illustrate the benefits of part-level and multi-scale representations.
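As a rough illustration of part-level features, the sketch below average-pools a feature map into horizontal stripes, in the spirit of PCB. The function name `stripe_pool`, the nested-list layout (channels x height x width), and the toy values are assumptions made for this example, not the paper's implementation.

```python
def stripe_pool(feature_map, num_parts):
    """Average-pool a C x H x W feature map into num_parts horizontal stripes.

    Returns a list of num_parts vectors, each of length C.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts")
    rows_per_part = height // num_parts

    parts = []
    for p in range(num_parts):
        start, stop = p * rows_per_part, (p + 1) * rows_per_part
        vector = []
        for c in range(channels):
            # Flatten the rows of this stripe and average them.
            values = [v for row in feature_map[c][start:stop] for v in row]
            vector.append(sum(values) / len(values))
        parts.append(vector)
    return parts


# Toy feature map: 2 channels, height 4, width 2 (hypothetical values).
toy_map = [
    [[1.0, 1.0], [1.0, 1.0], [3.0, 3.0], [3.0, 3.0]],
    [[0.0, 2.0], [0.0, 2.0], [4.0, 6.0], [4.0, 6.0]],
]
part_features = stripe_pool(toy_map, num_parts=2)
```

Each stripe yields its own feature vector, so an upper-body stripe and a lower-body stripe can be compared separately instead of being averaged into one global descriptor.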

Deep Metric Learning

Metric learning for Re-ID relies on loss functions such as identity loss, verification loss, triplet loss, and their variants, all of which are designed to make the learned features discriminative. State-of-the-art models often combine several of these losses. Hard sample mining and sample weighting inside the loss are highlighted as particularly effective.
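A minimal sketch of a triplet loss with batch-hard mining may make the idea of hard sample mining concrete: for each anchor, only the farthest same-identity sample and the closest different-identity sample contribute. The function names, the Euclidean distance, and the margin of 0.3 are illustrative assumptions rather than settings taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Mean hinge loss over anchors, using the hardest positive and negative."""
    losses = []
    for i, anchor in enumerate(embeddings):
        positives = [
            euclidean(anchor, e)
            for j, e in enumerate(embeddings)
            if j != i and labels[j] == labels[i]
        ]
        negatives = [
            euclidean(anchor, e)
            for j, e in enumerate(embeddings)
            if labels[j] != labels[i]
        ]
        if not positives or not negatives:
            continue  # anchor has no valid triplet in this batch
        hardest_positive = max(positives)
        hardest_negative = min(negatives)
        losses.append(max(0.0, hardest_positive - hardest_negative + margin))
    return sum(losses) / len(losses) if losses else 0.0


labels = [0, 0, 1, 1]
separated = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]]
interleaved = [[0.0, 0.0], [3.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
loss_separated = batch_hard_triplet_loss(separated, labels)
loss_interleaved = batch_hard_triplet_loss(interleaved, labels)
```

In the toy data, well-separated identities give zero loss, while interleaved identities are penalized because each anchor's hardest negative is closer than its hardest positive.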

Ranking Optimization

Optimizing the ranking list matters for practical Re-ID applications. Re-ranking methods exploit gallery-to-gallery relationships and query-specific adaptation, while rank fusion combines the ranking lists from multiple retrieval methods. Both refine the initial retrieval result so that correct matches move up the list and false matches move down.
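One widely used way to exploit gallery-to-gallery relationships is to keep only k-reciprocal neighbors: a gallery sample counts as a reliable neighbor of the query only if the query is also among that sample's own top-k neighbors. The sketch below covers just this neighbor-selection step, not a full re-ranking method; the helper names and the one-dimensional toy positions are hypothetical.

```python
def nearest(dist_row, k, exclude):
    """Indices of the k smallest distances in dist_row, skipping `exclude`."""
    order = sorted(
        (i for i in range(len(dist_row)) if i != exclude),
        key=lambda i: dist_row[i],
    )
    return order[:k]


def k_reciprocal_neighbors(dist, query_index, k):
    """Samples in the query's top-k that also have the query in their top-k.

    `dist` is a full pairwise distance matrix over the query and gallery.
    """
    candidates = nearest(dist[query_index], k, exclude=query_index)
    return [
        g for g in candidates
        if query_index in nearest(dist[g], k, exclude=g)
    ]


# Index 0 is the query; the rest form the gallery (toy 1-D positions).
positions = [0.0, 1.0, 5.0, 5.5, 6.0]
dist = [[abs(a - b) for b in positions] for a in positions]
reciprocal = k_reciprocal_neighbors(dist, query_index=0, k=2)
```

Here sample 2 is in the query's top-2 but is dropped, because its own nearest neighbors are samples 3 and 4 rather than the query.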

Open-World Person Re-Identification

The research focus in recent years has shifted to the more challenging open-world Re-ID, which addresses practical deployment scenarios where assumptions of the closed-world setting do not hold.

Heterogeneous Re-ID

This problem involves matching across different modalities such as RGB and infrared images, or text descriptions. Innovative solutions include leveraging GANs for cross-modality image generation and designing shared feature spaces for different data types.

End-to-End Re-ID

Combining person detection and Re-ID in a single framework reduces the dependency on independently generated bounding boxes. End-to-end systems have shown promise but remain challenging, because one model must serve both detection and identification.

Semi-Supervised and Unsupervised Learning

Unsupervised domain adaptation and semi-supervised learning methods aim to minimize the reliance on annotated data. These methods seek to transfer knowledge from labeled source domains to unlabeled target domains, utilizing techniques such as iterative clustering and generative adversarial networks.

Noise-Robust Re-ID

Handling partial occlusion, sample noise, and label noise is critical for real-world applications. Common approaches incorporate human body part detection, attention mechanisms, and robust feature extraction.

Open-Set Re-ID and Beyond

Open-set Re-ID covers the case where the query person may not appear in the gallery at all. The system must therefore verify a candidate match before accepting it, keeping the true positive rate high and the false positive rate low. Adapting Re-ID models to dynamic environments and designing group Re-ID systems remain open challenges.
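The simplest form of such verification is a distance threshold on the best gallery candidate, sketched below. The function name `open_set_match` and the threshold value are hypothetical, and practical systems generally use a learned verification model instead of a fixed cutoff.

```python
def open_set_match(query_to_gallery, threshold):
    """Return the index of the closest gallery sample, or None if it is too far.

    `query_to_gallery` holds the distance from the query to each gallery sample.
    """
    best_index = min(
        range(len(query_to_gallery)), key=lambda i: query_to_gallery[i]
    )
    if query_to_gallery[best_index] > threshold:
        return None  # reject: the query is probably not in the gallery
    return best_index


accepted = open_set_match([0.9, 0.2, 0.7], threshold=0.5)
rejected = open_set_match([0.9, 0.8], threshold=0.5)
```

Raising the threshold accepts more true matches but also more impostors, which is the trade-off between true and false positives described above.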

Implications and Future Directions

The implications of this research extend both practically and theoretically:

Practical Implications: Enhanced security and surveillance systems, improved forensic investigations, and better user experiences in smart city applications.
Theoretical Implications: Refined algorithms for feature extraction, improved understanding of metric learning, and the development of robust and adaptable AI systems.

mINP: A New Evaluation Metric

The authors propose mINP, a metric assessing the cost required to retrieve all correct matches, adding a new dimension to the evaluation of Re-ID systems. This metric complements the existing CMC and mAP by focusing on the effort needed to find harder matches.
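As a sketch of how the metric can be computed: for each query, the inverse negative penalty (INP) is the number of correct matches divided by the rank position of the hardest (last-retrieved) correct match, and mINP is the mean of this value over all queries. The function names and the boolean-list input format below are assumptions made for the example.

```python
def inverse_negative_penalty(ranked_is_match):
    """INP for one query.

    `ranked_is_match[r]` is True if the gallery item at rank r + 1 is a
    correct match for the query.
    """
    num_matches = sum(ranked_is_match)
    if num_matches == 0:
        raise ValueError("query has no correct match in the gallery")
    hardest_rank = max(
        rank for rank, hit in enumerate(ranked_is_match, start=1) if hit
    )
    return num_matches / hardest_rank


def mean_inp(all_rankings):
    """mINP: the average INP over all queries."""
    scores = [inverse_negative_penalty(r) for r in all_rankings]
    return sum(scores) / len(scores)


rankings = [
    [True, True, False, False],   # both matches at the top: INP = 2 / 2
    [True, False, False, True],   # hardest match at rank 4: INP = 2 / 4
]
score = mean_inp(rankings)
```

Both toy queries place a correct match at rank 1, so they are indistinguishable by rank-1 accuracy, but the second one is penalized because its last correct match sits at rank 4.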

AGW Baseline Development

The AGW baseline combines three components: non-local attention, generalized-mean pooling, and a weighted regularized triplet loss. According to the abstract, it achieves state-of-the-art or at least comparable performance on twelve datasets for four Re-ID tasks, including single-modality, cross-modality, and partial Re-ID, which makes it a strong baseline for future research.
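Of these components, generalized-mean (GeM) pooling is the easiest to show in a few lines: each channel is pooled as the p-th root of the mean of its activations raised to the power p. The sketch below pools a single channel with a fixed p; the exponent is usually a learnable parameter, and the default of 3.0 and the clamping constant `eps` are assumptions made here for illustration.

```python
def gem_pool(channel_values, p=3.0, eps=1e-6):
    """Generalized-mean pooling over the activations of one channel.

    p = 1 reduces to average pooling; large p approaches max pooling.
    """
    # Clamp to a small positive value so fractional powers stay defined.
    clamped = [max(v, eps) for v in channel_values]
    mean_of_powers = sum(v ** p for v in clamped) / len(clamped)
    return mean_of_powers ** (1.0 / p)


activations = [1.0, 2.0, 3.0]
as_average = gem_pool(activations, p=1.0)
as_gem = gem_pool(activations, p=3.0)
near_max = gem_pool(activations, p=100.0)
```

With p = 1 the result equals the plain average, and as p grows the output moves toward the largest activation, so the exponent interpolates between average and max pooling.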

Conclusion

The survey gives a comprehensive view of the current state and future trends of deep learning for person Re-ID. By reviewing the progress made in closed-world settings and the emerging challenges of open-world deployment, the authors set the stage for further work on Re-ID. Their two main contributions, the mINP metric and the AGW baseline, offer concrete starting points for future research and for practical Re-ID systems.
