Deep Ranking for Person Re-identification via Joint Representation Learning (1505.06821v2)

Published 26 May 2015 in cs.CV

Abstract: This paper proposes a novel approach to person re-identification, a fundamental task in distributed multi-camera surveillance systems. Although a variety of powerful algorithms have been presented in the past few years, most of them usually focus on designing hand-crafted features and learning metrics either individually or sequentially. Different from previous works, we formulate a unified deep ranking framework that jointly tackles both of these key components to maximize their strengths. We start from the principle that the correct match of the probe image should be positioned in the top rank within the whole gallery set. An effective learning-to-rank algorithm is proposed to minimize the cost corresponding to the ranking disorders of the gallery. The ranking model is solved with a deep convolutional neural network (CNN) that builds the relation between input image pairs and their similarity scores through joint representation learning directly from raw image pixels. The proposed framework allows us to get rid of feature engineering and does not rely on any assumption. An extensive comparative evaluation is given, demonstrating that our approach significantly outperforms all state-of-the-art approaches, including both traditional and CNN-based methods on the challenging VIPeR, CUHK-01 and CAVIAR4REID datasets. Additionally, our approach has better ability to generalize across datasets without fine-tuning.

Authors (3)

Shi-Zhe Chen (1 paper)
Chun-Chao Guo (1 paper)
Jian-Huang Lai (35 papers)

Citations (215)

Summary

The paper presents a deep learning approach to person re-identification in multi-camera surveillance systems, formulated as a deep ranking problem. Traditional approaches usually design hand-crafted features and learn distance metrics either separately or sequentially. This work instead integrates both components into a unified framework that uses a deep convolutional neural network (CNN) for joint representation learning.

The core contribution is a deep ranking algorithm that learns feature representations and similarity measures simultaneously from raw image pixels. The framework is built on the principle that the correct match for a probe image should hold the top rank within the gallery of candidates. A learning-to-rank algorithm enforces this by minimizing a cost function that penalizes ranking disorders in the gallery. A CNN maps each input image pair to a similarity score through joint representation learning, which removes the reliance on engineered features and prior assumptions.
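
The ranking principle described above can be illustrated with a small sketch. It uses a generic hinge-style pairwise ranking cost, which is an assumption made for illustration and not necessarily the paper's exact cost function. The function name `ranking_disorder_cost`, the `margin` parameter, and the similarity scores are all hypothetical.

```python
from typing import Sequence


def ranking_disorder_cost(
    pos_score: float,
    neg_scores: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Hinge-style cost for a single probe image (illustrative assumption).

    Every gallery mismatch whose similarity score is not at least `margin`
    below the true match's score contributes a positive penalty.
    """
    return sum(max(0.0, margin - (pos_score - s)) for s in neg_scores)


# Hypothetical similarity scores for one probe against a small gallery.
# True match clearly on top: no penalty is incurred.
well_ranked = ranking_disorder_cost(pos_score=3.0, neg_scores=[0.5, 1.0, 1.5])

# True match tied with or below some mismatches: the cost grows.
disordered = ranking_disorder_cost(pos_score=1.0, neg_scores=[0.5, 1.0, 1.5])

print(well_ranked, disordered)
```

In this sketch the cost is zero only when the true match outscores every mismatch by the margin, so minimizing it pushes the correct match toward the top of the gallery ranking.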

The CNN architecture is adapted from AlexNet, which has a strong record in image classification, and is modified to take paired pedestrian images as input and produce a similarity score as output. Because person re-identification datasets are typically small, the model is pretrained on external data. The experiments show that the resulting model generalizes well across different datasets.

The paper evaluates the framework through extensive comparisons against state-of-the-art traditional and deep learning methods on the VIPeR, CUHK-01, and CAVIAR4REID datasets. The experiments indicate that the approach consistently outperforms the compared methods across rank positions. Notably, the framework reaches a rank-1 accuracy of 38.37% on the VIPeR dataset, surpassing the previous best results. The model also generalizes across datasets without fine-tuning, a property attributed to its robust feature representation.
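
Rank-based accuracy of the kind reported above is usually read off a cumulative matching characteristic (CMC) curve. The sketch below shows one common way to compute it, assuming a single true match per probe and pessimistic tie handling. The helper names and the score matrix are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def rank_of_true_match(scores: Sequence[float], true_index: int) -> int:
    """Return the 1-based rank of the true match, sorting by descending score.

    Ties are handled pessimistically: a mismatch with an equal score is
    counted as ranked ahead of the true match.
    """
    target = scores[true_index]
    return 1 + sum(
        1 for i, s in enumerate(scores) if i != true_index and s >= target
    )


def cmc_curve(
    score_matrix: Sequence[Sequence[float]],
    true_indices: Sequence[int],
    max_rank: int,
) -> List[float]:
    """Fraction of probes whose true match appears within the top k, k = 1..max_rank."""
    ranks = [rank_of_true_match(row, t) for row, t in zip(score_matrix, true_indices)]
    n = len(ranks)
    return [sum(r <= k for r in ranks) / n for k in range(1, max_rank + 1)]


# Hypothetical similarity scores: 4 probes (rows) against 3 gallery images (columns).
scores = [
    [0.9, 0.2, 0.1],  # true match is column 0, ranked 1st
    [0.3, 0.8, 0.5],  # true match is column 1, ranked 1st
    [0.6, 0.7, 0.4],  # true match is column 0, ranked 2nd
    [0.5, 0.6, 0.1],  # true match is column 2, ranked 3rd
]
truth = [0, 1, 0, 2]

cmc = cmc_curve(scores, truth, max_rank=3)
print(cmc)  # first entry is the rank-1 accuracy
```

The first entry of the returned list corresponds to rank-1 accuracy, the metric quoted for VIPeR in the summary.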

In practical terms, the framework addresses a central challenge in video surveillance, namely reliably re-identifying individuals as they move across non-overlapping camera views, by modeling the task as a ranking problem. On the theoretical side, the shift to a ranking perspective positions joint representation learning as an effective alternative to treating feature extraction and metric learning as separate steps.

Looking forward, the method suggests potential for other ranking-based visual tasks and may serve as a basis for further work on deep learning for robust identity verification. A natural next step is adapting the framework to video input so that temporal information can be used in recognition.

In conclusion, the paper contributes a significant methodological advance in person re-identification. It combines the representation learning ability of deep networks with a ranking formulation of the problem, and it reports better results than existing approaches on the evaluated datasets.
