A Multi-task Deep Network for Person Re-identification (1607.05369v3)

Published 19 Jul 2016 in cs.CV

Abstract: Person re-identification (ReID) focuses on identifying people across different scenes in video surveillance, which is usually formulated as a binary classification task or a ranking task in current person ReID approaches. In this paper, we take both tasks into account and propose a multi-task deep network (MTDnet) that makes use of their respective advantages and jointly optimizes the two tasks for person ReID. To the best of our knowledge, we are the first to integrate both tasks in one network to solve the person ReID problem. We show that our proposed architecture significantly boosts the performance. Furthermore, deep architectures in general require a sufficiently large dataset for training, a requirement that is usually not met in person ReID. To cope with this situation, we further extend the MTDnet and propose a cross-domain architecture that is capable of using an auxiliary set to assist training on small target sets. In the experiments, our approach outperforms most existing person ReID algorithms on representative datasets including CUHK03, CUHK01, VIPeR, iLIDS and PRID2011, which clearly demonstrates the effectiveness of the proposed approach.

Summary

The paper introduces a novel multi-task deep network combining binary classification and ranking to effectively boost person re-identification performance.
It uses shared and task-specific layers to capture both global and detailed features, outperforming most existing methods on the evaluated benchmarks.
The integrated cross-domain framework utilizes auxiliary datasets to enhance model robustness and improve accuracy on smaller datasets.

Multi-task Deep Network for Enhanced Person Re-identification

The paper "A Multi-task Deep Network for Person Re-identification," authored by Weihua Chen, Xiaotang Chen, Jianguo Zhang, and Kaiqi Huang, addresses person re-identification (ReID) in wide-area surveillance systems. The task is difficult because a person's appearance varies considerably with pose, illumination, and camera viewpoint. By combining binary classification and ranking in a single network, the authors' multi-task deep network (MTDnet) improves ReID performance over most existing methods.

Methodology

The distinguishing feature of MTDnet is that it integrates two tasks. Traditional ReID approaches usually treat identification as either a binary classification problem or a ranking problem, and optimizing the two separately can lead to suboptimal performance. MTDnet instead optimizes both within a single architecture: the binary classification loss is used to decide whether a pair of images shows the same individual, while the ranking loss enforces the correct ordering of images by similarity.
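The joint objective described above can be sketched as a weighted sum of a pairwise classification loss and a triplet-style ranking loss. This is a minimal illustration, not the authors' exact formulation: the hinge form of the ranking loss, the `margin` value, and the weight `lam` are all assumptions introduced here.

```python
import math

def binary_cross_entropy(p_same, is_same):
    """Cross-entropy for one image pair.

    p_same is the predicted probability that both images show the same person.
    """
    eps = 1e-12  # guard against log(0)
    p = min(max(p_same, eps), 1.0 - eps)
    return -math.log(p) if is_same else -math.log(1.0 - p)

def triplet_ranking_loss(d_pos, d_neg, margin=1.0):
    """Hinge ranking loss (assumed form).

    The matching pair (distance d_pos) should be closer than the
    non-matching pair (distance d_neg) by at least `margin`.
    """
    return max(0.0, margin + d_pos - d_neg)

def joint_loss(p_same, is_same, d_pos, d_neg, lam=1.0, margin=1.0):
    """Weighted sum of both task losses; `lam` is a hypothetical weight."""
    cls = binary_cross_entropy(p_same, is_same)
    rnk = triplet_ranking_loss(d_pos, d_neg, margin)
    return cls + lam * rnk
```

With this form, a triplet that is already ordered correctly with enough margin contributes nothing to the ranking term, so the gradient for that sample comes from the classification term alone.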

The two tasks are combined through shared and task-specific network layers. The initial shared layers capture global features useful to both tasks, while the later layers are specialized per task: semantic features for classification and fine-grained relational features for ranking. The authors argue that this combination substantially improves the model's capacity to learn the discriminative representations needed for effective ReID.
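The shared-then-split layout can be illustrated with a toy network that has one shared layer feeding two separate heads. This is only a structural sketch: the real MTDnet is a convolutional network operating on images, whereas the fully connected layers, the layer sizes, and the `TwoHeadNet` name below are hypothetical stand-ins.

```python
import random

def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

class TwoHeadNet:
    """Shared trunk followed by a classification head and a ranking head."""

    def __init__(self, n_in=8, n_shared=6, n_head=4, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_in, n_shared, rng)      # used by both tasks
        self.cls_head = init_layer(n_shared, n_head, rng)  # classification only
        self.rnk_head = init_layer(n_shared, n_head, rng)  # ranking only

    def forward(self, x):
        h = relu(linear(x, *self.shared))  # shared features
        return linear(h, *self.cls_head), linear(h, *self.rnk_head)
```

Because both heads read the same shared features, gradients from either task's loss would update the shared layer during training, which is what lets the two tasks inform each other.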

Cross-domain Transfer for Small Datasets

Many ReID datasets contain too little labeled data to train a deep network well. To address this, the authors extend their architecture with a cross-domain framework that uses an auxiliary dataset to support training on a small target dataset. This semi-supervised method employs a contrastive loss to reduce the domain gap by aligning joint feature maps from the different datasets, so that the auxiliary set improves the representations the model learns for the target domain.
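For reference, a contrastive loss in its widely used margin-based form pulls similar pairs together and pushes dissimilar pairs apart until they are at least a margin away. The sketch below shows that standard form only; how pairs are drawn across the auxiliary and target sets, and the exact variant used in the paper, are not specified here, so the `margin` value and the flat feature vectors are assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(feat_a, feat_b, similar, margin=1.0):
    """Standard margin-based contrastive loss for one pair of features.

    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only while they are closer than `margin`.
    """
    d = euclidean(feat_a, feat_b)
    if similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

For example, `contrastive_loss([0, 0], [3, 4], True)` penalizes the similar pair for being 5 units apart, while the same pair labeled dissimilar incurs no loss because it already exceeds the margin.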

Experimental Results

The experiments evaluate MTDnet on several benchmark datasets: CUHK03, CUHK01, VIPeR, iLIDS, and PRID2011. The multi-task framework achieves higher rank-1 accuracy than most of the traditional and deep learning-based ReID methods it is compared against. Notably, MTDnet also surpasses existing algorithms on the larger CUHK03 dataset, which suggests that the approach benefits from larger training sets.
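Rank-1 accuracy, the metric cited above, is the fraction of query images whose nearest gallery image has the same identity. A minimal sketch of the more general rank-k version follows; the function name, the distance-matrix input, and the toy identities are illustrative assumptions rather than the paper's evaluation code.

```python
def rank_k_accuracy(dist, query_ids, gallery_ids, k=1):
    """Fraction of queries with a correct match among the k nearest gallery items.

    dist[q][g] is the distance between query q and gallery item g.
    """
    hits = 0
    for q, row in enumerate(dist):
        # Gallery indices sorted from most to least similar.
        order = sorted(range(len(row)), key=lambda g: row[g])
        top_ids = [gallery_ids[g] for g in order[:k]]
        if query_ids[q] in top_ids:
            hits += 1
    return hits / len(dist)

# Toy example: two queries against a gallery of three identities.
dist = [
    [0.1, 0.9, 0.5],  # query "a": nearest gallery item is "a"
    [0.8, 0.7, 0.2],  # query "b": nearest gallery item is "c"
]
rank1 = rank_k_accuracy(dist, ["a", "b"], ["a", "b", "c"], k=1)
rank2 = rank_k_accuracy(dist, ["a", "b"], ["a", "b", "c"], k=2)
```

Evaluating the same function for increasing k gives the points of a cumulative matching characteristic curve, which is how ReID results are commonly reported.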

By comparing single-task networks (MTDnet-cls and MTDnet-rnk) with the integrated MTDnet, the paper illustrates the marked improvement attained through joint optimization. Additionally, the cross-domain architecture further enhances performance on smaller datasets, demonstrating its practical applicability in environments with constrained data resources.

Implications and Future Work

This research introduces a framework that merges classification and ranking tasks, offering a richer learning setup that could be adapted to other domains with similar feature representation challenges. Its effective use of auxiliary datasets may also encourage further work on domain adaptation within the ReID and wider machine learning communities.

Future research could investigate more sophisticated domain adaptation strategies, potentially incorporating advanced techniques from adversarial learning or graph-based models to refine cross-domain learning. Additionally, as surveillance environments continue to evolve, the adaptability of such multi-task frameworks for real-time, scalable implementations will be crucial.

In summary, the multi-task deep network proposed by Chen et al. advances person re-identification by balancing classification and ranking in one framework, improving identification accuracy on both small and varied datasets.