
Person Transfer GAN to Bridge Domain Gap for Person Re-Identification (1711.08565v2)

Published 23 Nov 2017 in cs.CV

Abstract: Although the performance of person Re-Identification (ReID) has been significantly boosted, many challenging issues in real scenarios have not been fully investigated, e.g., complex scenes and lighting variations, viewpoint and pose changes, and the large number of identities in a camera network. To facilitate research towards conquering those issues, this paper contributes a new dataset called MSMT17 with many important features, e.g., 1) the raw videos are taken by a 15-camera network deployed in both indoor and outdoor scenes, 2) the videos cover a long period of time and present complex lighting variations, and 3) it contains currently the largest number of annotated identities, i.e., 4,101 identities and 126,441 bounding boxes. We also observe that a domain gap commonly exists between datasets, which essentially causes a severe performance drop when training and testing on different datasets. As a result, available training data cannot be effectively leveraged for new testing domains. To relieve the expensive cost of annotating new training samples, we propose a Person Transfer Generative Adversarial Network (PTGAN) to bridge the domain gap. Comprehensive experiments show that the domain gap can be substantially narrowed down by PTGAN.

Citations (1,562)

Summary

  • The paper introduces PTGAN, a GAN-based solution that effectively bridges domain gaps by transferring style while preserving person identity.
  • It presents MSMT17, the largest and most complex ReID dataset, with 126,441 bounding boxes and 4,101 identities simulating realistic surveillance conditions.
  • Experimental results show significant improvements in Rank-1 accuracy across datasets, demonstrating PTGAN’s potential to reduce labeling costs in weakly supervised settings.

Bridging Domain Gap for Person Re-Identification Using PTGAN

The paper "Person Transfer GAN to Bridge Domain Gap for Person Re-Identification" by Longhui Wei et al. introduces a novel approach to the significant challenges in person re-identification (ReID) posed by domain gaps between datasets. This work contributes both a comprehensive dataset, MSMT17, designed to better simulate real-world scenarios, and a generative adversarial network (GAN)-based model, termed Person Transfer GAN (PTGAN), aimed at mitigating cross-domain discrepancies.

Contributions of MSMT17

MSMT17 stands out as an innovative dataset that improves on existing datasets in several ways:

  1. Scale and Complexity: MSMT17 comprises an extensive collection of 126,441 bounding boxes across 4,101 identities, captured by a 15-camera network spanning both indoor and outdoor environments. This is currently the largest and most complex dataset available for person ReID research.
  2. Diverse Lighting Conditions: The dataset incorporates substantial lighting variations by including video footage taken at different times of the day across multiple days, mimicking the dynamic illumination conditions encountered in realistic surveillance settings.
  3. Robust Bounding Box Detection: Leveraging Faster R-CNN for bounding box detection yields higher accuracy and reliability than older approaches such as DPM-based detection or manual annotation.

Addressing Domain Gaps with PTGAN

A core challenge in person ReID is the presence of domain gaps between datasets, causing severe drops in performance when models trained on one dataset are tested on another. This issue arises due to variations in factors such as lighting, background, camera resolution, and environmental conditions. The PTGAN model proposed by the authors offers an effective solution to this problem through an innovative approach:

  1. Style Transfer and Identity Preservation: PTGAN is designed to achieve two primary goals. First, it must effectively translate the stylistic attributes of one dataset to another, ensuring that transferred images reflect the target domain's conditions. Second, it must preserve the identity-related features of individuals to maintain the integrity of ReID tasks.
  2. Cycle Consistency with Identity Loss: Rooted in the Cycle-GAN framework, PTGAN incorporates additional constraints to enforce the stability of person identities during style transfer. This is achieved by computing identity losses on the foreground regions of person images, which helps maintain crucial identification cues.
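To make the two objectives above concrete, here is a minimal NumPy sketch (not the authors' implementation) of the foreground-masked identity loss and the combined objective. The image shapes, the `lam` weight, and the function names are illustrative assumptions; the paper obtains the foreground mask with a segmentation network (PSPNet), which is simply taken as a given binary mask here.

```python
import numpy as np

def masked_identity_loss(src, transferred, fg_mask):
    """L2 identity loss restricted to the person foreground.

    src, transferred: float arrays of shape (H, W, C) in [0, 1].
    fg_mask: binary array of shape (H, W), 1 on the person region
    (in the paper, obtained via PSPNet segmentation).
    """
    # Penalize changes only where the person is, so the background
    # is free to adopt the target domain's style.
    diff = (transferred - src) * fg_mask[..., None]
    return float(np.mean(diff ** 2))

def ptgan_loss(style_loss, id_loss_ab, id_loss_ba, lam=1.0):
    """Total objective: a CycleGAN-style transfer loss plus weighted
    identity terms for both transfer directions. The weight `lam` is
    illustrative; the paper's exact weighting may differ."""
    return style_loss + lam * (id_loss_ab + id_loss_ba)
```

The key design choice this sketch illustrates: the identity constraint acts only inside the foreground mask, which is how PTGAN lets backgrounds change style while the person's appearance cues are preserved.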

Experimental Validation

Performance on MSMT17

Experiments conducted on MSMT17 revealed the substantial challenges posed by the dataset. Methods such as GLAD and PDC, which demonstrate strong performance on other datasets, recorded significantly lower accuracy and mAP on MSMT17, indicating its complexity and realism.
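For readers unfamiliar with the metrics reported here, the following is a simplified sketch of how Rank-1 accuracy and mAP are computed from a query-to-gallery distance matrix. It omits the same-camera filtering and junk-image handling used in standard ReID evaluation protocols, so it is an illustration of the metrics, not a benchmark-exact evaluator.

```python
import numpy as np

def rank1_accuracy(dist, query_ids, gallery_ids):
    """Rank-1: fraction of queries whose nearest gallery image
    shares the query identity. dist has shape (n_query, n_gallery)."""
    nearest = np.argmin(dist, axis=1)
    return float(np.mean(gallery_ids[nearest] == query_ids))

def mean_average_precision(dist, query_ids, gallery_ids):
    """Simplified mAP over ranked gallery lists."""
    aps = []
    for i, q in enumerate(query_ids):
        order = np.argsort(dist[i])            # gallery sorted by distance
        matches = (gallery_ids[order] == q).astype(float)
        if matches.sum() == 0:
            continue                           # no ground truth for this query
        precision = np.cumsum(matches) / (np.arange(len(matches)) + 1)
        aps.append((precision * matches).sum() / matches.sum())
    return float(np.mean(aps))
```

Rank-1 only rewards the single best match, while mAP rewards ranking all correct gallery images highly, which is why the two numbers can diverge on hard datasets like MSMT17.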

Bridging Gaps with PTGAN

The effectiveness of PTGAN was validated through a series of experiments:

  1. Transfer to Small Datasets: By transferring training sets from larger datasets (CUHK03 and Market-1501) to the smaller PRID dataset, PTGAN significantly boosted the Rank-1 accuracy (e.g., from 2.0% to 37.5% for CUHK03 transferred to PRID-cam1). The combined transferred datasets further enhanced the performance, underscoring the model's ability to handle multi-style transfers effectively.
  2. Transfer Among Large Datasets: When applied to large datasets, PTGAN demonstrated a consistent reduction in domain gaps. For instance, transferring training data from DukeMTMC-reID and Market-1501 to MSMT17 improved cross-domain ReID performance compared with training on the untransferred source data.
  3. Weakly Supervised Learning: The addition of transferred data to a subset of MSMT17's training set showcased PTGAN's potential in reducing labeling costs. The combined datasets yielded notable improvements in ReID accuracy, reinforcing the method's practical applicability in resource-constrained scenarios.

Implications and Future Directions

The growing complexity of surveillance environments necessitates robust ReID systems capable of performing under diverse and dynamic conditions. The contributions of MSMT17 provide a challenging benchmark to drive advancements in this domain. Meanwhile, PTGAN's approach to mitigating domain gaps through generative models presents a scalable and practical solution for leveraging vast existing datasets without demanding extensive new annotations.

Future research directions include refining the PTGAN methodology to better handle multi-camera setups with varying styles within large datasets. Further exploration into optimizing transfer strategies could lead to even greater performance gains and broader applicability in real-world surveillance systems.

This paper presents significant strides in addressing key challenges in person ReID, demonstrating the utility of large-scale, realistic datasets and innovative GAN-based solutions in advancing the field.
