- The paper introduces the DSCE loss to dynamically mitigate noisy pseudo-labels and adapt to shifting feature representations.
- It proposes MetaCam, a meta-learning strategy that simulates cross-camera conditions to learn robust, camera-invariant features.
- Empirical evaluations on re-ID benchmarks including Market-1501, DukeMTMC-reID, and MSMT17 show consistent improvements in both fully unsupervised and unsupervised domain adaptation settings.
Unsupervised Person Re-Identification: Addressing Noisy Labels and Camera Shift
The paper "Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification" explores the unsupervised person re-identification (re-ID) problem, specifically focusing on learning effective discriminative models without labeled data. This area has substantial practical importance as annotated datasets can be expensive and labor-intensive to produce. A common approach in unsupervised re-ID involves clustering for pseudo-label generation, which can then optimize the model. However, this method faces challenges due to noisy labels from clustering and feature variations due to camera shifts. This paper innovatively addresses these issues by introducing a Dynamic and Symmetric Cross-Entropy loss (DSCE) and a camera-aware meta-learning strategy (MetaCam).
DSCE Loss for Robust Learning
The proposed DSCE loss targets the problem of noisy pseudo-labels, which push the model toward incorrect optima when fitted naively. Drawing on the learning-with-noisy-labels (LNL) literature, DSCE computes class centers from a feature memory and therefore adapts dynamically when cluster assignments change after each clustering iteration. This adaptation matters in the unsupervised setting, where both the number of clusters and their membership can vary significantly between rounds. The symmetric formulation of the loss further limits the influence of mislabeled samples, keeping training stable as pseudo-labels change. Experiments on standard benchmarks show that this loss improves performance.
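As a rough illustration, the sketch below shows how a dynamic, symmetric cross-entropy of this kind can be computed against class centers stored in a feature memory, assuming L2-normalized embeddings. The temperature, weighting coefficients, and clamping constant are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of a dynamic, symmetric cross-entropy in the spirit of DSCE.
# `memory` holds one center per pseudo-class and is assumed to be refreshed
# after each clustering round, so the classifier adapts when labels change.
import torch
import torch.nn.functional as F


def dsce_loss(features, pseudo_labels, memory,
              temperature=0.05, alpha=1.0, beta=1.0):
    """features: (B, D) L2-normalized embeddings; memory: (K, D) class centers."""
    logits = features @ memory.t() / temperature   # similarity to dynamic centers
    log_p = F.log_softmax(logits, dim=1)

    # Forward cross-entropy against the (possibly noisy) pseudo-labels.
    ce = F.nll_loss(log_p, pseudo_labels)

    # Reverse (symmetric) term: swap the roles of prediction and label.
    # log(0) for the one-hot zeros is clamped to a constant, as in symmetric CE.
    p = log_p.exp()
    one_hot = F.one_hot(pseudo_labels, num_classes=memory.size(0)).float()
    rce = -(p * torch.clamp(torch.log(one_hot + 1e-4), min=-4.0)).sum(dim=1).mean()

    return alpha * ce + beta * rce
```

The reverse term is bounded for any prediction, so a sample whose pseudo-label disagrees strongly with the model contributes a capped penalty rather than an unbounded one, which is what gives symmetric losses their robustness to label noise.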
MetaCam for Cross-Camera Adaptation
Camera shift, the variation in feature distribution across different cameras, poses another challenge for unsupervised re-ID. Left unaddressed, it can cause the model to separate same-identity samples captured by different cameras or to overfit to camera-specific cues. The paper proposes MetaCam, a meta-learning strategy that explicitly simulates cross-camera conditions during training. The training data are split into meta-train and meta-test sets with disjoint camera IDs, and each gradient update computed on the meta-train cameras is validated against the unseen meta-test cameras, encouraging the model to learn camera-invariant features.
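A first-order sketch of such a camera-aware meta step is given below, assuming per-sample camera IDs are available in each batch and using `torch.func.functional_call` for the virtual update. The camera-split heuristic, inner learning rate, and first-order simplification are illustrative choices; the authors' implementation may differ, for instance in how the meta-gradient is propagated.

```python
# First-order sketch of a camera-aware meta step in the spirit of MetaCam.
# Cameras in the batch are split into disjoint meta-train / meta-test groups;
# a virtual SGD step is taken on the meta-train cameras, and the meta-test
# loss is evaluated with the adapted weights so the final gradient also
# favors cameras unseen during the inner step.
import random
import torch
from torch.func import functional_call  # PyTorch >= 2.0


def metacam_step(model, batch, criterion, inner_lr=0.1):
    images, labels, cams = batch                   # cams: per-sample camera IDs
    cam_ids = cams.unique().tolist()               # assumes >= 2 distinct cameras
    random.shuffle(cam_ids)
    mtr_cams = set(cam_ids[: max(1, len(cam_ids) // 2)])
    mtr = torch.tensor([int(c) in mtr_cams for c in cams], device=cams.device)
    mte = ~mtr

    params = dict(model.named_parameters())

    # Meta-train loss on one group of cameras (criterion: pseudo-label loss).
    loss_mtr = criterion(functional_call(model, params, (images[mtr],)), labels[mtr])

    # Virtual (inner) SGD step; second-order terms are dropped in this sketch.
    grads = torch.autograd.grad(loss_mtr, list(params.values()), retain_graph=True)
    adapted = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}

    # Meta-test loss on the held-out cameras, using the adapted weights.
    loss_mte = criterion(functional_call(model, adapted, (images[mte],)), labels[mte])

    # The caller backpropagates this sum and steps the outer optimizer.
    return loss_mtr + loss_mte
```

Because the returned objective includes the meta-test loss computed with the adapted weights, each outer update is steered toward directions that also reduce error on cameras held out of the inner step, which is how the simulated cross-camera gap discourages camera-specific solutions.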
Empirical Evaluation and Practical Implications
Experiments in both the fully unsupervised and the unsupervised domain adaptation (UDA) settings on Market-1501, DukeMTMC-reID, and MSMT17 demonstrate the efficacy of DSCE and MetaCam. Models trained with the combined framework outperform existing state-of-the-art methods, and the two components prove complementary, performing best when applied jointly.
Practically, accurate person re-ID without labeled data opens the way to deploying re-ID systems in real-world scenarios without extensive manual annotation, making them scalable across varied environments and applications. On the methodological side, the unified framework points toward further improvements in noise-tolerant learning and cross-domain adaptation that could carry over to AI applications beyond re-ID.
Future Directions
Future work could extend this framework to other settings that suffer from label noise and distribution shift, such as video tracking or image classification. Integrating stronger clustering methods or better models of camera shift could further improve the adaptability and robustness of unsupervised learning pipelines. As unsupervised learning continues to evolve, frameworks like this one are likely to play a significant role in its adoption across domains.