
Learning Generalisable Omni-Scale Representations for Person Re-Identification (1910.06827v5)

Published 15 Oct 2019 in cs.CV

Abstract: An effective person re-identification (re-ID) model should learn feature representations that are both discriminative, for distinguishing similar-looking people, and generalisable, for deployment across datasets without any adaptation. In this paper, we develop novel CNN architectures to address both challenges. First, we present a re-ID CNN termed omni-scale network (OSNet) to learn features that not only capture different spatial scales but also encapsulate a synergistic combination of multiple scales, namely omni-scale features. The basic building block consists of multiple convolutional streams, each detecting features at a certain scale. For omni-scale feature learning, a unified aggregation gate is introduced to dynamically fuse multi-scale features with channel-wise weights. OSNet is lightweight as its building blocks comprise factorised convolutions. Second, to improve generalisable feature learning, we introduce instance normalisation (IN) layers into OSNet to cope with cross-dataset discrepancies. Further, to determine the optimal placements of these IN layers in the architecture, we formulate an efficient differentiable architecture search algorithm. Extensive experiments show that, in the conventional same-dataset setting, OSNet achieves state-of-the-art performance, despite being much smaller than existing re-ID models. In the more challenging yet practical cross-dataset setting, OSNet beats most recent unsupervised domain adaptation methods without using any target data. Our code and models are released at https://github.com/KaiyangZhou/deep-person-reid.

Authors (4)
  1. Kaiyang Zhou (40 papers)
  2. Yongxin Yang (73 papers)
  3. Andrea Cavallaro (59 papers)
  4. Tao Xiang (324 papers)
Citations (189)

Summary

Overview of "Learning Generalisable Omni-Scale Representations for Person Re-Identification"

In the paper titled "Learning Generalisable Omni-Scale Representations for Person Re-Identification," the authors present a novel convolutional neural network (CNN) architecture called OSNet. This work addresses two core challenges in person re-identification (re-ID): discriminative feature learning and generalisable feature learning. OSNet is specifically designed to capture omni-scale feature representations that span varying spatial scales, enhancing both discriminative capability and generalisability across datasets without requiring adaptation.

Key Contributions

  1. Omni-Scale Feature Learning with OSNet:
    • OSNet introduces an architecture that encompasses multiple convolutional streams, each detecting features at distinct spatial scales. The architecture is lightweight due in part to its use of factorised convolutions, leading to efficient, yet potent, feature extraction.
    • The omni-scale features are enabled by a unified aggregation gate, a mechanism that dynamically fuses multi-scale features using input-dependent, channel-wise weights.
  2. Instance Normalisation for Cross-Domain Generalisation:
    • Incorporating instance normalisation (IN) within OSNet helps manage cross-dataset discrepancies in image style, coping with varied lighting conditions, backgrounds, and viewpoints without requiring target domain data.
    • The positions of IN layers within OSNet are optimised via a differentiable architecture search algorithm, efficiently enhancing generalisation to unseen domains.
  3. Extensive Experiments and Analysis:
    • Extensive empirical evaluations demonstrate OSNet's state-of-the-art performance in traditional same-dataset re-ID scenarios.
    • In cross-dataset scenarios, OSNet significantly outperforms many recent unsupervised domain adaptation methods, achieving competitive generalisation capabilities without using target domain data for adaptation.
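The unified aggregation gate described above can be illustrated with a minimal NumPy sketch. Each stream's output is pooled globally, passed through a shared two-layer subnetwork with a sigmoid to produce channel-wise weights, and the reweighted streams are summed; this follows the paper's description, but the function name, the plain-array formulation, and the explicit weight arguments are illustrative only (the released code implements this with learnable PyTorch modules):

```python
import numpy as np

def aggregation_gate(stream_outputs, w1, b1, w2, b2):
    """Fuse T multi-scale feature maps with input-dependent channel weights.

    Each element of stream_outputs has shape (C, H, W). The gate
    subnetwork (global average pool -> FC -> ReLU -> FC -> sigmoid)
    is shared across all streams, as in the paper.
    """
    fused = np.zeros_like(stream_outputs[0])
    for x in stream_outputs:
        # Global average pooling over spatial dims -> (C,)
        pooled = x.mean(axis=(1, 2))
        # Shared two-layer MLP; sigmoid yields channel weights in (0, 1)
        hidden = np.maximum(0.0, w1 @ pooled + b1)
        weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))
        # Channel-wise reweighting, then sum across streams
        fused += weights[:, None, None] * x
    return fused
```

Because the weights are computed from the input itself, the fusion can emphasise small-scale or large-scale streams differently for each image, which is what makes the resulting features "omni-scale" rather than a fixed multi-scale concatenation.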

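The instance normalisation layers used for cross-domain generalisation are standard IN: each channel of each sample is normalised with statistics computed over its own spatial locations only, which removes instance-specific style such as illumination and colour. A minimal NumPy sketch for a single feature map (the function name and plain-array form are illustrative; in practice this is a built-in layer in deep learning frameworks):

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    """Instance normalisation for one feature map of shape (C, H, W).

    Unlike batch normalisation, the statistics come from this single
    instance's spatial positions, so style variation between datasets
    is normalised away rather than baked into running statistics.
    """
    mean = x.mean(axis=(1, 2), keepdims=True)  # per-channel mean
    var = x.var(axis=(1, 2), keepdims=True)    # per-channel variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma[:, None, None] * x_hat + beta[:, None, None]
```

The open question the paper tackles with differentiable architecture search is not how IN works but *where* to place such layers inside OSNet for the best unseen-domain performance.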
Results and Significance

The paper provides numerical evidence of OSNet's efficacy. When evaluated against existing models, OSNet consistently achieves superior results with a drastically reduced parameter count (2.2 million), demonstrating a strong balance of efficiency and accuracy. Additionally, OSNet's performance on both small and large re-ID datasets shows its robustness across varying data conditions, further supporting its utility for real-world deployments where large-scale data collection is infeasible.

Implications and Future Directions

The theoretical notion of omni-scale learning and its practical implementation in OSNet mark a substantial shift in re-ID model design, emphasising efficiency without compromising performance. OSNet's success in dynamically handling varying scales, and its demonstrated flexibility, could serve as a foundation for extending similar architectural principles to other computer vision tasks beyond person re-ID.

Looking ahead, the concepts introduced in OSNet may lead to new research directions in AI, particularly in contexts that demand high adaptability and discriminative power across unseen environments or scenarios. Further explorations could involve enhancing the dynamic adaptation mechanisms or extending architecture search techniques to cater to specific domain generalisation challenges in diverse applications.
