- The paper introduces OSNet, a novel network that dynamically fuses multi-scale features via an aggregation gate to enhance person re-identification.
- It leverages omni-scale residual blocks and factorized convolutions to efficiently capture both local and global features in a lightweight design.
- OSNet achieves state-of-the-art performance across six benchmarks, demonstrating high accuracy and efficiency suitable for resource-constrained deployments.
Omni-Scale Feature Learning for Person Re-Identification
The paper "Omni-Scale Feature Learning for Person Re-Identification" by Kaiyang Zhou et al. introduces the Omni-Scale Network (OSNet), a novel architecture for person re-identification (re-ID). Re-ID requires discriminative features at multiple spatial scales, from small local patterns (e.g., a logo on a shirt) to whole-body appearance, and OSNet is designed to learn this full range of scales efficiently.
Summary of Contributions
The core innovation of OSNet lies in its ability to learn omni-scale features: features at both homogeneous scales (a single scale) and heterogeneous scales (mixtures of several scales). Specifically, OSNet builds residual blocks containing multiple convolutional streams, each detecting features at a different scale. An aggregation gate dynamically fuses these multi-scale features using input-dependent, channel-wise weights, so the mixture of scales adapts to each input image. OSNet's lightweight design further enhances its practical applicability across several re-ID benchmarks.
Technical Details
- Omni-Scale Residual Block: The design leverages multiple convolutional streams within a residual block, each operating at a different scale. This approach, termed omni-scale, ensures the simultaneous capture of local and global features critical for re-ID tasks. The scale of each stream is controlled by an exponent t: stream t stacks t lightweight 3×3 convolutions, so its receptive field grows with t. The streams are then dynamically fused via a Unified Aggregation Gate (AG).
- Unified Aggregation Gate (AG): The AG is a learnable network component that generates channel-wise input-dependent weights for dynamically combining multiple streams. This method adapts the feature fusion to each input image, thus enhancing the discriminative power of the learned features.
- Factorized Convolutions: To maintain a lightweight architecture, OSNet employs a combination of pointwise and depthwise convolutions, drawing inspiration from efficient network designs like MobileNet. This approach significantly reduces the number of parameters and computational cost without compromising performance.
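The stream-fusion idea above can be made concrete with a minimal NumPy sketch. This is an illustrative reimplementation, not the paper's code: the gate follows the description in the paper (global average pooling, then a small FC-ReLU-FC-sigmoid subnetwork shared across all streams), but the function name, weight shapes, and the absence of a reduction ratio in the hidden layer are simplifying assumptions.

```python
import numpy as np

def aggregation_gate(streams, w1, b1, w2, b2):
    """Fuse multi-scale feature streams with input-dependent,
    channel-wise weights, in the spirit of OSNet's unified AG.

    streams: list of arrays, each of shape (C, H, W), one per scale.
    w1, b1, w2, b2: parameters of the gate subnetwork, shared
    across all streams (a simplification; shapes are illustrative).
    """
    fused = np.zeros_like(streams[0])
    for x in streams:
        # Global average pooling: one descriptor per channel.
        s = x.mean(axis=(1, 2))                     # shape (C,)
        # Shared mini-network: FC -> ReLU -> FC -> sigmoid.
        h = np.maximum(0.0, w1 @ s + b1)
        g = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # (C,), values in (0, 1)
        # Channel-wise reweighting of this stream, summed into the output.
        fused += g[:, None, None] * x
    return fused
```

Because the gate weights depend on the pooled input statistics, the same block can emphasize fine-scale streams for one image and coarse-scale streams for another, which is exactly the dynamic scale fusion the paper argues re-ID needs.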
OSNet's performance is evaluated on six widely used re-ID datasets: Market-1501, CUHK03, DukeMTMC-reID (Duke), MSMT17, VIPeR, and GRID. The results underscore several key points:
- State-of-the-Art Performance: OSNet consistently outperforms existing models on all six datasets, often achieving significant margins over more complex models. On Market1501, for instance, OSNet reaches a remarkable Rank-1 accuracy of 94.8% and a mean Average Precision (mAP) of 84.9%.
- Efficiency and Scalability: Despite its superior performance, OSNet is remarkably efficient, containing only 2.2 million parameters, roughly an order of magnitude fewer than popular ResNet50-based models. This compactness makes OSNet suitable for deployment in resource-constrained environments.
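The compactness claim can be sanity-checked with back-of-the-envelope arithmetic on the factorized convolutions. The sketch below is illustrative (helper names are invented), but the counting follows the standard analysis: a k×k convolution is replaced by a 1×1 pointwise layer followed by a k×k depthwise layer.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases omitted)."""
    return k * k * c_in * c_out

def lite_conv_params(k, c_in, c_out):
    """Weights in a factorized (Lite) conv: 1x1 pointwise mapping
    c_in -> c_out, then a k x k depthwise conv over c_out channels."""
    return c_in * c_out + k * k * c_out

# Example: a 3x3 layer with 256 input and 256 output channels.
std = conv_params(3, 256, 256)        # 589,824 weights
lite = lite_conv_params(3, 256, 256)  # 65,536 + 2,304 = 67,840 weights
ratio = std / lite                    # roughly 8.7x fewer parameters
```

Applied throughout the network, savings of this magnitude per layer are what bring OSNet down to the 2.2 million parameters reported in the paper.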
Implications and Future Directions
The paper's results suggest that omni-scale feature learning is beneficial beyond the person re-ID domain. OSNet demonstrates competitive performance on other visual recognition tasks, such as object category recognition on CIFAR and ImageNet and multi-label attribute recognition on PA-100K, indicating potential applications in broader visual recognition contexts.
Speculation on Future Developments
In light of the promising results, future research could explore several directions:
- Dynamic Model Adjustments: Further refining the aggregation gate mechanisms to adapt even more dynamically to varying conditions could enhance performance.
- Architectural Innovations: Investigating hybrid architectures that combine omni-scale feature learning with other advanced techniques like capsule networks or graph neural networks could yield further improvements.
- Transfer Learning and Cross-Domain Applications: Extending OSNet's capability to handle domain adaptation problems, as hinted in their supplementary material, could open up new applications, especially in surveillance and security domains.
In conclusion, the introduction of OSNet marks a significant technical advancement in the person re-ID field, combining lightweight architecture with powerful omni-scale feature learning. The network's design principles and demonstrated efficiency establish a solid foundation for future explorations in visual recognition tasks.