Semantic Representation for Person Re-Identification and Search
The paper addresses the challenge of developing effective semantic representations for person re-identification (re-id) and search, two tasks central to visual surveillance. It focuses on semantic attributes, which are potentially invariant across pose and viewpoint changes, as a representation for re-id and description-based search. Previous attribute-centric methods have lagged behind conventional approaches because they require domain-specific annotation that does not scale. The paper proposes to train semantic attribute models on existing fashion photography datasets and transfer them to the surveillance domain with minimal supervision, sidestepping this annotation bottleneck.
Technical Approach
The proposed solution utilizes a generative modeling approach based on the Indian Buffet Process (IBP) to learn semantic attributes from fashion datasets. This method contrasts with traditional discriminative models, offering notable advantages such as joint learning of attributes and the ability to leverage weakly annotated data. Importantly, the model facilitates unsupervised domain adaptation through Bayesian priors, enabling the transfer of learned semantic representations to the surveillance domain without requiring surveillance-specific supervision.
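To make the generative backbone concrete, the sketch below draws a binary image-attribute assignment matrix from an IBP prior using the standard sequential ("Indian buffet") construction. This is a minimal illustration of the IBP itself, not the paper's full model: the function name, the concentration parameter alpha, and the NumPy implementation are assumptions for exposition.

```python
import numpy as np

def sample_ibp(num_images, alpha, seed=None):
    """Draw a binary matrix Z from an Indian Buffet Process prior.

    Rows are images ("customers"), columns are latent attributes
    ("dishes"); Z[i, k] = 1 means image i exhibits attribute k.
    The number of attributes is unbounded; here it emerges from
    the prior alone.
    """
    rng = np.random.default_rng(seed)
    columns = []  # one list of 0/1 flags per attribute discovered so far
    for i in range(num_images):
        # Existing attributes: image i adopts attribute k with
        # probability m_k / (i + 1), where m_k counts the previous
        # images exhibiting k ("rich get richer").
        for col in columns:
            m_k = sum(col)
            col.append(int(rng.random() < m_k / (i + 1)))
        # Brand-new attributes: Poisson(alpha / (i + 1)) of them.
        for _ in range(rng.poisson(alpha / (i + 1))):
            columns.append([0] * i + [1])
    if not columns:
        return np.zeros((num_images, 0), dtype=int)
    return np.array(columns, dtype=int).T  # (num_images, num_attributes)

Z = sample_ibp(num_images=100, alpha=3.0, seed=0)
print(Z.shape, Z.sum(axis=0))  # matrix size and per-attribute popularity
```

The nonparametric prior lets the number of attributes grow with the data, which is one reason a generative formulation supports joint attribute learning rather than training one independent detector per attribute.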
The model is trained on two types of fashion datasets: those with strong (pixel-level) annotations and those with weak (image-level) annotations. This flexibility allows the model to generalize attributes across domains and scenarios. The core contribution is a unified framework that learns attribute models from fashion data and adapts them to surveillance footage, providing a robust semantic description of individuals for both supervised and unsupervised re-id tasks.
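As a rough illustration of the two annotation regimes, the hypothetical types below contrast pixel-level masks with image-level tags; the field names and structure are assumptions for exposition, not the paper's data format.

```python
from dataclasses import dataclass
from typing import Dict, Set
import numpy as np

@dataclass
class StronglyAnnotated:
    """Pixel-level supervision: one binary mask per attribute."""
    image: np.ndarray              # H x W x 3 RGB array
    masks: Dict[str, np.ndarray]   # e.g. "jeans" -> H x W binary mask

@dataclass
class WeaklyAnnotated:
    """Image-level supervision: only which attributes are present."""
    image: np.ndarray
    tags: Set[str]                 # e.g. {"jeans", "red-torso"}
```

A weakly annotated example says that jeans appear somewhere in the image without localizing them; this is precisely the ambiguity that a generative model with latent pixel-to-attribute assignments can absorb.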
Key Results and Evaluation
The paper reports strong numerical results, with state-of-the-art performance on unsupervised person re-id across multiple benchmarks, including VIPeR, CUHK01, and PRID450S. The learned semantic representation substantially outperforms other unsupervised methods and competes closely with supervised ones. Achieving this semantic richness without heavy reliance on surveillance-specific annotation underscores the framework's effectiveness and its potential for practical deployment.
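Re-id benchmarks such as these are typically scored with the Cumulative Matching Characteristic (CMC) curve. The sketch below computes it for attribute-vector representations; ranking by cosine similarity and the single-shot assumption (each query identity appears once in the gallery) are illustrative choices, not details taken from the paper.

```python
import numpy as np

def cmc(query_feats, gallery_feats, query_ids, gallery_ids, max_rank=20):
    """CMC[k] = fraction of queries whose true match appears within
    the top-(k+1) ranked gallery entries, ranking by cosine similarity."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    order = np.argsort(-(q @ g.T), axis=1)            # best match first
    hits = gallery_ids[order] == query_ids[:, None]   # (n_query, n_gallery)
    first_hit = hits.argmax(axis=1)                   # rank of the true match
    return np.array([(first_hit <= k).mean() for k in range(max_rank)])

rng = np.random.default_rng(0)
qf, gf = rng.random((50, 32)), rng.random((50, 32))
ids = np.arange(50)
print(cmc(qf, gf, ids, ids)[[0, 4, 9]])  # rank-1, rank-5, rank-10 rates
```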
Furthermore, the representation supports description-based person search and integrates directly with the re-id framework. This dual capability highlights the model's versatility and the practical value of its semantic foundations. By combining generative learning with Bayesian adaptation, the model can answer complex queries, including conjunctive attribute conditions, an advance over existing attribute-based search methods.
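One simple way to realize a conjunctive query over such a representation is sketched below: gallery images are ranked by the product of their posterior probabilities for the queried attributes. Treating the attributes as independent, and the example attribute names and indices, are assumptions for illustration rather than the paper's exact scoring rule.

```python
import numpy as np

def conjunctive_search(attr_probs, required):
    """Rank gallery images for a query such as "jeans AND red torso".

    attr_probs : (n_images, n_attributes) per-attribute posterior
                 probabilities produced by the attribute model.
    required   : indices of the attributes that must all hold.
    """
    # The product scores the conjunction under an independence
    # assumption; log-space sums would be preferable at scale.
    scores = attr_probs[:, required].prod(axis=1)
    order = np.argsort(-scores)                # highest score first
    return order, scores[order]

# Hypothetical attribute indices: 3 = "jeans", 7 = "red-torso".
probs = np.random.default_rng(1).random((1000, 20))
top, s = conjunctive_search(probs, required=[3, 7])
print(top[:5], s[:5])
```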
Implications and Future Directions
The implications of this research are substantial for both theoretical exploration and practical applications in AI and computer vision. The approach is a meaningful step toward reducing reliance on domain-specific annotation, improving the scalability and applicability of person re-id systems. It bridges two distinct domains, fashion and surveillance, showing that semantic representations learned in a richly annotated domain can be transferred to one with very different and more challenging visual characteristics.
The methodology points to promising directions for future research, such as exploring additional sources of domain adaptation and extending the generative model to handle more diverse and complex attribute combinations. As computer vision continues to expand, the concepts and frameworks introduced in this paper are poised to inform applications beyond surveillance, advancing human-centric machine vision systems more broadly.