- The paper introduces a dual-branch Quality Aware Network that learns image quality and feature embeddings in a unified framework.
- It learns per-image quality scores that weight each sample's contribution to the set representation and its gradients, emphasizing high-quality images and substantially improving matching performance.
- Extensive experiments on face verification and person re-ID datasets demonstrate notable improvements over conventional pooling methods.
Overview of "Quality Aware Network for Set to Set Recognition"
The paper presents a novel approach to set-to-set recognition, introducing the Quality Aware Network (QAN). The model targets image-set recognition tasks such as face verification and person re-identification by addressing a critical issue within image sets: variation in sample quality. Each image set comprises multiple instances of a single identity, but variability in image quality can adversely affect recognition accuracy. QAN therefore integrates a mechanism that assesses the quality of each image in a set and uses this assessment to refine how per-image features are aggregated into the set representation.
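To make the aggregation concrete, here is a minimal sketch, assuming the set representation is a quality-weighted average of per-image embeddings (the symbols $x_i$ for embeddings and $\mu_i$ for quality scores are illustrative notation, not necessarily the paper's):

$$
R_{\text{set}} = \sum_{i=1}^{N} \hat{\mu}_i \, x_i,
\qquad
\hat{\mu}_i = \frac{\mu_i}{\sum_{j=1}^{N} \mu_j},
$$

where $N$ is the number of images in the set. With uniform scores this reduces to average pooling; learned, non-uniform scores let high-quality images dominate $R_{\text{set}}$.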
Major Contributions
- Quality Aware Network (QAN) Architecture: The QAN introduces a dual-branch architecture: one branch extracts a feature embedding from each image, while the other predicts a per-image quality score. The two branches are trained jointly, end to end, with only set-level identity annotations as supervision, so the network learns image quality automatically without explicit quality labels (a minimal code sketch of this dual-branch design follows this list).
- Gradient Propagation and Sample Weighting: The paper details how gradients propagate through the quality-weighted aggregation, so that each image's influence is scaled by its quality score. Higher-quality samples therefore dominate both the final set representation and the parameter updates, improving recognition performance even when noisy or low-quality images are present in the set (see the gradient expression after this list).
- Empirical Validation and Robustness: QAN was evaluated on multiple datasets, including PRID2011 and iLIDS-VID for person re-identification and YouTube Faces and IJB-A for face verification. The results show notable improvements in matching rates and reductions in error rates compared with traditional average pooling and other baseline approaches. Notably, the network retains its advantage even when applied across datasets without fine-tuning.
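As referenced in the first bullet above, the following is a minimal PyTorch-style sketch of the dual-branch idea, assuming a shared backbone feeding a feature branch and a quality branch, with quality-weighted pooling on top. All names and dimensions here (QualityAwareSetModel, feat_dim, embed_dim) are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QualityAwareSetModel(nn.Module):
    """Dual-branch sketch: per-image embeddings plus per-image quality scores."""

    def __init__(self, backbone: nn.Module, feat_dim: int = 512, embed_dim: int = 128):
        super().__init__()
        self.backbone = backbone                              # shared trunk: (N, C, H, W) -> (N, feat_dim)
        self.feature_branch = nn.Linear(feat_dim, embed_dim)  # branch 1: identity embedding per image
        self.quality_branch = nn.Linear(feat_dim, 1)          # branch 2: scalar quality score per image

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        """images: (N, C, H, W), all N images belonging to one set (one identity)."""
        feats = self.backbone(images)                            # (N, feat_dim)
        embeds = F.normalize(self.feature_branch(feats), dim=1)  # (N, embed_dim)
        quality = torch.sigmoid(self.quality_branch(feats))      # (N, 1), scores in (0, 1)

        # Quality-weighted aggregation: normalize scores to sum to 1 so that
        # high-quality images dominate the pooled set representation.
        weights = quality / quality.sum(dim=0, keepdim=True).clamp_min(1e-8)
        set_feature = (weights * embeds).sum(dim=0)              # (embed_dim,)
        return F.normalize(set_feature, dim=0)
```

Because the quality branch is supervised only through the identity loss applied to the pooled set feature, its scores are learned implicitly: up-weighting an image reduces the loss only if that image is genuinely discriminative.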
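The sample-weighting effect described in the second bullet follows directly from the weighted aggregation sketched earlier. Holding the normalized scores $\hat{\mu}_i$ fixed for a moment (they also receive their own gradients through the same loss), any loss $L$ computed on $R_{\text{set}}$ back-propagates to each image embedding as

$$
\frac{\partial L}{\partial x_i} = \hat{\mu}_i \, \frac{\partial L}{\partial R_{\text{set}}},
$$

so low-quality images contribute little either to the set representation or to the parameter updates it drives. This is a derivation from the sketched aggregation, not a formula quoted from the paper.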
Results
- On the PRID2011 dataset, QAN improved the top-1 matching rate by 11.1% and 29.4% over the average pooling and minimum cosine baselines, respectively.
- For iLIDS-VID, which contains more noise and greater variability in image quality, QAN increased the top-1 matching rate by 12.21% and 37.9% over the same two baselines.
- In face verification on the YouTube Faces dataset, QAN reduced the false negative rate by 15.6% at a false positive rate (FPR) of 0.001 compared to the best-performing state-of-the-art methods; a 29.32% reduction was observed for IJB-A verification at the same FPR.
Theoretical and Practical Implications
The introduction of QAN has several implications. Theoretically, it advances the field of metric learning by proposing a model that inherently adjusts for sample quality without the need for explicit supervision, demonstrating that quality can be learned and evaluated by the network itself. Practically, QAN's method of using both feature and quality predictions to enhance set representation can be applied to numerous domains where sample quality varies widely, such as surveillance footage analysis and identity verification systems.
Future Directions
The authors mention ongoing work toward P-QAN, a finer-grained version of QAN that would predict quality for specific regions within each image rather than for the image as a whole. Attending to discriminative local details in this way could yield even more robust set-to-set recognition.
Overall, "Quality Aware Network for Set to Set Recognition" introduces a sophisticated approach to set-to-set recognition tasks, offering substantial improvements in robustness against quality variability. This paper lays foundational work that could guide further research into attention-based hierarchical recognition systems in computer vision.