- The paper introduces an L2 normalization framework within the training phase to align feature extraction with cosine-similarity testing, addressing key inconsistencies in face verification.
- It presents novel loss functions, including C-contrastive and C-triplet, that optimize training by effectively managing intra- and inter-class variations without heavy reliance on hard example mining.
- Extensive experiments with ResNet and Maxout models show strong results, reaching up to 99.19% accuracy on LFW and significant gains under the BLUFR protocol.
NormFace: L2 Hypersphere Embedding for Face Verification
The paper "NormFace: L2 Hypersphere Embedding for Face Verification" presents a detailed examination of the role of normalization in face verification models and proposes novel loss functions to enhance the performance of such models. This paper aims to address the discrepancy between the training and testing phases in face verification pipelines, particularly focusing on the feature normalization step.
Introduction and Motivation
The authors observe that the common practice in face verification is to train with unnormalized inner-product similarity but to test with normalized features (cosine similarity). Noting that prior work leaves this divergence unexplained, they seek to bridge the gap by integrating normalization into the training phase. Their motivation is twofold: normalizing features consistently improves test-time accuracy, yet naively normalizing them during training introduces significant optimization difficulties.
Technical Contributions
- Necessity of Normalization:
- The authors provide a theoretical explanation for why feature normalization is necessary when using the softmax loss. Their analysis shows that, without normalization, the loss can be reduced simply by growing a feature's magnitude, so learned features form a 'radial' distribution; this magnitude-driven spread is suboptimal for the angle-only cosine similarity used in testing.
- Layer Definitions and Training Adjustments:
- They formalize the L2 normalization layer and introduce a reformulated softmax loss with a scaling layer. Because normalized (cosine) logits are bounded in [-1, 1], the plain softmax loss fails to converge after normalization; the added scale restores a usable dynamic range, efficiently optimizes cosine similarity, and keeps the computation numerically stable (see the first sketch after this list).
- Novel Loss Functions:
- The paper introduces C-contrastive and C-triplet loss functions built on an "agent strategy": each class is represented by an agent (the corresponding classifier weight vector), and samples are compared against agents rather than against every other sample. This removes the need for hard example mining, simplifying the training procedure while still effectively managing intra-class and inter-class variations (a sketch follows this list).
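To make the normalization-plus-scaling idea concrete, here is a minimal PyTorch-style sketch of a softmax classifier over L2-normalized features and weights. This is not the authors' implementation; the class name, initial scale value, and the choice of a learnable scale are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledCosineSoftmax(nn.Module):
    """Softmax classification on L2-normalized features and weights.

    Both the feature vector and each class weight vector are projected onto
    the unit hypersphere, so the logits become cosine similarities. A scale
    `s` is applied to the cosines; without it the logits are confined to
    [-1, 1] and the softmax loss fails to converge.
    """

    def __init__(self, feat_dim: int, num_classes: int, init_scale: float = 20.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = nn.Parameter(torch.tensor(init_scale))  # the scaling layer

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        f = F.normalize(features, p=2, dim=1)        # L2-normalize features
        w = F.normalize(self.weight, p=2, dim=1)     # L2-normalize class weights
        cos_logits = self.scale * (f @ w.t())        # scaled cosine similarities
        return F.cross_entropy(cos_logits, labels)
```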
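The agent-based C-contrastive loss can be sketched in a similar spirit, assuming each class agent is a learnable vector (in the paper, the classifier's weight column for that class); the function name, default margin, and the exact reduction are illustrative rather than a faithful reproduction of the authors' formulation.

```python
import torch
import torch.nn.functional as F

def c_contrastive_loss(features: torch.Tensor,
                       agents: torch.Tensor,
                       labels: torch.Tensor,
                       margin: float = 1.0) -> torch.Tensor:
    """Contrastive loss computed against per-class "agents" instead of
    against other samples, so no hard example mining is needed.

    features: (B, D) embeddings; agents: (C, D) one vector per class;
    labels: (B,) class indices. Positive pairs pull a feature toward its own
    class agent; negative pairs push it at least `margin` away from the rest.
    """
    f = F.normalize(features, dim=1)      # features on the unit hypersphere
    a = F.normalize(agents, dim=1)        # agents on the unit hypersphere
    dist = torch.cdist(f, a)              # (B, C) Euclidean distances
    pos_mask = F.one_hot(labels, num_classes=a.size(0)).bool()

    pos_term = dist[pos_mask].pow(2)                            # pull toward own agent
    neg_term = (margin - dist[~pos_mask]).clamp(min=0).pow(2)   # push away from others
    return 0.5 * (pos_term.mean() + neg_term.mean())
```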
Experimental Setup and Results
Extensive experiments are conducted on the LFW and YTF datasets using two different base models: a 28-layer ResNet and a 10-layer Maxout network. Their findings highlight several key results:
- Normalization and Performance:
- Integrating normalization into the training phase significantly boosts performance. For example, normalizing both features and weights yields a 99.16% accuracy on LFW using the ResNet model, outperforming multiple baseline approaches.
- Combination of Loss Functions:
- Combining the softmax loss with C-contrastive or center loss functions yields further improvements. The softmax + C-contrastive combination notably achieves a 99.19% accuracy on LFW.
- Practical Improvements:
- When evaluated under the BLUFR protocol, normalized models achieve notable gains, with improved TPR@FAR=0.1% and DIR@FAR=1% metrics, underscoring the method's robustness under stricter evaluation criteria.
- Additionally, the proposed histogram-of-cosine-similarities technique for video face verification (sketched below) offers a substantial improvement over traditional methods, reaching 94.72% accuracy on YTF with the softmax + C-contrastive loss function.
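As an illustration of the video-verification step, the following NumPy sketch builds the histogram of all frame-to-frame cosine similarities for a pair of videos; the bin count, range, and normalization are assumptions, and the downstream decision (same or different identity) made from this histogram is not reproduced here.

```python
import numpy as np

def cosine_histogram(frames_a: np.ndarray, frames_b: np.ndarray, bins: int = 100) -> np.ndarray:
    """Represent a video pair by the histogram of frame-to-frame cosine similarities.

    frames_a: (Na, D) and frames_b: (Nb, D) per-frame embeddings of the two videos.
    Returns a normalized histogram over [-1, 1] that a verification decision can use.
    """
    a = frames_a / np.linalg.norm(frames_a, axis=1, keepdims=True)
    b = frames_b / np.linalg.norm(frames_b, axis=1, keepdims=True)
    sims = (a @ b.T).ravel()                       # all pairwise cosine similarities
    hist, _ = np.histogram(sims, bins=bins, range=(-1.0, 1.0))
    return hist / hist.sum()                       # normalize to a distribution
```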
Theoretical Implications and Future Directions
The theoretical contributions are substantial. The paper explains why normalization matters and provides a concrete framework for incorporating it during training. Moreover, the authors' propositions derive bounds, such as a lower bound on the normalized softmax loss as a function of the scale parameter and the number of classes, that guide hyperparameter settings and lead to more stable and efficient training (illustrated below).
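As a rough illustration of how such a bound can guide the scale hyperparameter, the snippet below evaluates a lower bound of the form log(1 + (C-1)·exp(-C/(C-1)·s)) for C classes and scale s; treat the exact constants as a paraphrase of the paper's proposition rather than a verbatim statement.

```python
import numpy as np

def softmax_loss_lower_bound(scale: float, num_classes: int) -> float:
    """Approximate lower bound of the normalized-softmax loss for a given scale.

    Assumed form (paraphrasing the paper's proposition): with C classes and
    scale s, the loss cannot drop below log(1 + (C-1) * exp(-C/(C-1) * s)).
    A scale that is too small pins the loss at a high value, which is why the
    scaling hyperparameter must grow with the number of classes.
    """
    c = num_classes
    return float(np.log1p((c - 1) * np.exp(-c / (c - 1) * scale)))

# Example: with 10,000 identities, a scale near 1 keeps the bound around 8.2,
# while a scale of 60 lets the loss approach zero.
for s in (1.0, 16.0, 60.0):
    print(s, softmax_loss_lower_bound(s, num_classes=10_000))
```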
In terms of future developments, this work lays the groundwork for further exploration and application of normalization techniques across other metric learning tasks such as person re-identification and image retrieval. The authors suggest that while the current methods work well for fine-tuning, developing approaches to train models from scratch remains an open area for further research.
Conclusion
This paper significantly contributes to understanding and improving face verification models by addressing the longstanding issue of feature normalization. The proposed normalizing strategies and novel loss functions facilitate more accurate and robust face recognition. These findings not only enhance the theoretical foundations of face verification but also have substantial practical implications, offering directions for future research and development in the field of computer vision and AI.