- The paper presents MobileFaceNets, compact CNNs designed for fast and accurate face verification on mobile devices with under one million parameters.
- It introduces a Global Depthwise Convolution layer that enhances feature discriminability by weighting spatial positions differently.
- MobileFaceNets achieve over 99% accuracy on LFW and run more than twice as fast as MobileNetV2.
MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices
In the paper "MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices," Chen et al. present a series of compact CNN architectures—known as MobileFaceNets—designed specifically for achieving high-accuracy face verification on resource-constrained mobile and embedded platforms. These models utilize less than one million parameters to achieve substantial improvements in speed and accuracy compared to existing mobile architectures, such as MobileNetV2.
Key Contributions
The authors highlight the following primary contributions:
- Tailored Architecture for Face Verification: MobileFaceNets employ a custom-designed CNN architecture that surpasses common mobile networks in both speed and accuracy for face verification tasks.
- Global Depthwise Convolution: In place of a conventional global average pooling layer, MobileFaceNets use a Global Depthwise Convolution (GDConv) layer that assigns different importance to different spatial positions, which improves the discriminative power of the output feature vector (see the sketch after this list).
- High Efficiency: With fewer than 1 million parameters, MobileFaceNets run more than twice as fast as MobileNetV2 while matching or exceeding accuracy levels typically associated with much larger models.
- Strong Performance Metrics: A 4.0MB MobileFaceNet model achieves 99.55% accuracy on the LFW dataset and 92.59% TAR@FAR=1e-6 on the MegaFace dataset, rivaling models that are orders of magnitude larger.
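To make the GDConv idea concrete, here is a minimal PyTorch sketch, assuming the paper's setting of a 7x7 final feature map with 512 channels for a 112x112 input crop; the class name and interface are illustrative:

```python
import torch
import torch.nn as nn

class GDConv(nn.Module):
    """Global Depthwise Convolution: a depthwise convolution whose kernel
    spans the whole feature map, so each spatial position gets its own
    learned weight instead of the uniform weight of global average pooling."""
    def __init__(self, channels, spatial_size):
        super().__init__()
        # groups=channels makes the conv depthwise; a kernel equal to the
        # feature-map size with no padding collapses H x W down to 1 x 1.
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=spatial_size,
                                groups=channels, bias=False)

    def forward(self, x):
        return self.dwconv(x)

feat = torch.randn(1, 512, 7, 7)    # last conv feature map in the paper
print(GDConv(512, 7)(feat).shape)   # torch.Size([1, 512, 1, 1])
```

In the full model this output is followed by a linear 1x1 convolution that produces the final face embedding.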
Experimental Evaluation
Experimental assessments were conducted on standard face verification benchmarks such as LFW and MegaFace, and the results demonstrate the suitability of MobileFaceNets for real-time applications on mobile devices. The models were trained with the ArcFace loss on a refined MS-Celeb-1M dataset and compared against state-of-the-art CNN models for face recognition.
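For reference, a minimal sketch of the ArcFace margin logic is shown below, assuming PyTorch. The hyperparameters s and m are common ArcFace defaults rather than values confirmed by this paper, and the edge case where the margined angle exceeds pi is ignored for brevity:

```python
import torch
import torch.nn.functional as F

def arcface_logits(embeddings, weight, labels, s=64.0, m=0.5):
    """Additive angular margin: shift the ground-truth class angle by m,
    then scale by s before the usual softmax cross-entropy."""
    # Cosine similarity between L2-normalized embeddings and class weights.
    cos = F.linear(F.normalize(embeddings), F.normalize(weight))
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, num_classes=weight.size(0)).bool()
    # The margin is applied only to the ground-truth class angle.
    cos_margined = torch.where(target, torch.cos(theta + m), cos)
    return s * cos_margined  # pass to F.cross_entropy(logits, labels)
```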
Tables in the original paper report compelling numbers: MobileFaceNet significantly outperforms baseline models such as MobileNetV1 and ShuffleNet in both verification accuracy and inference speed. Notably, the primary MobileFaceNet achieved 99.28% accuracy on LFW with faster inference than alternative configurations such as MobileNetV2-GDConv (a rough way to measure per-image latency is sketched below).
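The paper measures actual inference time on mobile hardware; as a rough approximation of such a benchmark, one could time forward passes with an assumed harness like the following (not the paper's measurement setup):

```python
import time
import torch

def measure_latency_ms(model, input_size=(1, 3, 112, 112), warmup=10, runs=100):
    """Average single-image CPU forward-pass time in milliseconds.
    112x112 matches the paper's input resolution for face crops."""
    model.eval()
    x = torch.randn(*input_size)
    with torch.no_grad():
        for _ in range(warmup):     # warm up caches and lazy initializations
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1000.0
```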
Implications and Future Directions
Practically, MobileFaceNets make reliable face verification feasible on mobile devices, enabling functionality such as device unlocking, secure authentication, and mobile payments without depending on heavy computational resources. Theoretically, the GDConv layer offers a novel way of assigning spatial importance that could be extended to other machine learning tasks demanding a balance of accuracy and network efficiency.
For future work, there is potential in coupling MobileFaceNets with techniques such as pruning, quantization, and knowledge distillation to further reduce resource consumption while preserving accuracy; a distillation sketch follows below. Moreover, extending the design of tailored lightweight architectures to other specialized tasks could significantly advance the deployment of high-performance models on constrained devices.
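As a concrete example of one such direction, here is a minimal Hinton-style knowledge-distillation loss in PyTorch; the temperature T and mixing weight alpha are illustrative defaults, and nothing in this sketch comes from the paper itself:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend the hard-label cross-entropy with a KL term that pushes the
    student toward the teacher's temperature-softened output distribution."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)  # T^2 rescales gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In this setup a large face recognition model would play the teacher and a MobileFaceNet the student, trained on the same batches.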
The paper firmly establishes MobileFaceNets as a pioneering effort in optimizing neural network architectures for real-world applications that demand both precision and efficiency. Its methodological innovations and empirical findings provide a robust foundation for future advances in mobile computer vision.