Deep Pixel-wise Binary Supervision for Face Presentation Attack Detection (1907.04047v1)

Published 9 Jul 2019 in cs.CV and cs.CR

Abstract: Face recognition has evolved as a prominent biometric authentication modality. However, vulnerability to presentation attacks curtails its reliable deployment. Automatic detection of presentation attacks is essential for secure use of face recognition technology in unattended scenarios. In this work, we introduce a Convolutional Neural Network (CNN) based framework for presentation attack detection, with deep pixel-wise supervision. The framework uses only frame level information making it suitable for deployment in smart devices with minimal computational and time overhead. We demonstrate the effectiveness of the proposed approach in public datasets for both intra as well as cross-dataset experiments. The proposed approach achieves an HTER of 0% in Replay Mobile dataset and an ACER of 0.42% in Protocol-1 of OULU dataset outperforming state of the art methods.


Summary

The paper proposes DeepPixBiS, a convolutional neural network framework using deep pixel-wise binary supervision to efficiently detect face presentation attacks.
DeepPixBiS achieved state-of-the-art performance, scoring 0% HTER on the Replay Mobile dataset and 0.42% ACER on Protocol-1 of the OULU-NPU dataset.
This framework offers an efficient, deployable solution for real-world applications and suggests methods for reducing data requirements in future research.

Deep Pixel-wise Binary Supervision for Face Presentation Attack Detection

The paper "Deep Pixel-wise Binary Supervision for Face Presentation Attack Detection" by Anjith George and Sebastien Marcel presents a convolutional neural network (CNN) framework aimed at enhancing the reliability and security of face recognition systems by detecting presentation attacks (PA). The authors focus on developing a solution that minimizes computational overhead, making it suitable for deployment on smart devices in scenarios where quick decision-making is crucial.

Framework Overview

The proposed framework, termed DeepPixBiS, uses a densely connected CNN architecture trained with two kinds of supervision: a single binary label for the whole frame and a pixel-wise binary label map. Unlike approaches that rely on synthesized depth maps as auxiliary supervision, it requires no depth synthesis, and the pixel-wise supervision is built directly into the model's architecture. Pixel-wise supervision is realized by assigning binary labels to patches within the facial image. Despite this methodological simplification, the approach performs strongly on the benchmark datasets.
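The two forms of supervision described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The label convention (1 for bona fide, 0 for attack), the equal weighting `lam=0.5`, the use of the map mean as the score, and the names `combined_loss` and `pa_score` are all assumptions made for illustration.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def combined_loss(pred_map, pred_binary, is_bona_fide, lam=0.5):
    """Weighted sum of a pixel-wise loss and a frame-level binary loss.

    pred_map:     2-D list of per-patch probabilities (the pixel-wise output)
    pred_binary:  single frame-level probability
    is_bona_fide: True for a real face, False for an attack
    lam:          weight of the pixel-wise term (assumed value, not from this summary)
    """
    y = 1 if is_bona_fide else 0  # assumed convention: every patch shares the frame label
    flat = [p for row in pred_map for p in row]
    pixel_term = sum(bce(p, y) for p in flat) / len(flat)
    binary_term = bce(pred_binary, y)
    return lam * pixel_term + (1.0 - lam) * binary_term

def pa_score(pred_map):
    """Assumed scoring rule: the mean of the pixel-wise map (higher = more bona fide)."""
    flat = [p for row in pred_map for p in row]
    return sum(flat) / len(flat)

# Usage: a confident "bona fide" prediction on a hypothetical 14x14 map.
confident_map = [[0.9] * 14 for _ in range(14)]
loss_if_real = combined_loss(confident_map, 0.9, is_bona_fide=True)
loss_if_attack = combined_loss(confident_map, 0.9, is_bona_fide=False)
```

Because every patch carries the same label as the frame, no depth map or other synthesized target is needed; the only ground truth required is the bona fide/attack label of each frame.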

Performance Metrics

DeepPixBiS achieves strong results on public datasets. On the Replay Mobile dataset, the framework reaches an HTER (Half Total Error Rate) of 0%, meaning that no test sample is misclassified at the chosen decision threshold. On the OULU-NPU dataset, it achieves an ACER (Average Classification Error Rate) of 0.42% on Protocol-1, which the paper reports as better than the state-of-the-art methods it compares against.
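For readers unfamiliar with the two metrics, the sketch below shows how they are commonly computed from detector scores. It assumes that a higher score means "more likely bona fide", and that ACER takes the worst-case attack error across attack types, as in the ISO/IEC 30107-3 style of reporting. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
def far_frr(bona_fide_scores, attack_scores, threshold):
    """False acceptance and false rejection rates at a given threshold.

    A sample is accepted as bona fide when its score is >= threshold.
    """
    far = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    frr = sum(s < threshold for s in bona_fide_scores) / len(bona_fide_scores)
    return far, frr

def hter(far, frr):
    """Half Total Error Rate: the mean of FAR and FRR."""
    return (far + frr) / 2.0

def acer(apcer_per_attack_type, bpcer):
    """Average Classification Error Rate.

    apcer_per_attack_type: attack error rate for each attack type (e.g. print, replay)
    bpcer:                 bona fide error rate
    Uses the worst (maximum) APCER across attack types (assumed convention).
    """
    return (max(apcer_per_attack_type) + bpcer) / 2.0

# Usage with made-up scores: one attack is accepted and one bona fide is rejected.
toy_bona_fide = [0.9, 0.8, 0.4, 0.95]
toy_attacks = [0.1, 0.2, 0.6, 0.3]
toy_far, toy_frr = far_frr(toy_bona_fide, toy_attacks, threshold=0.5)
toy_hter = hter(toy_far, toy_frr)
```

An HTER of 0% therefore requires both error rates to be zero on the test set at the threshold that was fixed beforehand.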

Significance and Implications

The success of DeepPixBiS in intra-dataset testing reflects its accuracy in scenarios involving known attack types, where training and test data come from the same dataset. The paper also explores cross-dataset generalization, reporting an HTER of 12.4% when the model is trained on one dataset and tested on another. The gap between this figure and the intra-dataset results illustrates the critical challenge of maintaining detection efficacy across varied conditions and datasets, and points to the need for more training data to improve generalization.
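A cross-dataset evaluation is typically run by fixing the decision threshold on data from the training (source) dataset and then applying it unchanged to the test scores of a different (target) dataset. The sketch below illustrates that procedure under stated assumptions: the threshold is chosen near the equal error rate on a source development set, higher scores mean bona fide, and all names and scores are hypothetical. The paper's exact protocol may differ.

```python
def far_frr(bona_fide_scores, attack_scores, threshold):
    """FAR and FRR when samples with score >= threshold are accepted as bona fide."""
    far = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    frr = sum(s < threshold for s in bona_fide_scores) / len(bona_fide_scores)
    return far, frr

def eer_threshold(bona_fide_scores, attack_scores):
    """Pick the observed score at which |FAR - FRR| is smallest (approximate EER point)."""
    candidates = sorted(set(bona_fide_scores) | set(attack_scores))

    def gap(threshold):
        far, frr = far_frr(bona_fide_scores, attack_scores, threshold)
        return abs(far - frr)

    return min(candidates, key=gap)

def cross_dataset_hter(src_dev_bona_fide, src_dev_attacks, tgt_bona_fide, tgt_attacks):
    """Fix the threshold on the source dev set, then measure HTER on the target test set."""
    threshold = eer_threshold(src_dev_bona_fide, src_dev_attacks)
    far, frr = far_frr(tgt_bona_fide, tgt_attacks, threshold)
    return threshold, (far + frr) / 2.0

# Usage with made-up scores: the source dev set is perfectly separable,
# but the target set is not, so the transferred threshold makes errors.
thr, target_hter = cross_dataset_hter(
    src_dev_bona_fide=[0.9, 0.8, 0.7],
    src_dev_attacks=[0.1, 0.2, 0.3],
    tgt_bona_fide=[0.9, 0.6, 0.75, 0.8],
    tgt_attacks=[0.2, 0.72, 0.4, 0.1],
)
```

The toy example mirrors the pattern in the reported results: a threshold that separates the source data cleanly can still produce errors when the score distributions shift on the target dataset.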

Practically, DeepPixBiS offers an efficient, deployable solution for real-world applications where quick verification is needed, such as mobile authentication. From a theoretical standpoint, it emphasizes the importance of simplifying the training process while still capturing essential discriminative features, suggesting a pathway for future research into reducing the amount of training data that CNN-based PA detection requires.

Future Directions

While DeepPixBiS offers significant advancements, the authors suggest that the fusion of temporal features could enhance accuracy, especially for detecting more subtly executed attacks. Given the limitations in dataset size, ongoing research into creating and sharing large-scale datasets would be beneficial for developing better-generalizing models. Furthermore, integrating this framework with additional biometric systems could improve robustness against varying attack types and conditions.

In conclusion, "Deep Pixel-wise Binary Supervision for Face Presentation Attack Detection" contributes valuable insights and methodologies to the field of biometrics, proposing a reliable, efficient framework that balances complexity and performance. The provision of detailed reproducibility protocols and source code encourages further exploration and enhancement in PA detection technologies.