Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing (2203.05340v4)

Published 10 Mar 2022 in cs.CV

Abstract: With diverse presentation attacks emerging continually, generalizable face anti-spoofing (FAS) has drawn growing attention. Most existing methods implement domain generalization (DG) on the complete representations. However, different image statistics may have unique properties for the FAS tasks. In this work, we separate the complete representation into content and style ones. A novel Shuffled Style Assembly Network (SSAN) is proposed to extract and reassemble different content and style features for a stylized feature space. Then, to obtain a generalized representation, a contrastive learning strategy is developed to emphasize liveness-related style information while suppressing the domain-specific one. Finally, the representations of the correct assemblies are used to distinguish between living and spoofing during inference. On the other hand, despite the decent performance, there still exists a gap between academia and industry, due to the difference in data quantity and distribution. Thus, a new large-scale benchmark for FAS is built up to further evaluate the performance of algorithms in reality. Both qualitative and quantitative results on existing and proposed benchmarks demonstrate the effectiveness of our methods. The codes will be available at https://github.com/wangzhuo2019/SSAN.

Authors (7)

Zhuo Wang (54 papers)
Zezheng Wang (14 papers)
Zitong Yu (119 papers)
Weihong Deng (71 papers)
Jiahong Li (17 papers)
Tingting Gao (25 papers)
Zhongyuan Wang (105 papers)

Citations (105)

Summary

The paper "Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing" proposes a way to improve how well face anti-spoofing (FAS) models generalize to unseen domains. The authors introduce the Shuffled Style Assembly Network (SSAN), which separates feature representations into content and style components and exploits the distinct properties of each to handle the continually emerging presentation attacks against face recognition systems.

Methodology

The central premise of SSAN lies in its ability to differentiate and separately process content and style features. Content features encapsulate global semantics and physical attributes common across domains, while style features capture liveness-related information, which may also encompass domain-specific nuances.

Content and Style Feature Extraction: The network employs a two-stream architecture where the content stream utilizes batch normalization to focus on domain-agnostic global features, and the style stream uses instance normalization to isolate sample-specific characteristics indicative of liveness and spoofing.
Shuffled Style Assembly: The SSAN framework introduces a mechanism to reassemble these content and style features through a shuffle-then-assemble strategy. By randomly pairing content features with different style features, the network generates a diverse set of stylized features, emphasizing liveness cues and diminishing domain-specific biases.
Contrastive Learning: To refine the stylized feature space, SSAN adds a contrastive loss with a stop-gradient operation. Each self-assembled feature (a sample's content combined with its own style) serves as a fixed anchor; a shuffle-assembled feature is pulled toward the anchor when its style comes from a sample with the same liveness label and pushed away when the labels differ. This strengthens liveness-related style cues while suppressing domain-specific variation.
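The shuffle-then-assemble step and the contrastive objective described above can be sketched roughly as follows. This is a simplified NumPy illustration, not the authors' implementation: it assumes plain AdaIN-style assembly that uses the raw per-channel statistics of the style feature map (SSAN itself uses learned layers), it assumes cosine similarity as the proximity measure, and the function names `channel_stats`, `assemble`, `cosine`, and `shuffled_contrastive_loss` are hypothetical. NumPy has no autograd, so the stop-gradient is only indicated in a comment.

```python
import numpy as np

def channel_stats(feat, eps=1e-5):
    """Per-sample, per-channel mean and std over the spatial dims (instance statistics)."""
    mean = feat.mean(axis=(2, 3), keepdims=True)
    std = np.sqrt(feat.var(axis=(2, 3), keepdims=True) + eps)
    return mean, std

def assemble(content, style):
    """AdaIN-style assembly (assumed form): instance-normalize the content map,
    then rescale and shift it with the style map's channel statistics."""
    c_mean, c_std = channel_stats(content)
    s_mean, s_std = channel_stats(style)
    return (content - c_mean) / c_std * s_std + s_mean

def cosine(a, b, eps=1e-8):
    """Row-wise cosine similarity between two batches of flattened features."""
    a = a.reshape(len(a), -1)
    b = b.reshape(len(b), -1)
    return (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps)

def shuffled_contrastive_loss(content, style, labels, rng):
    """Pull shuffle-assembled features toward the self-assembled anchor when the
    liveness labels match, push them away otherwise."""
    perm = rng.permutation(len(content))
    # Self-assembly: each content with its own style. In a real training loop this
    # anchor would be detached from the graph (stop-gradient).
    anchor = assemble(content, style)
    # Shuffle-assembly: content i is paired with the style of sample perm[i].
    shuffled = assemble(content, style[perm])
    sim = cosine(anchor, shuffled)
    same_liveness = labels == labels[perm]
    # Minimizing -sim increases similarity; minimizing +sim decreases it.
    loss = np.where(same_liveness, -sim, sim).mean()
    return loss, perm

rng = np.random.default_rng(0)
content = rng.normal(size=(4, 8, 16, 16))                    # (batch, channels, H, W)
style = rng.normal(loc=1.0, scale=2.0, size=(4, 8, 16, 16))
labels = np.array([1, 0, 1, 0])                              # 1 = live, 0 = spoof (assumed encoding)
loss, perm = shuffled_contrastive_loss(content, style, labels, rng)
print(float(loss), perm)
```

Because the sign of the similarity term depends only on liveness labels and not on domain labels, style pairs from different domains but the same liveness class are drawn together, which is the intuition behind suppressing domain-specific style information.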

Experimental Setup

The authors conduct a series of experiments across both traditional datasets and a newly proposed large-scale benchmark to demonstrate the advantages of SSAN over existing methods. These datasets include OULU-NPU, CASIA-MFSD, Replay-Attack, and MSU-MFSD, alongside a compiled benchmark of twelve diverse face datasets reflecting real-world data distribution.

Key findings from these experiments include:

Performance on Limited Source Domains: The SSAN model outperforms state-of-the-art models, achieving lower half-total error rates (HTER) and higher area under the curve (AUC) across multiple cross-domain scenarios.
Results on Large-Scale Benchmarks: The approach exhibits robust performance under proposed intra- and inter-domain testing protocols, achieving substantial improvements in True Positive Rate (TPR) at various False Positive Rates (FPR).
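For readers unfamiliar with the metrics named above, the following is a minimal sketch of how HTER and TPR at a fixed FPR are commonly computed from liveness scores. It assumes that higher scores mean "more likely live", that live is the positive class (label 1) and spoof the negative class (label 0), and that thresholds are swept over the observed scores; the helper names `hter` and `tpr_at_fpr` are hypothetical and the paper's exact evaluation code may differ.

```python
import numpy as np

def hter(scores, labels, threshold):
    """Half-total error rate: the mean of the false acceptance rate (spoof accepted
    as live) and the false rejection rate (live rejected as spoof)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    accept = scores >= threshold
    far = accept[labels == 0].mean()
    frr = (~accept[labels == 1]).mean()
    return 0.5 * (far + frr)

def tpr_at_fpr(scores, labels, target_fpr):
    """Highest true positive rate reachable while keeping FPR <= target_fpr,
    searching thresholds placed at the observed score values."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    best = 0.0
    for t in np.unique(scores):
        accept = scores >= t
        fpr = accept[labels == 0].mean()
        if fpr <= target_fpr:
            best = max(best, accept[labels == 1].mean())
    return best

scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
print(hter(scores, labels, threshold=0.5))        # 0.5: one spoof accepted, one live rejected
print(tpr_at_fpr(scores, labels, target_fpr=0.0)) # 0.5: only the 0.9 sample passes with no false accepts
```

Lower HTER is better, while TPR at a fixed low FPR is better when higher, which is why the two benchmark families above report improvements in opposite directions.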

Implications

Because SSAN is trained to generalize to unseen domains rather than adapted to each target domain, it does not need target-domain data at training time, which reduces the need for costly and often impractical per-domain data collection in real-world deployments. Shuffled style assembly and the contrastive loss together reduce the impact of domain bias, which should make face recognition systems more robust to presentation attacks.

Future Directions

The framework opens several paths for future exploration:

Adoption of Additional Normalization Techniques: Exploring the application of other normalization techniques could further enhance the decomposition of content and style features.
Integration with Real-Time Systems: Optimizing SSAN for real-time use could help it meet the constraints of industrial deployments.
Expansion of Style Component Utilization: Broader research into leveraging other auxiliary signals or unsupervised techniques to inform the style component could offer richer feature spaces.

In conclusion, the SSAN framework represents a promising direction for improving the robustness and reliability of face anti-spoofing systems by prioritizing generalization capabilities over domain-specific learning.

GitHub

GitHub - wangzhuo2019/SSAN: Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing, CVPR2022. (102 stars)