STC Antispoofing Systems for the ASVspoof2019 Challenge (1904.05576v1)

Published 11 Apr 2019 in cs.SD, cs.CL, cs.CR, cs.LG, eess.AS, and stat.ML

Abstract: This paper describes the Speech Technology Center (STC) antispoofing systems submitted to the ASVspoof 2019 challenge. The ASVspoof2019 is the extended version of the previous challenges and includes 2 evaluation conditions: logical access use-case scenario with speech synthesis and voice conversion attack types and physical access use-case scenario with replay attacks. During the challenge we developed anti-spoofing solutions for both scenarios. The proposed systems are implemented using deep learning approach and are based on different types of acoustic features. We enhanced Light CNN architecture previously considered by the authors for replay attacks detection and which performed high spoofing detection quality during the ASVspoof2017 challenge. In particular here we investigate the efficiency of angular margin based softmax activation for training robust deep Light CNN classifier to solve the mentioned-above tasks. Submitted systems achieved EER of 1.86% in logical access scenario and 0.54% in physical access scenario on the evaluation part of the Challenge corpora. High performance obtained for the unknown types of spoofing attacks demonstrates the stability of the offered approach in both evaluation conditions.

Summary

The paper introduces a deep learning-based LCNN model with angular margin softmax to improve anti-spoofing detection.
It achieves an Equal Error Rate of 1.86% for logical access and 0.54% for physical access on the evaluation set, which includes attack types not seen during training.
The systems combine several acoustic front-ends (LFCC, CQT, FFT) with batch normalization in the network, and the paper considers score normalization as a way to improve fusion.

Overview of STC Anti-spoofing Systems for the ASVspoof2019 Challenge

The paper "STC Antispoofing Systems for the ASVspoof2019 Challenge" by Lavrentyeva et al. provides a comprehensive examination of the anti-spoofing solutions developed by the Speech Technology Center for the ASVspoof2019 challenge. The focus is on improving the robustness of Automatic Speaker Verification (ASV) systems against increasingly advanced spoofing attacks, specifically addressing logical access and physical access scenarios.

Research Context and Motivation

The proliferation of voice biometric technologies, utilized in security systems, immigration, and other applications, has increased the demand for reliable ASV systems. The paper underlines that, despite advancements, these systems remain vulnerable to spoofing attacks, necessitating effective countermeasures. The ASVspoof2019 challenge, as an extension of previous iterations, sought to address this need by evaluating systems across two conditions: logical access (speech synthesis and voice conversion attacks) and physical access (replay attacks).

Methodology

The researchers propose several deep learning-based anti-spoofing systems, all built on an enhanced Light CNN (LCNN) architecture. A key innovation is the use of angular margin-based softmax, which imposes an angular margin between classes in the feature space and so serves as a more discriminative training objective than plain softmax. The paper explores several acoustic front-ends as inputs to the LCNN, including linear frequency cepstral coefficients (LFCC), the constant-Q transform (CQT), and FFT-based features. Batch normalization and other architectural adjustments are added to improve training stability and convergence.
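The angular margin idea can be illustrated with a small sketch. It assumes the SphereFace-style A-softmax formulation, in which the target-class logit uses a margin function psi(theta) in place of cos(theta); the function names, the margin value m = 4, and the toy weights are illustrative assumptions, not details taken from the paper.

```python
import math

def angular_margin_psi(theta: float, m: int = 4) -> float:
    """Margin function psi(theta) = (-1)^k * cos(m*theta) - 2k,
    for theta in [k*pi/m, (k+1)*pi/m]. It decreases monotonically on [0, pi]
    and never exceeds cos(theta), which makes the target class harder to satisfy."""
    k = min(int(theta * m / math.pi), m - 1)  # index of the angular segment
    return (-1) ** k * math.cos(m * theta) - 2 * k

def a_softmax_loss(embedding, class_weights, target, m=4):
    """Cross-entropy over logits ||x|| * cos(theta_j), with psi applied to the target class.
    class_weights holds one weight vector per class (hypothetical toy values in the usage below)."""
    x_norm = math.sqrt(sum(v * v for v in embedding))
    logits = []
    for j, w in enumerate(class_weights):
        w_norm = math.sqrt(sum(v * v for v in w))
        cos_t = sum(a * b for a, b in zip(embedding, w)) / (x_norm * w_norm)
        cos_t = max(-1.0, min(1.0, cos_t))  # guard acos against rounding error
        if j == target:
            logits.append(x_norm * angular_margin_psi(math.acos(cos_t), m))
        else:
            logits.append(x_norm * cos_t)
    # Numerically stable log-sum-exp, then negative log-likelihood of the target.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum_exp - logits[target]

# Toy usage: class 0 = genuine, class 1 = spoof (labels are an assumption).
weights = [[1.0, 0.0], [0.0, 1.0]]
loss_with_margin = a_softmax_loss([1.0, 0.5], weights, target=0, m=4)
loss_without_margin = a_softmax_loss([1.0, 0.5], weights, target=0, m=1)
```

With m = 1 the margin function reduces to cos(theta), so the loss becomes an ordinary softmax over weight-normalized logits; a larger m penalizes the same embedding more, which pushes embeddings of a class closer together in angle.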

Significant Findings

The submitted systems achieved an Equal Error Rate (EER) of 1.86% in the logical access scenario and 0.54% in the physical access scenario on the evaluation part of the ASVspoof2019 corpora. Because the evaluation data contains attack types that were unknown during development, these results indicate that the approach remains stable in both conditions.
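For readers unfamiliar with the metric, the EER is the error rate at the threshold where the false acceptance rate (spoofed audio accepted) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch, assuming that higher scores mean "more likely genuine"; the function name and the toy scores are illustrative, and official challenge scoring tools may interpolate between thresholds differently.

```python
def compute_eer(genuine_scores, spoof_scores):
    """Approximate the Equal Error Rate by scanning every observed score as a threshold.
    A trial is accepted as genuine when its score is >= the threshold."""
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy scores (made up for illustration, not data from the paper).
toy_eer = compute_eer([0.9, 0.8, 0.3, 0.7], [0.1, 0.2, 0.6, 0.85])
```

In this toy example one of four genuine trials and one of four spoof trials fall on the wrong side of the best threshold, so the sketch returns an EER of 0.25.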

The use of angular margin-based softmax notably stabilized training and optimization, which matters given the variability and complexity of the spoofing attacks in the challenge. The paper also observes that genuine scores followed a normal distribution, suggesting that score normalization could improve the fusion of scores from different systems.
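One common way to normalize scores before fusion is to shift and scale each system's scores using statistics estimated on held-out data, then average across systems. The sketch below assumes mean and standard deviation normalization followed by equal-weight averaging; the summary does not state which normalization or fusion rule the authors used, so the function names and procedure are illustrative assumptions.

```python
from statistics import mean, pstdev

def fit_normalizer(dev_scores):
    """Estimate (mean, std) from one system's development scores."""
    return mean(dev_scores), pstdev(dev_scores)

def normalize(scores, mu, sigma):
    """Map scores to zero mean and unit variance under the fitted statistics."""
    return [(s - mu) / sigma for s in scores]

def fuse_scores(system_scores, normalizers):
    """Normalize each system's scores, then average them trial by trial.
    system_scores: one list of scores per system, aligned by trial.
    normalizers: one (mu, sigma) pair per system, from fit_normalizer."""
    normalized = [normalize(scores, mu, sigma)
                  for scores, (mu, sigma) in zip(system_scores, normalizers)]
    return [mean(trial) for trial in zip(*normalized)]

# Two hypothetical systems whose raw scores live on very different scales.
system_a = [0.0, 10.0, 20.0]
system_b = [0.2, 0.4, 0.6]
fused = fuse_scores(
    [system_a, system_b],
    [fit_normalizer(system_a), fit_normalizer(system_b)],
)
```

Without normalization, the system with the larger raw score range would dominate a simple average; after normalization both systems contribute on the same scale.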

Implications and Future Work

The STC team's findings indicate that deep learning techniques are effective at improving the spoofing detection capabilities of ASV systems. The combination of the enhanced LCNN architecture with angular margin-based training offers a feasible approach to handling contemporary spoofing technologies.

In terms of practical implications, the outcomes suggest a path forward for deploying more robust anti-spoofing measures in real-world voice biometric applications. However, the paper also acknowledges the limitations of simulated data, highlighting the potential gap when deploying these systems in real-world, uncontrolled environments.

Future developments could focus on extending the system to handle diverse and unforeseen spoofing variants, potentially using cross-dataset validation to improve real-world applicability. Further exploration of domain adaptation and regularization techniques could strengthen robustness against a broader range of attack vectors.

Overall, this paper provides an insightful contribution to anti-spoofing strategies in ASV systems, offering a solid foundation for further advancements in securing voice biometric technologies.
