ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection

Published 9 Apr 2019 in eess.AS, cs.CR, and cs.SD | (1904.05441v2)

Abstract: ASVspoof, now in its third edition, is a series of community-led challenges which promote the development of countermeasures to protect automatic speaker verification (ASV) from the threat of spoofing. Advances in the 2019 edition include: (i) a consideration of both logical access (LA) and physical access (PA) scenarios and the three major forms of spoofing attack, namely synthetic, converted and replayed speech; (ii) spoofing attacks generated with state-of-the-art neural acoustic and waveform models; (iii) an improved, controlled simulation of replay attacks; (iv) use of the tandem detection cost function (t-DCF) that reflects the impact of both spoofing and countermeasures upon ASV reliability. Even if ASV remains the core focus, in retaining the equal error rate (EER) as a secondary metric, ASVspoof also embraces the growing importance of fake audio detection. ASVspoof 2019 attracted the participation of 63 research teams, with more than half of these reporting systems that improve upon the performance of two baseline spoofing countermeasures. This paper describes the 2019 database, protocols and challenge results. It also outlines major findings which demonstrate the real progress made in protecting against the threat of spoofing and fake audio.

Summary

The paper introduces diverse attack scenarios that rigorously test ASV systems using advanced TTS, voice conversion, and realistic replay simulations.
The paper adopts the tandem detection cost function (t-DCF) as its primary metric, evaluating spoofing countermeasures by their effect on ASV reliability rather than in isolation.
The paper reports results from 63 participating research teams, more than half of which improved upon two baseline countermeasures, with ensemble and neural network approaches performing strongly.

ASVspoof 2019: Advancements in Spoofed and Fake Audio Detection

The ASVspoof 2019 paper describes the third edition of the ASVspoof challenge, which builds on previous editions with the aim of protecting automatic speaker verification (ASV) systems against spoofing. The study addresses both logical access (LA) and physical access (PA) scenarios, covering synthetic, converted, and replayed speech attacks. It combines stronger attacks, generated with state-of-the-art neural acoustic and waveform models, with an emphasis on developing robust countermeasures.

Key Contributions

The 2019 edition of ASVspoof introduces several notable advancements:

Diverse Attack Scenarios: The logical access scenario uses state-of-the-art text-to-speech (TTS) and voice conversion (VC) technologies to generate spoofed speech strong enough to challenge ASV system reliability.
Enhanced Replay Simulations: Physical access attacks are refined with a more controlled replay simulation setup, relevant for real-world scenarios like those in smart home devices.
Evaluation Metrics: ASVspoof 2019 adopts the tandem detection cost function (t-DCF) as its primary metric, retaining the equal error rate (EER) as a secondary metric, to measure more accurately the impact of both spoofing and countermeasures on ASV reliability.
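
The idea behind the t-DCF can be written compactly. The following is a sketch of the normalised form usually associated with the 2019 edition, not a definition taken from this summary. It assumes that s is the countermeasure (CM) decision threshold, that P_miss^cm(s) and P_fa^cm(s) are the CM miss and false alarm rates at that threshold, and that β is a single coefficient collecting the application costs, class priors, and ASV error rates. The exact definition of β is not given here and should be taken from the challenge evaluation plan.

```latex
\[
\text{t-DCF}_{\text{norm}}(s) = \beta \, P_{\text{miss}}^{\text{cm}}(s) + P_{\text{fa}}^{\text{cm}}(s),
\qquad
\text{t-DCF}_{\text{norm}}^{\min} = \min_{s} \, \text{t-DCF}_{\text{norm}}(s)
\]
```

Under this reading, the EER weights the two CM error rates equally, whereas the t-DCF weights them by their consequences for the ASV system the countermeasure protects.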

Database and Methodology

The paper describes a comprehensive database built using the VCTK corpus, partitioned into logical and physical access scenarios with distinct training, development, and evaluation datasets. These datasets include a variety of known and unknown attacks, encouraging systems that generalize well to new spoofing methods.
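
The distinction between known and unknown attacks can be made concrete with a small sketch. It assumes each utterance in a protocol carries an attack label. The labels used below (`tts_a`, `vc_b`, and so on) and the function name are hypothetical and are not the identifiers used in the ASVspoof 2019 protocols.

```python
def split_known_unknown(train_attacks, eval_attacks):
    """Partition evaluation attack labels by whether they appear in training.

    Both arguments are iterables of attack labels (one per utterance or
    one per attack type; duplicates are ignored).
    Returns (known, unknown) as sorted lists.
    """
    seen_in_training = set(train_attacks)
    eval_labels = set(eval_attacks)
    known = sorted(a for a in eval_labels if a in seen_in_training)
    unknown = sorted(a for a in eval_labels if a not in seen_in_training)
    return known, unknown


# Hypothetical labels, for illustration only.
train = ["tts_a", "tts_b", "vc_a"]
evaluation = ["tts_a", "vc_a", "tts_c", "vc_b", "vc_b"]
known, unknown = split_known_unknown(train, evaluation)
print("known:", known)
print("unknown:", unknown)
```

A countermeasure that scores well only on the known subset is likely overfitting to specific spoofing algorithms, which is the failure mode the unknown attacks are meant to expose.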

For logical access, speech data is generated from multiple TTS and VC systems using novel neural waveform models. Physical access data simulates realistic replay scenarios with varying acoustic properties, recording conditions, and speaker-to-microphone distances.

Performance Metrics

The t-DCF is the primary metric because it evaluates a countermeasure in the context of the ASV system it protects, reflecting the impact of both spoofing attacks and countermeasure errors on ASV reliability. The EER of the countermeasure alone is retained as a secondary metric. Two baseline countermeasures, GMM classifiers operating on cepstral-coefficient features, serve as benchmarks for evaluating participant systems.
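
Both metrics can be computed from countermeasure scores with a threshold sweep. The sketch below is a minimal pure-Python illustration, not the official scoring code. It assumes that higher scores indicate bona fide speech, and it treats `beta` as a given number that weights the miss rate relative to the false alarm rate in a normalised t-DCF of the form beta * P_miss + P_fa. In the challenge this weight is derived from costs, priors, and ASV error rates; here it is simply a hypothetical input.

```python
def error_rates(bonafide_scores, spoof_scores):
    """Sweep a threshold over all observed scores.

    Assumed convention: higher score = more bona fide-like, and a trial
    is accepted when score >= threshold.
    Returns a list of (threshold, miss_rate, false_alarm_rate) tuples.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)  # a threshold that rejects everything
    rates = []
    for t in thresholds:
        # Miss: bona fide trial rejected. False alarm: spoofed trial accepted.
        p_miss = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        p_fa = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        rates.append((t, p_miss, p_fa))
    return rates


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER at the threshold where miss and false alarm rates are closest."""
    rates = error_rates(bonafide_scores, spoof_scores)
    _, p_miss, p_fa = min(rates, key=lambda r: abs(r[1] - r[2]))
    return (p_miss + p_fa) / 2.0


def min_normalised_tdcf(bonafide_scores, spoof_scores, beta):
    """Minimum over thresholds of beta * P_miss + P_fa (beta is an assumed input)."""
    rates = error_rates(bonafide_scores, spoof_scores)
    return min(beta * p_miss + p_fa for _, p_miss, p_fa in rates)


# Toy scores, for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.3]
spoof = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(bonafide, spoof))          # 0.25
print(min_normalised_tdcf(bonafide, spoof, 2.0))  # 0.5
```

With `beta` above 1, rejecting bona fide speech costs more than accepting a spoof, so the minimum moves to a lower threshold than the EER point. This is the sense in which the t-DCF is application-dependent while the EER is not.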

Results and Analysis

The challenge attracted participation from 63 research teams, more than half of which reported systems that improve upon the two baseline countermeasures. The study reports that top systems, especially those utilizing ensemble and neural network approaches, significantly improved detection capabilities. The t-DCF and EER results suggest that logical access scenarios benefit from ensemble classifier approaches due to the diverse nature of the attacks, while detection in the physical access scenario is more consistent across different configurations.
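
As an illustration of what an ensemble approach can look like at the score level, the sketch below fuses several countermeasure subsystems by z-normalising each subsystem's scores and taking a weighted average. This is one simple, generic fusion scheme chosen for illustration; it is not claimed to be the method used by any participating team, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def z_normalise(scores):
    """Shift and scale scores to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]  # constant scores carry no information
    return [(s - mu) / sigma for s in scores]


def fuse_scores(system_scores, weights=None):
    """Fuse per-system score lists that are aligned by trial.

    system_scores: list of lists, one inner list per subsystem.
    weights: optional per-system weights; defaults to a plain average.
    """
    n_systems = len(system_scores)
    if weights is None:
        weights = [1.0 / n_systems] * n_systems
    normalised = [z_normalise(scores) for scores in system_scores]
    n_trials = len(normalised[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, normalised))
        for i in range(n_trials)
    ]


# Two hypothetical subsystems scoring the same three trials on different scales.
fused = fuse_scores([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
print(fused)
```

Normalising first keeps a subsystem with a large score range from dominating the average. In practice the normalisation statistics and weights would be estimated on development data rather than on the trials being scored.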

Implications and Future Work

ASVspoof 2019 underscores the importance of continuous adaptation in the face of advancing spoofing technologies. The implications of this research are significant for applications requiring secure voice authentication, particularly as TTS and VC technologies evolve. Future research may focus on further improving generalization to new attack methods and enhancing the robustness of countermeasures in diverse acoustic environments.

This study provides a comprehensive foundation for both theoretical and practical advancements in spoofing detection, setting a benchmark for future iterations and research in the field. The ASV-centric evaluation approach marks a significant shift towards more holistic assessments, balancing security with user convenience.