
Half-Truth: A Partially Fake Audio Detection Dataset (2104.03617v2)

Published 8 Apr 2021 in cs.SD, cs.AI, cs.CL, and eess.AS

Abstract: Diverse promising datasets, such as the ASVspoof databases, have been designed to advance the development of fake audio detection. However, previous datasets ignore an attack scenario in which a hacker hides small fake clips inside genuine speech audio. This poses a serious threat, since it is difficult to distinguish a small fake clip from the whole speech utterance. Therefore, this paper develops such a dataset for half-truth audio detection (HAD). Partially fake audio in the HAD dataset involves changing only a few words in an utterance. The audio for these words is generated with the latest state-of-the-art speech synthesis technology. Using this dataset, we can not only detect fake utterances but also localize the manipulated regions within a speech recording. Some benchmark results are presented on this dataset. The results show that partially fake audio is much more challenging than fully fake audio for fake audio detection. The HAD dataset is publicly available: https://zenodo.org/records/10377492.
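The attack the abstract describes can be sketched in a few lines: splice a synthesized clip over a span of a genuine waveform, and keep frame-level labels so manipulated regions can be localized. This is only an illustrative sketch; the function name, the 20 ms frame size, and the 0/1 label scheme are assumptions for exposition, not the actual HAD construction pipeline.

```python
import numpy as np

def splice_fake_segment(real_wav, fake_wav, start, end, sr=16000, frame_ms=20):
    """Replace real_wav[start:end] (sample indices) with a synthesized clip
    and return frame-level labels for localization.
    Illustrative sketch only -- not the HAD dataset's actual pipeline."""
    fake_clip = fake_wav[: end - start]            # trim TTS output to the gap
    mixed = np.concatenate([real_wav[:start], fake_clip, real_wav[end:]])
    frame_len = sr * frame_ms // 1000              # samples per label frame
    n_frames = len(mixed) // frame_len
    labels = np.zeros(n_frames, dtype=int)         # 0 = genuine, 1 = fake
    for i in range(n_frames):
        s = i * frame_len
        # mark any frame that overlaps the spliced region as fake
        if s < start + len(fake_clip) and s + frame_len > start:
            labels[i] = 1
    return mixed, labels
```

A detector trained on such data can then be evaluated both at the utterance level (any fake frame makes the utterance fake) and at the segment level against these frame labels.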
