
Faked Speech Detection with Zero Prior Knowledge (2209.12573v6)

Published 26 Sep 2022 in cs.SD, cs.AI, cs.LG, cs.MM, cs.NE, and eess.AS

Abstract: Audio is one of the most common channels of human communication, but it can also be easily misused to deceive people. With the AI revolution, the relevant technologies are now accessible to almost everyone, making it simple for criminals to commit fraud and forgery. In this work, we introduce a neural network method for building a classifier that blindly classifies an input audio clip as real or mimicked; the word 'blindly' refers to the ability to detect mimicked audio without references or real sources. We propose a deep neural network following a sequential model that comprises three hidden layers, with alternating dense and dropout layers. The model was trained on a set of 26 important features extracted from a large audio dataset, and the resulting classifier was tested on the same set of features extracted from different audio samples. The data was extracted from two raw datasets composed specifically for this work: an all-English dataset and a mixed dataset (Arabic plus English). The raw datasets can be obtained by emailing the first author. For comparison, the audio samples were also classified through human inspection, with native speakers as subjects. The classifier correctly classified at least 94% of the test cases, compared with 85% accuracy for human observers.
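
The architecture described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: only the 26 input features, the three hidden layers, and the alternation of dense and dropout layers come from the abstract. The layer widths, dropout rate, ReLU activations, sigmoid output, and all function names here are assumptions made for the sake of a runnable example, and the weights are random rather than trained.

```python
import math
import random

N_FEATURES = 26               # from the abstract: 26 extracted audio features
HIDDEN_SIZES = (64, 32, 16)   # assumption: the abstract gives no layer widths
DROPOUT_RATE = 0.2            # assumption: the abstract gives no dropout rate


def init_dense(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with Glorot-uniform weights."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine transform: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, rate, rng, training):
    """Inverted dropout: zero units at random in training, identity otherwise."""
    if not training:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def build_model(rng):
    """Three hidden dense layers plus one output layer (hypothetical sizes)."""
    sizes = (N_FEATURES, *HIDDEN_SIZES, 1)
    return [init_dense(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict(model, features, rng=None, training=False):
    """Return a score in (0, 1), read here as P(mimicked), for one feature vector."""
    rng = rng or random.Random()
    x = list(features)
    *hidden, output = model
    for layer in hidden:
        x = relu(dense(x, layer))                    # dense hidden layer
        x = dropout(x, DROPOUT_RATE, rng, training)  # followed by dropout
    logit = dense(x, output)[0]
    return 1.0 / (1.0 + math.exp(-logit))            # sigmoid for the binary label


rng = random.Random(0)
model = build_model(rng)
sample = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]  # stand-in for real features
score = predict(model, sample)
label = "mimicked" if score >= 0.5 else "real"
```

In practice such a network would be built and trained with a deep learning framework; the plain-Python version above only makes the layer ordering explicit. The random `sample` vector stands in for the 26 features, which the abstract does not enumerate.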
