Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials (2404.01981v2)

Published 2 Apr 2024 in cs.LG, cs.SD, and eess.AS

Abstract: Due to the substantial number of clinicians, patients, and data collection environments involved in clinical trials, gathering high-quality data poses a significant challenge. In clinical trials, patients are assessed based on their speech data to detect and monitor cognitive and mental health disorders. We propose using these speech recordings to verify the identities of enrolled patients and to identify and exclude individuals who try to enroll multiple times in the same trial. Since clinical studies are often conducted across different countries, creating a system that can perform speaker verification in diverse languages without additional development effort is imperative. We evaluate pre-trained TitaNet, ECAPA-TDNN, and SpeakerNet models by enrolling and testing with speech-impaired patients speaking English, German, Danish, Spanish, and Arabic. Our results demonstrate that the tested models can effectively generalize to clinical speakers, with less than 2.7% EER for European languages and 8.26% EER for Arabic. This represents a significant step in developing more versatile and efficient speaker verification systems for cognitive and mental health clinical trials that can be used across a wide range of languages and dialects, substantially reducing the effort required to develop speaker verification systems for multiple languages. We also evaluate how speech tasks and the number of speakers involved in the trial influence performance and show that the type of speech task impacts model performance.
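The abstract reports results as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how EER might be computed from verification scores. It is not the authors' evaluation code: the function names, the use of cosine similarity between speaker embeddings, and the threshold sweep over observed scores are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (hypothetical scoring choice)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    genuine_scores: scores for same-speaker trials.
    impostor_scores: scores for different-speaker trials.
    Returns (eer, threshold), where a trial is accepted when score >= threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    best_gap, eer, best_threshold = np.inf, 1.0, thresholds[0]
    for t in thresholds:
        far = np.mean(impostor >= t)  # fraction of impostor trials accepted
        frr = np.mean(genuine < t)    # fraction of genuine trials rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer, best_threshold = gap, (far + frr) / 2, t
    return float(eer), float(best_threshold)


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the function.
    genuine = [0.9, 0.8, 0.4, 0.7]
    impostor = [0.1, 0.5, 0.3, 0.2]
    eer, threshold = equal_error_rate(genuine, impostor)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

In a duplicate-enrollment check of the kind the abstract describes, each new participant's embedding would be scored against every enrolled participant, and a score above the chosen threshold would flag a possible repeat enrollment. With few trials the sweep gives only a coarse estimate, since the two error rates may never be exactly equal.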

