Prosody-Driven Privacy-Preserving Dementia Detection (2407.03470v1)
Abstract: Speaker embeddings extracted from voice recordings have proven valuable for dementia detection. However, by their nature, these embeddings contain identifiable information, which raises privacy concerns. In this work, we aim to anonymize embeddings while preserving their diagnostic utility for dementia detection. Previous studies rely on adversarial learning and on models trained on the target attribute, and they struggle in limited-resource settings. We propose a novel approach that leverages domain knowledge to disentangle prosody features relevant to dementia from speaker embeddings, without relying on a dementia classifier. Our experiments show that our approach preserves speaker privacy (speaker recognition F1-score of 0.01%) while maintaining a high dementia detection F1-score of 74% on the ADReSS dataset. Our results are also on par with those of a more constrained, classifier-dependent system on ADReSSo (0.01% and 66%, respectively), and the approach has no impact on the naturalness of synthesized speech.
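The abstract describes disentangling a prosody-related component from speaker embeddings without training a dementia classifier. The following is a hypothetical minimal sketch of one way such a disentanglement could look, not the paper's actual method: it fits a linear map from embeddings to hand-picked prosody features, then keeps only the projection of each embedding onto the prosody-predictive subspace, discarding the speaker-specific residual. All array sizes and feature choices here are illustrative assumptions.

```python
# Hypothetical sketch (NOT the paper's exact method): keep only the
# prosody-predictive linear subspace of speaker embeddings, discarding
# the speaker-specific residual.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 utterances, 192-dim speaker embeddings (ECAPA-TDNN size),
# 4 assumed prosody features (e.g. pause rate, speech rate, F0 mean/std).
E = rng.standard_normal((100, 192))   # speaker embeddings
P = rng.standard_normal((100, 4))     # prosody features

# Fit a linear map W with E @ W ~= P via least squares.
W, *_ = np.linalg.lstsq(E, P, rcond=None)   # shape (192, 4)

# Orthonormal basis of the prosody-predictive subspace.
Q, _ = np.linalg.qr(W)                       # shape (192, 4)

# Project embeddings onto that subspace; the residual (which carries
# most speaker-specific information in this toy setup) is dropped.
E_anon = E @ Q @ Q.T                         # shape (100, 192), rank <= 4

print(E_anon.shape)
```

In this sketch the anonymized embedding lives in a low-rank subspace spanned by at most four directions, so it can retain prosody-driven diagnostic cues while shedding most of the embedding's degrees of freedom; the paper's actual disentanglement procedure should be consulted for the real formulation.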