Long-term Conversation Analysis: Exploring Utility and Privacy (2306.16071v1)
Published 28 Jun 2023 in eess.AS, cs.CL, and cs.SD
Abstract: The analysis of conversations recorded in everyday life requires privacy protection. In this contribution, we explore privacy-preserving feature extraction methods based on input feature dimension reduction, spectral smoothing, and the low-cost speaker anonymization technique based on the McAdams coefficient. We assess the utility of the feature extraction methods with a voice activity detection system and a speaker diarization system, while privacy protection is assessed with a speech recognition model and a speaker verification model. We show that the combination of the McAdams coefficient and spectral smoothing maintains utility while improving privacy.
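The paper does not reproduce its implementation here, but the McAdams-coefficient anonymization it builds on is typically realized as pole-angle warping of a per-frame LPC envelope: estimate LPC coefficients, raise each complex pole's angle to the power alpha (the McAdams coefficient), and resynthesize from the LPC residual. The sketch below is an illustrative per-frame version under that assumption; the `alpha=0.8` and `order=16` values are example settings, not the paper's configuration.

```python
import numpy as np
from scipy.signal import lfilter


def levinson_lpc(frame, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin)."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def mcadams_frame(frame, alpha=0.8, order=16):
    """Warp LPC pole angles phi -> |phi|**alpha and resynthesize the frame."""
    a = levinson_lpc(frame, order)
    residual = lfilter(a, [1.0], frame)       # inverse (whitening) filter
    new_poles = []
    for p in np.roots(a):
        if abs(p.imag) > 1e-8:                # only shift complex (formant) poles
            phi = np.angle(p)
            new_phi = np.sign(phi) * (abs(phi) ** alpha)
            new_poles.append(abs(p) * np.exp(1j * new_phi))
        else:
            new_poles.append(p)               # keep real poles unchanged
    a_new = np.real(np.poly(new_poles))       # conjugate pairs stay paired
    return lfilter([1.0], a_new, residual)    # synthesis with warped envelope
```

Because the pole radii are untouched, a stable analysis filter yields a stable synthesis filter, and `alpha = 1.0` reduces to (near-)identity resynthesis; values of alpha away from 1 shift the formant frequencies and thereby mask speaker identity while the residual preserves the temporal structure that voice activity detection and diarization rely on.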