Beyond Neural-on-Neural Approaches to Speaker Gender Protection (2306.17700v1)
Abstract: Recent research has proposed approaches that modify speech to defend against gender inference attacks. The goal of these protection algorithms is to control the availability of information about a speaker's gender, a privacy-sensitive attribute. The current common practice for developing and testing gender-protection algorithms is "neural-on-neural", i.e., perturbations are both generated and evaluated with neural networks. In this paper, we propose going beyond this practice to strengthen the study of gender protection. First, we demonstrate the importance of testing against gender inference attacks based on speech features historically developed by speech scientists, alongside the conventionally used neural classifiers. Next, we argue that researchers should use speech features to gain insight into how protective modifications change the speech signal. Finally, we point out that gender-protection algorithms should be compared with novel "vocal adversaries", human-executed voice adaptations, in order to improve interpretability and enable before-the-mic protection.