VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research (2309.08049v2)

Published 14 Sep 2023 in cs.SD and eess.AS

Abstract: Speaker anonymization is the task of modifying a speech recording such that the original speaker can no longer be identified. Since the first Voice Privacy Challenge in 2020 and the release of its accompanying framework, the popularity of this research topic has been continually increasing. However, comparing and combining different anonymization approaches remains challenging due to the complexity of evaluation and the absence of user-friendly research frameworks. We therefore propose an efficient speaker anonymization and evaluation framework based on a modular and easily extendable structure, written almost entirely in Python. The framework facilitates the orchestration of several anonymization approaches in parallel and allows for interfacing between different techniques. Furthermore, we propose modifications to common evaluation methods that improve the quality of the evaluation and reduce their computation time by 65 to 95%, depending on the metric. Our code is fully open source.
