The paper "Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding" introduces an approach to voice anonymization aimed at enhancing privacy. The technique obscures the speaker's original voice attributes from machine recognition systems while leaving the speech as perceived by human listeners largely unchanged. This matters in scenarios where the speaker's identity should be protected from automated systems without degrading what human listeners hear.
Key Contributions:
- Asynchronous Voice Anonymization: The authors propose a method termed "asynchronous voice anonymization," which alters voice attributes against machine recognition systems while preserving human perception. It is "asynchronous" in that machine recognition and human perception are deliberately treated differently.
- Speaker Disentanglement: The speech generation framework incorporates a speaker disentanglement mechanism. It isolates speaker-specific attributes from the rest of the speech signal, so these attributes can be modified without affecting other aspects of the speech.
- Adversarial Perturbation: The speaker attributes are altered by adversarial perturbations applied directly to the speaker embedding. The perturbation shifts the embedding so that recognition algorithms no longer match it to the original speaker.
- Control of Perturbation Intensity: To keep human perception largely unaffected, the intensity of the applied perturbation is controlled. A perturbation that is too weak fails to mislead machine systems, while one that is too strong becomes audible, so the intensity is chosen to make the original speaker hard for machines to identify while human listeners still perceive the speech largely as before.
- Experimental Validation: The method was evaluated on the LibriSpeech dataset. Speaker attributes were obscured, with human perception preserved, for 60.71% of the processed utterances.
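
To make the perturbation and intensity-control ideas in the list above concrete, here is a minimal sketch in plain Python. It is not the paper's framework: it assumes the speaker embedding is available as a list of floats, uses cosine similarity to the original embedding as a stand-in objective for a speaker recognizer, and bounds each component of the perturbation by a hypothetical `epsilon` (an L-infinity budget). The iterative signed-gradient update and the names `perturb_embedding`, `epsilon`, and `steps` are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_gradient(x, ref):
    """Gradient of cosine_similarity(x, ref) with respect to x."""
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_ref = math.sqrt(sum(v * v for v in ref))
    cos = cosine_similarity(x, ref)
    return [r / (norm_x * norm_ref) - cos * v / (norm_x ** 2)
            for v, r in zip(x, ref)]


def perturb_embedding(embedding, epsilon, steps=10, seed=0):
    """Push an embedding away from itself under an L-infinity budget.

    Assumption: lower cosine similarity to the original embedding is used
    as a proxy for "harder for a speaker recognizer to match".
    """
    rng = random.Random(seed)
    # Small random start: the gradient is exactly zero when delta is zero.
    delta = [rng.uniform(-0.1 * epsilon, 0.1 * epsilon) for _ in embedding]
    step_size = 2.5 * epsilon / steps
    for _ in range(steps):
        x = [e + d for e, d in zip(embedding, delta)]
        grad = cosine_gradient(x, embedding)
        # Step against the gradient sign to lower similarity, then clip
        # each component back into [-epsilon, epsilon] (intensity control).
        delta = [
            max(-epsilon, min(epsilon, d - step_size * math.copysign(1.0, g)))
            for d, g in zip(delta, grad)
        ]
    return [e + d for e, d in zip(embedding, delta)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Random stand-in for a real speaker embedding (hypothetical data).
    original = [rng.gauss(0.0, 1.0) for _ in range(192)]
    for eps in (0.05, 0.2, 0.5):
        anonymized = perturb_embedding(original, eps)
        print(eps, round(cosine_similarity(original, anonymized), 4))
```

In this sketch `epsilon` plays the role of the perturbation intensity: every component of the anonymized embedding stays within `epsilon` of the original, so the budget caps how far the embedding can move. In a full system the objective would come from an actual speaker recognition model, and the perturbed embedding would be passed to the speech generator to synthesize the anonymized utterance.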
Overall, this paper contributes to the growing research in privacy-preserving technologies for voice data, which matters for digital communication, voice assistants, and other domains where speech data is sensitive. Combining adversarial techniques with a focus on preserving human perception provides a promising direction for future developments.