Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding (2406.08200v3)

Published 12 Jun 2024 in cs.SD, cs.AI, and eess.AS

Abstract: Voice anonymization has been developed as a technique for preserving privacy by replacing the speaker's voice in a speech signal with that of a pseudo-speaker, thereby obscuring the original voice attributes from machine recognition and human perception. In this paper, we focus on altering the voice attributes against machine recognition while retaining human perception. We referred to this as the asynchronous voice anonymization. To this end, a speech generation framework incorporating a speaker disentanglement mechanism is employed to generate the anonymized speech. The speaker attributes are altered through adversarial perturbation applied on the speaker embedding, while human perception is preserved by controlling the intensity of perturbation. Experiments conducted on the LibriSpeech dataset showed that the speaker attributes were obscured with their human perception preserved for 60.71% of the processed utterances.

Authors (4)

Rui Wang
Liping Chen
Kong Aik Lee
Zhen-Hua Ling

Summary

The paper "Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding" introduces an approach to voice anonymization aimed at protecting speaker privacy. Conventional anonymization replaces the voice with that of a pseudo-speaker, hiding the original voice attributes from both machine recognition and human perception. This technique instead obscures the speaker's voice attributes from machine recognition while retaining how the voice is perceived by human listeners. It targets scenarios where the speaker's identity should be protected from automated systems without changing how the voice sounds to people.

Key Contributions:

Asynchronous Voice Anonymization: The authors propose a setting termed "asynchronous voice anonymization," in which voice attributes are altered against machine recognition while human perception of the voice is retained.
Speaker Disentanglement: The framework uses a speaker disentanglement mechanism. This mechanism helps isolate speaker-specific attributes from the speech signal, making it easier to modify these attributes without affecting other aspects of the speech.
Adversarial Perturbation: A critical aspect of the method is the use of adversarial perturbations applied directly on speaker embeddings. This approach strategically alters the speaker's unique voice features, masking them effectively from recognition algorithms.
Control of Perturbation Intensity: To keep human perception of the voice largely unaffected, the intensity of the perturbation is controlled. The aim is a perturbation strong enough that machine systems struggle to recognize the original speaker, yet small enough that the voice still sounds the same to human listeners.
Experimental Validation: The method was evaluated on the LibriSpeech dataset. Speaker attributes were obscured with human perception preserved for 60.71% of the processed utterances.
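
The idea of perturbing a speaker embedding under an intensity budget can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not say which attack or speaker model the paper uses, so the example assumes a single signed-gradient (FGSM-style) step against a hypothetical linear softmax speaker classifier. The names `fgsm_perturb`, `weights`, and `epsilon` are illustrative only, and the speech generation stage is omitted entirely.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fgsm_perturb(embedding, weights, true_speaker, epsilon):
    """One signed-gradient step that raises a linear speaker classifier's loss.

    weights: (num_speakers, dim) matrix of a hypothetical linear classifier.
    epsilon: L-infinity bound on the perturbation (the "intensity").
    """
    probs = softmax(weights @ embedding)
    one_hot = np.zeros_like(probs)
    one_hot[true_speaker] = 1.0
    # Gradient of the cross-entropy loss w.r.t. the embedding
    # for a linear-softmax model: W^T (p - y).
    grad = weights.T @ (probs - one_hot)
    return embedding + epsilon * np.sign(grad)

# Toy data (assumption): random classifier weights and an embedding
# that the classifier attributes mostly to speaker 2.
rng = np.random.default_rng(0)
dim, num_speakers = 16, 4
weights = rng.normal(size=(num_speakers, dim))
embedding = 0.3 * weights[2] + 0.1 * rng.normal(size=dim)

for epsilon in (0.0, 0.05, 0.5):
    perturbed = fgsm_perturb(embedding, weights, true_speaker=2, epsilon=epsilon)
    p_true = softmax(weights @ perturbed)[2]
    max_change = np.abs(perturbed - embedding).max()
    print(f"epsilon={epsilon}: p(true speaker)={p_true:.3f}, max |delta|={max_change:.3f}")
```

A larger `epsilon` lowers the classifier's confidence in the true speaker but moves the embedding further from the original. In the paper's setting, the budget would be chosen so that the generated speech still sounds like the same voice to listeners, which this toy example cannot measure.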

Overall, the paper contributes to research on privacy-preserving technologies for voice data, relevant to digital communication, voice assistants, and other domains where speech data is sensitive. Combining adversarial perturbation with the goal of preserving how the voice sounds to human listeners offers a promising direction for future work.

