Defending Your Voice: Adversarial Attack on Voice Conversion (2005.08781v3)

Published 18 May 2020 in eess.AS, cs.LG, and cs.SD

Abstract: Substantial improvements have been achieved in recent years in voice conversion, which converts the speaker characteristics of an utterance into those of another speaker without changing the linguistic content of the utterance. Nonetheless, the improved conversion technologies have also led to concerns about privacy and authentication. It is thus highly desirable to prevent one's voice from being improperly utilized by such voice conversion technologies. This is why we report in this paper the first known attempt to perform an adversarial attack on voice conversion. We introduce human-imperceptible noise into the utterances of a speaker whose voice is to be defended. Given these adversarial examples, voice conversion models cannot convert other utterances so that they sound as if produced by the defended speaker. Preliminary experiments were conducted on two state-of-the-art zero-shot voice conversion models. Objective and subjective evaluation results in both white-box and black-box scenarios are reported. The results show that the speaker characteristics of the converted utterances are clearly different from those of the defended speaker, while the adversarial examples of the defended speaker remain indistinguishable from the authentic utterances.

Citations (45)

Summary

  • The paper proposes a novel method that integrates imperceptible adversarial noise to prevent unauthorized voice conversion.
  • It evaluates the approach in both white-box and black-box scenarios using state-of-the-art zero-shot voice conversion models.
  • Experiments demonstrate that the method makes converted utterances sound clearly unlike the defended speaker, while the adversarially modified inputs remain indistinguishable from authentic speech.

The paper "Defending Your Voice: Adversarial Attack on Voice Conversion" addresses the burgeoning concerns around privacy and authentication in the context of voice conversion technologies. Voice conversion, which alters the speaker characteristics of an utterance while maintaining the original linguistic content, has advanced significantly. However, these advancements pose risks related to the unauthorized use of one's voice.

The authors present a novel approach to protect individuals' voices from misuse by introducing human-imperceptible noise into their utterances. This adversarial noise is crafted to impede voice conversion models, preventing them from making other utterances sound as if they were produced by the protected speaker.
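To make the core idea concrete, below is a minimal white-box sketch, not the authors' published implementation: a PGD-style loop perturbs the defended waveform so that a speaker encoder's embedding drifts away from the clean voice, under an L-infinity budget meant to keep the noise quiet. `ToySpeakerEncoder`, `defend_utterance`, and all hyperparameters are illustrative stand-ins, not components of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySpeakerEncoder(nn.Module):
    """Stand-in for a real speaker encoder (e.g., the one inside a zero-shot VC model)."""
    def __init__(self, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=400, stride=160),  # crude frame-level features
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(32, emb_dim),
        )

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) -> (batch, emb_dim), L2-normalized embedding
        return F.normalize(self.net(wav.unsqueeze(1)), dim=-1)

def defend_utterance(encoder: nn.Module, wav: torch.Tensor,
                     eps: float = 0.005, alpha: float = 0.001,
                     steps: int = 100) -> torch.Tensor:
    """PGD-style defense sketch: push the speaker embedding away from the clean
    voice while keeping ||delta||_inf <= eps so the noise stays hard to hear."""
    encoder.requires_grad_(False)            # we only need gradients w.r.t. delta
    clean_emb = encoder(wav).detach()
    delta = torch.zeros_like(wav, requires_grad=True)
    for _ in range(steps):
        adv_emb = encoder(wav + delta)
        # Minimize similarity to the defended speaker's clean embedding.
        loss = F.cosine_similarity(adv_emb, clean_emb, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # descend on similarity
            delta.clamp_(-eps, eps)              # imperceptibility budget
            delta.grad.zero_()
    return (wav + delta).detach()

encoder = ToySpeakerEncoder()
wav = torch.randn(1, 16000)      # placeholder: 1 second of audio at 16 kHz
adv_wav = defend_utterance(encoder, wav)
```

In this framing, the perturbed `adv_wav` would be released instead of the clean recording, so that any conversion model relying on a similar speaker representation extracts a corrupted voice identity.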

Key features of this research include:

  1. Adversarial Noise: The paper focuses on integrating carefully designed adversarial noise into speech, which remains undetectable to human listeners. This noise disrupts the ability of voice conversion models to replicate the speaker's voice.
  2. White-Box and Black-Box Scenarios: The research evaluates the effectiveness of this adversarial approach in both white-box (where the model architecture and parameters are known) and black-box (where the model is treated as a closed system) settings.
  3. Experimental Evaluation: The authors conduct experiments on two leading zero-shot voice conversion models. Both objective measures and subjective listening tests show that the method successfully alters the speaker characteristics in converted utterances, making them clearly distinct from the defended speaker, while keeping the adversarially modified utterances indistinguishable from genuine ones. A minimal version of such a similarity check is sketched after this list.
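As a hedged illustration of one common objective check of this kind (not necessarily the exact metric used in the paper), the snippet below scores the cosine similarity between speaker embeddings of a converted utterance and genuine speech from the defended speaker; a successful defense should drive this score down when conversion was conditioned on adversarial inputs. It reuses the illustrative `ToySpeakerEncoder` from the sketch above, and all waveforms here are placeholders.

```python
def speaker_similarity(encoder: nn.Module,
                       converted_wav: torch.Tensor,
                       reference_wav: torch.Tensor) -> float:
    """Cosine similarity between speaker embeddings; lower means the converted
    speech sounds less like the defended speaker to this encoder."""
    with torch.no_grad():
        return F.cosine_similarity(encoder(converted_wav),
                                   encoder(reference_wav), dim=-1).item()

# Placeholders: in practice these would come from running a VC model on
# clean vs. defended (adversarial) inputs of the same speaker.
converted_clean = torch.randn(1, 16000)
converted_adv = torch.randn(1, 16000)
reference = torch.randn(1, 16000)
print(speaker_similarity(encoder, converted_clean, reference))
print(speaker_similarity(encoder, converted_adv, reference))
```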

This work represents a foundational step towards defending against unauthorized voice conversion, showcasing the potential of adversarial techniques in safeguarding personal voice data.