Overview of "Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition"
The paper "Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition" addresses the construction of targeted adversarial examples against Automatic Speech Recognition (ASR) systems. Previous attacks suffered from major shortcomings, particularly perceptible perturbations and a lack of over-the-air robustness; this work makes significant strides toward overcoming both limitations.
Key Contributions
- Imperceptibility via Auditory Masking: The authors harness the psychoacoustic principle of auditory masking to create adversarial audio samples. By constraining the perturbation's energy to stay below the frequency-dependent masking threshold induced by the original audio, they ensure that perturbations remain imperceptible to human listeners. Employing this strategy, the adversarial examples achieved a 100% targeted attack success rate while remaining nearly indistinguishable from the clean audio in controlled listening studies.
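The masking idea above can be sketched as a penalty term: perturbation energy is only punished where it rises above a precomputed, frequency-dependent masking threshold of the clean audio. This is a minimal illustration, not the paper's implementation; the function name, frame parameters, and the assumption of a precomputed `masking_threshold_db` array (which in practice would come from a full psychoacoustic model) are all hypothetical.

```python
import numpy as np

def masking_loss(perturbation, masking_threshold_db, frame_len=2048, hop=512):
    """Hinge-style penalty on perturbation energy exceeding the masking
    threshold of the clean audio (sketch, not the paper's exact loss).

    masking_threshold_db: array of shape (n_frames, frame_len // 2 + 1),
    assumed to be derived from a psychoacoustic model of the clean signal.
    """
    window = np.hanning(frame_len)
    n_frames = masking_threshold_db.shape[0]
    loss = 0.0
    for t in range(n_frames):
        frame = perturbation[t * hop : t * hop + frame_len]
        if len(frame) < frame_len:
            frame = np.pad(frame, (0, frame_len - len(frame)))
        spectrum = np.fft.rfft(window * frame)
        power_db = 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)
        # Only energy above the masking threshold contributes to the loss,
        # so sub-threshold (inaudible) perturbations are left unconstrained.
        loss += np.sum(np.maximum(power_db - masking_threshold_db[t], 0.0))
    return loss / n_frames
```

Minimizing this term alongside the usual attack objective pushes the perturbation's spectral power under the threshold in every frame, which is what makes the perturbation inaudible rather than merely small in an Lp norm.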
- Simulated Over-the-Air Robustness: The research progresses toward robust over-the-air adversarial examples by employing room impulse response simulations. These simulations replicate the reverberation and distortion that audio signals encounter in real rooms, making the adversarial examples resilient without prior knowledge of the specific room configuration. While the attacks were not demonstrated in physical over-the-air settings, they remained effective under simulated playback conditions, showing potential for further development.
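The simulation step above amounts to convolving the adversarial waveform with a room impulse response (RIR) before feeding it to the recognizer. The sketch below, with hypothetical names, shows that transformation; in training, the RIR would be drawn at random from a large set of simulated rooms so the attack generalizes to unseen configurations.

```python
import numpy as np

def simulate_over_the_air(audio, rir):
    """Simulate playback in a room by convolving the adversarial audio
    with a room impulse response, then renormalizing to the original
    peak level (a sketch under simplified assumptions).
    """
    # Convolution applies the room's reverberation; truncate to keep
    # the original length so the transcription targets still align.
    received = np.convolve(audio, rir)[: len(audio)]
    peak = np.max(np.abs(received))
    if peak > 0:
        received = received * (np.max(np.abs(audio)) / peak)
    return received
```

Optimizing the adversarial example in expectation over many such random RIRs is what gives the attack its robustness to room conditions it has never seen.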
- Application to a Deep Neural Network ASR System: The approach was tested against the Lingvo ASR system, a state-of-the-art end-to-end neural network architecture. This advancement represents a key achievement as many prior attacks targeted older speech recognition systems or smaller, less complex tasks.
Implications and Future Research Directions
This paper holds important implications both theoretically and practically in the context of adversarial machine learning. The primary theoretical implication lies in demonstrating that it is possible to leverage domain-specific knowledge, such as psychoacoustic principles, to guide adversarial example constructions beyond mere optimization over traditional distance metrics. This sets the stage for further exploration into domain-informed adversarial attacks across different data modalities.
Practically, these advancements bear on the security of ASR systems, which are increasingly integrated into commercial and safety-critical applications. While this paper focuses on the attack side, it opens avenues for exploring defensive measures against such targeted imperceptible attacks, as current adversarial defenses are typically designed around perceptible perturbations.
For future work, addressing the challenge of developing fully imperceptible over-the-air adversarial examples remains a critical task. Investigating the integration of real-time environmental feedback during adversarial example generation may bridge the gap between simulated success and physical-world application. Additionally, expanding this line of work to include different languages, environmental settings, and vocal characteristics would further validate the generalizability and robustness of the proposed methodology.
In conclusion, the paper presents notable technical progress in crafting adversarial examples that are both imperceptible and robust within certain constraints. The insights offered are a valuable addition to both the adversarial and ASR literature, guiding future explorations that are likely to further the understanding and capabilities in this domain.