
Inaudible Voice Commands (1708.07238v1)

Published 24 Aug 2017 in cs.CR

Abstract: Voice assistants like Siri enable us to control IoT devices conveniently with voice commands, however, they also provide new attack opportunities for adversaries. Previous papers attack voice assistants with obfuscated voice commands by leveraging the gap between speech recognition system and human voice perception. The limitation is that these obfuscated commands are audible and thus conspicuous to device owners. In this paper, we propose a novel mechanism to directly attack the microphone used for sensing voice data with inaudible voice commands. We show that the adversary can exploit the microphone's non-linearity and play well-designed inaudible ultrasounds to cause the microphone to record normal voice commands, and thus control the victim device inconspicuously. We demonstrate via end-to-end real-world experiments that our inaudible voice commands can attack an Android phone and an Amazon Echo device with high success rates at a range of 2-3 meters.

Inaudible Voice Commands: Exploiting Microphone Non-Linearity

The paper "Inaudible Voice Commands" presents an approach to attacking voice assistant systems by exploiting the non-linear response of microphones to inject covert commands. The authors address vulnerabilities in the voice input channel of the increasingly popular Internet of Things (IoT) device ecosystem, as exposed through virtual assistants such as Siri, Google Now, and Alexa. Prior work demonstrated that such systems are susceptible to audio manipulation, but those methods produced audible commands and were therefore detectable by human listeners.

Mechanism of Inaudible Command Injection

The core contribution of this paper lies in its novel method of transmitting inaudible voice commands to control voice-activated systems. Exploiting the inherent non-linearity of microphone components, the method involves generating ultrasound signals that, while imperceptible to humans (i.e., above 20 kHz), are interpreted by the microphone as genuine commands due to intermodulation distortion. This distortion process translates the ultrasound frequencies down into the audible range, ensuring that the targeted microphone captures frequencies that align with those required to activate and command the voice assistant.
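This down-conversion can be illustrated numerically. In the sketch below (parameters are illustrative, not taken from the paper), a microphone with a small quadratic non-linearity receives two purely ultrasonic tones, and their difference frequency appears in the audible band:

```python
import numpy as np

# A microphone with a small quadratic non-linearity, out = s + alpha * s^2,
# receives two inaudible ultrasonic tones. The squared cross term produces
# their difference frequency, which lands in the audible range.
fs = 192_000             # sample rate high enough to represent ultrasound
t = np.arange(fs) / fs   # one second of samples
f1, f2 = 30_000, 31_000  # both tones above the ~20 kHz hearing limit
s = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

alpha = 0.1
out = s + alpha * s**2   # non-linear microphone response

spectrum = np.abs(np.fft.rfft(out))
freqs = np.fft.rfftfreq(len(out), 1 / fs)

# Strongest component in the audible band (excluding DC) is the
# intermodulation product at f2 - f1 = 1 kHz.
audible = (freqs > 20) & (freqs < 20_000)
f_peak = freqs[audible][np.argmax(spectrum[audible])]
print(f_peak)  # 1000.0 -- an audible tone created from inaudible input
```

A linear microphone (alpha = 0) would record nothing below 20 kHz here; the audible component exists only because of the quadratic term.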

The authors' attack algorithm consists of several key steps: low-pass filtering the normal audio command to bound its bandwidth, upsampling to a rate high enough to represent ultrasonic frequencies, amplitude modulation to shift the command's spectrum above the human-audible threshold, and the addition of an explicit carrier tone so that the target microphone's non-linearity demodulates the command back to baseband. The technique exploits existing hardware imperfections and requires no physical alteration of the target device.
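The steps above can be sketched as follows. This is a minimal illustration of the pipeline, with made-up parameters and a synthetic tone standing in for a recorded voice command; the paper's actual filter designs and carrier frequency may differ:

```python
import numpy as np

# Illustrative pipeline: (1) low-pass the command, (2) upsample,
# (3) amplitude-modulate onto an ultrasonic carrier, (4) add the carrier
# itself so the microphone's quadratic term demodulates the command.
fs_in, fs_out = 16_000, 192_000  # voice-band rate -> ultrasound-capable rate
fc = 30_000                      # carrier above the ~20 kHz hearing limit

t_in = np.arange(fs_in) / fs_in
voice = np.sin(2 * np.pi * 400 * t_in)  # stand-in for a voice command

# Step 1: crude FIR low-pass (moving average) to bound the bandwidth.
kernel = np.ones(8) / 8
voice_lp = np.convolve(voice, kernel, mode="same")

# Step 2: upsample by linear interpolation to the transmitter's rate.
t_out = np.arange(fs_out) / fs_out
voice_up = np.interp(t_out, t_in, voice_lp)

# Steps 3-4: amplitude modulation plus an explicit carrier component.
carrier = np.cos(2 * np.pi * fc * t_out)
tx = voice_up * carrier + carrier  # all transmitted energy sits near 30 kHz

# At the victim, the microphone's response (~ s + alpha * s^2) recreates the
# baseband command: the s^2 cross term 2 * voice_up * carrier^2 contains a
# copy of voice_up at its original frequencies.
```

Everything in `tx` lies near the 30 kHz carrier, so the transmission itself is inaudible; only the microphone's non-linearity brings the command into the voice band.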

Empirical Validation and Results

The practical efficacy of the proposed inaudible attack is validated through controlled end-to-end experiments on an Android smartphone and an Amazon Echo device, using a commodity audio source and speaker setup to deliver the inaudible commands. The attack achieved a 100% success rate against the Android device at a distance of three meters and an 80% success rate against the Amazon Echo at two meters. These findings indicate significant real-world impact wherever an adversary can gain proximity to the target.

Implications and Future Directions

The implications of this research are profound for both device manufacturers and users of voice-activated systems. The vulnerability exposed poses a significant security risk, as the inconspicuous nature of the injection method allows adversaries to execute unauthorized commands without detection. On a theoretical level, the paper reinforces the need to reevaluate the security frameworks governing IoT devices and the potential exploitation of their sensors.

In light of these findings, future work could focus on devising robust defenses against such non-linear exploitation. Possible approaches may include enhancements in microphone design to mitigate non-linearity effects, the development of anomaly detection algorithms to flag suspicious audio patterns, or integrated security protocols that understand and negate the malicious manipulation of audio frequencies. In addition, exploring the broader applicability of such ultrasonic attacks across other types of sensors may uncover further vulnerabilities within the expanding IoT landscape.
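As one concrete illustration of the anomaly-detection direction, a wide-band recording could be screened for unexpected ultrasonic energy. This heuristic is an assumption for illustration, not a defense proposed in the paper, and it only applies when the capture hardware samples well above 40 kHz:

```python
import numpy as np

def ultrasound_energy_ratio(signal, fs):
    """Fraction of spectral energy above 20 kHz. A high value may indicate
    an ultrasonic carrier leaking into a wide-band recording.
    (Illustrative heuristic only, not from the paper.)"""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    return spec[freqs > 20_000].sum() / spec.sum()

fs = 96_000
t = np.arange(fs) / fs
normal = np.sin(2 * np.pi * 300 * t)                   # ordinary voice-band tone
attack = normal + 0.5 * np.sin(2 * np.pi * 30_000 * t)  # residual ultrasonic carrier

print(ultrasound_energy_ratio(normal, fs) < 0.01)  # True
print(ultrasound_energy_ratio(attack, fs) > 0.1)   # True
```

A real deployment would need to account for microphones that roll off before 20 kHz, which is precisely why hardware-level mitigations are also discussed.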

In conclusion, this paper highlights a critical oversight in the security measures currently implemented in voice-activated devices. The research underscores the interplay between hardware imperfections and software vulnerabilities, and provides a foundation for developing comprehensive defenses against inaudible-command exploits.

Authors (2)
  1. Liwei Song (13 papers)
  2. Prateek Mittal (129 papers)
Citations (66)