Alexa versus Alexa: Controlling Smart Speakers by Self-Issuing Voice Commands (2202.08619v1)

Published 17 Feb 2022 in cs.CR

Abstract: We present Alexa versus Alexa (AvA), a novel attack that leverages audio files containing voice commands and audio reproduction methods in an offensive fashion, to gain control of Amazon Echo devices for a prolonged amount of time. AvA leverages the fact that Alexa running on an Echo device correctly interprets voice commands originated from audio files even when they are played by the device itself -- i.e., it leverages a command self-issue vulnerability. Hence, AvA removes the necessity of having a rogue speaker in proximity of the victim's Echo, a constraint that many attacks share. With AvA, an attacker can self-issue any permissible command to Echo, controlling it on behalf of the legitimate user. We have verified that, via AvA, attackers can control smart appliances within the household, buy unwanted items, tamper linked calendars and eavesdrop on the user. We also discovered two additional Echo vulnerabilities, which we call Full Volume and Break Tag Chain. The Full Volume increases the self-issue command recognition rate, by doubling it on average, hence allowing attackers to perform additional self-issue commands. Break Tag Chain increases the time a skill can run without user interaction, from eight seconds to more than one hour, hence enabling attackers to setup realistic social engineering scenarios. By exploiting these vulnerabilities, the adversary can self-issue commands that are correctly executed 99% of the times and can keep control of the device for a prolonged amount of time. We reported these vulnerabilities to Amazon via their vulnerability research program, who rated them with a Medium severity score. Finally, to assess limitations of AvA on a larger scale, we provide the results of a survey performed on a study group of 18 users, and we show that most of the limitations against AvA are hardly used in practice.

PDF Abstract

Evaluation of AvA: A Novel Attack on Amazon Echo Devices

The paper "Alexa versus Alexa: Controlling Smart Speakers by Self-Issuing Voice Commands" presents the Alexa versus Alexa (AvA) attack, a method to exploit Echo devices' vulnerability through self-issued voice commands. This approach uses audio files containing malicious commands to gain unauthorized control of Amazon's Echo devices without the necessity of a rogue speaker nearby, deviating from the constraints typical of prior attacks in the same space. The research systematically evaluates the attack vectors, impact, and limitations, presenting a comprehensive analysis rooted in substantial empirical testing.

The authors detail how AvA leverages the Echo's capacity to interpret commands from its own audio output, defining "non-exclusivity of the audio channel" as a prerequisite for successful command execution. They investigated three main audio playback methods on Echo devices -- namely, SSML audio tags, Bluetooth audio streaming, and radio stations -- but identified only the latter two as viable vectors for AvA. Bluetooth provides direct control when near the target, whereas the radio station vector allows remote execution, albeit requiring an initial social engineering phase to engage the user.

A significant technical contribution is the identification of the Full Volume Vulnerability (FVV), a mechanism by which the attack improves its reliability by ensuring that command execution occurs at full audio volume, thus improving the success rate. Through extensive experimentation, the authors validated that exploiting FVV yielded up to a 99% command recognition success rate in optimal scenarios, marking a notable efficiency in sustaining control over the Echo device.

Additionally, the research covers post-exploitation actions via a Voice Masquerading Attack (VMA) with the 'Mask Attack' skill, which intercepts user commands and cloaks malicious activities as legitimate interactions. This extension underscores the privacy threats posed by enhancing the attack to simulate user interactions undetected.

The implications for IoT and smart environment security are critical. This vulnerability not only demonstrates a clear flaw in the Echo's handling of self-produced audio streams but also showcases the potential invasiveness of voice-activated systems when combined with adversarial techniques. The research offers mitigation suggestions, emphasizing the need for enhanced detection mechanisms like self-generated wakeword suppression and robust liveness detection.

Despite the strong empirical validation of AvA, the paper identifies practical limitations. Audible playback of TTS-generated commands represents a detectable footprint in proximate environments, potentially alerting users to malicious activity. Moreover, the reliance on TTS voices, although effective, brings recognition variability depending on acoustic settings and device configurations.

Future directions may further investigate adversarial example improvements, enhancement of detection countermeasures, and cross-platform applications—indications for continued research necessary to safeguard voice-controlled IoT devices against sophisticated manipulation attempts. Overall, this paper reflects a significant step in understanding and countering security vulnerabilities in the rapidly integrating context of smart home technologies.

PDF Markdown Bookmark Chat (Pro)

Authors (3)

Sergio Esposito (5 papers)
Daniele Sgandurra (8 papers)
Giampaolo Bella (30 papers)

Citations (12)

View on Semantic Scholar

Related Papers

Find Related Papers

YouTube

Show All Videos