Overview of "Tactics of Adversarial Attack on Deep Reinforcement Learning Agents"
This paper presents an empirical investigation of tactics for adversarial attacks on deep reinforcement learning (DRL) agents. Authored by Lin et al., it examines the vulnerabilities inherent in DRL systems, focusing in particular on when and how adversarial examples are delivered. The authors propose two attack strategies, the strategically-timed attack and the enchanting attack, which disrupt the decision-making of DRL agents by introducing carefully crafted perturbations at critical time steps or across extended action sequences.
Strategically-Timed Attack
The strategically-timed attack aims to minimize the reward earned by a DRL agent, using reward as a proxy for performance. Unlike a naive uniform attack, in which adversarial perturbations are applied at every time step of an episode, this method perturbs the agent only at selected time steps where an interruption most effectively lowers performance. Critical steps are identified via the gap between the agent's most- and least-preferred action probabilities: when the policy strongly favors a single action, a perturbation at that step is most damaging. Attacking sparsely also makes the attack stealthier, reducing the probability of detection. In experiments on five Atari games, the strategically-timed attack degrades the agent's accumulated reward as much as a uniform attack while perturbing only about 25% of time steps, i.e., attacking roughly four times less often.
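As a concrete illustration, the following Python sketch shows how such a timing criterion might be implemented for a policy-based agent that exposes its action probabilities. This is a minimal sketch, not the paper's implementation: `policy`, `craft_perturbation`, and `env` are hypothetical placeholders, and the threshold `beta` is an assumed hyperparameter that would need tuning per game.

```python
import numpy as np

def action_preference_gap(action_probs: np.ndarray) -> float:
    """Gap c(s) = max_a pi(a|s) - min_a pi(a|s).

    A large gap means the agent strongly prefers one action, so a
    perturbation at this step is most likely to change its behavior."""
    return float(action_probs.max() - action_probs.min())

def should_attack(action_probs: np.ndarray, beta: float = 0.5) -> bool:
    """Attack only when the preference gap exceeds the threshold beta,
    keeping the fraction of attacked steps small."""
    return action_preference_gap(action_probs) > beta

# Hypothetical episode loop; `env`, `policy`, and `craft_perturbation`
# stand in for components not specified here.
def run_attacked_episode(env, policy, craft_perturbation, beta=0.5, max_steps=1000):
    obs = env.reset()
    total_reward, attacked_steps = 0.0, 0
    for _ in range(max_steps):
        probs = policy.action_probs(obs)               # pi(.|s_t)
        if should_attack(probs, beta):
            # Perturb the observation only at this critical step,
            # e.g., with an FGSM-style adversarial delta.
            obs = obs + craft_perturbation(obs, policy)
            attacked_steps += 1
        action = policy.act(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward, attacked_steps
```

Raising `beta` trades attack strength for stealth: fewer steps are perturbed, but each perturbed step is one where the agent's decision matters most.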
Enchanting Attack
Rather than merely degrading performance, the enchanting attack proactively lures the agent toward an adversarially chosen target state. It combines a generative model that predicts future states with a sampling-based planning algorithm: the planner charts a sequence of actions leading to the target state, and a crafted sequence of adversarial examples then steers the agent's policy along that plan. In the reported experiments, the enchanting attack misguided agents to predetermined target states within a 40-step horizon with a success rate above 70%.
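The sketch below illustrates one plausible shape for this plan-then-lure loop, under stated assumptions: `predict` is a learned one-step dynamics model, `craft_example_for_action` crafts an adversarial observation that induces a chosen action, and `policy` and `env` are placeholders; none of these names come from the paper's code, and the random-sampling planner is a simplification.

```python
import numpy as np

def plan_action_sequence(predict, state, target_state, horizon,
                         n_samples, n_actions, rng):
    """Sampling-based planner: roll out random action sequences through
    the prediction model and keep the one whose final predicted state
    is closest (in L2 distance) to the adversary's target state."""
    best_seq, best_dist = None, np.inf
    for _ in range(n_samples):
        seq = rng.integers(0, n_actions, size=horizon)
        s = state
        for a in seq:
            s = predict(s, a)          # learned model of env dynamics
        dist = np.linalg.norm(s - target_state)
        if dist < best_dist:
            best_seq, best_dist = seq, dist
    return best_seq

# Hypothetical attack loop; all callables are assumed interfaces.
def enchanting_attack(env, policy, predict, craft_example_for_action,
                      target_state, horizon=40, n_samples=2000,
                      n_actions=9, seed=0):
    rng = np.random.default_rng(seed)
    obs = env.reset()
    for t in range(horizon):
        # Re-plan from the current state at every step.
        seq = plan_action_sequence(predict, obs, target_state,
                                   horizon - t, n_samples, n_actions, rng)
        # Craft an adversarial observation that lures the agent
        # into taking the first planned action.
        adv_obs = craft_example_for_action(obs, policy, seq[0])
        obs, _, done, _ = env.step(policy.act(adv_obs))
        if done:
            break
    return obs
```

Re-planning at each step keeps the plan consistent with the states the agent actually reaches, at the cost of repeated rollouts through the prediction model.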
Implications and Future Directions
The success of these attacks raises critical questions about the robustness and reliability of DRL systems, especially those deployed in high-stakes settings. The results underscore the urgency of developing countermeasures that can safeguard DRL agents from such adversarial exploits. Defenses remain underexplored in DRL, making them a notable avenue for subsequent research.
The authors also suggest that future work could refine these attacks, for example by improving the prediction accuracy of the generative model on which the enchanting attack relies. More broadly, the paper implies that hardening DRL systems against adversarial manipulation will require both better adversarial-example detection mechanisms and more resilient DRL models. This research lays foundational insights into adversarial threats to DRL systems and motivates a broader, coordinated effort to fortify AI applications against such interventions.