- The paper exposes adversarial vulnerabilities in DRL by demonstrating performance degradation under gradient-based and naive sampling attacks.
- The authors propose a novel attack loss built around the worst-case action's Q-value and use the resulting attacks to drive a robust training framework for DRL, evaluated on classic control tasks such as Cart-Pole and Mountain Car.
- A comparative analysis shows that RBF-based Q-learning offers smoother function approximation and improved resilience, an insight relevant to safety-critical autonomous systems.
Robust Deep Reinforcement Learning with Adversarial Attacks
This paper tackles the critical issue of robustness in Deep Reinforcement Learning (DRL), particularly under adversarial perturbations of the agent's observations. Such perturbations are well studied in supervised learning tasks such as image classification, but their impact on reinforcement learning had received far less attention. The authors aim not only to expose vulnerabilities in DRL algorithms through novel adversarial attacks but also to use these attacks as a training mechanism that improves robustness in real-world settings such as robotics and autonomous control.
Key Contributions
- Adversarial Attacks on DRL: The paper demonstrates that DRL algorithms can be severely impacted by adversarial perturbations. The authors describe two ways of constructing these attacks, naive sampling and gradient-based attacks, and find that both can significantly degrade the performance of well-trained DRL models. The degradation is particularly pronounced in classic control environments such as Cart-Pole and Mountain Car, where even small perturbations of the observed state cause notable drops in return (a gradient-based attack is sketched in code after this list).
- Enhanced Objective Function for Attack: The authors propose a loss function tailored to reinforcement learning, in contrast to attacks carried over directly from the Fast Gradient Sign Method (FGSM) used in supervised learning. The new objective targets the action with the lowest Q-value, perturbing the observation so that the agent is driven toward its worst available action; the attack sketch after this list uses this objective.
- Comparative Analysis with RBF Networks: An interesting aspect of the paper is its comparison of deep Q-networks with Radial Basis Function (RBF) based Q-learning. RBF methods prove more resilient to adversarial attacks because of their smoother function approximation over the state-action space, an insight that could inform the design of more robust DRL systems (a minimal RBF Q-approximator is sketched below).
- Robust Training Framework: Using the proposed attacks, the paper presents a robust training paradigm rooted in the principles of robust control. By training DRL models on adversarially perturbed states, the authors demonstrate improved robustness both to the perturbations themselves and to variations in system parameters not seen during training. This matters for practical deployments, where models must generalize across unseen conditions (a training-loop sketch follows the RBF example below).
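The gradient-based attack can be illustrated with a short sketch. Everything here is an assumption for illustration: a PyTorch Q-network `q_net` mapping a batched state tensor to Q-values, and a perturbation budget `epsilon`; the paper's own implementation may differ in its details.

```python
import torch
import torch.nn.functional as F

def worst_q_attack(q_net, state, epsilon=0.01):
    """FGSM-style perturbation that pushes the agent toward the action with
    the lowest Q-value (a sketch of a gradient-based attack; `q_net` and
    `epsilon` are assumptions, not taken from the paper's code)."""
    state = state.clone().detach().requires_grad_(True)
    q_values = q_net(state)                   # shape: [1, n_actions]
    worst_action = q_values.argmin(dim=1)     # action the attacker wants taken
    # Cross-entropy loss that shrinks as the softmax over Q-values
    # concentrates on the worst action.
    loss = F.cross_entropy(q_values, worst_action)
    loss.backward()
    # Descend this loss with respect to the observation, i.e. make the
    # worst action look best to the agent, within an epsilon budget.
    perturbed = state - epsilon * state.grad.sign()
    return perturbed.detach()
```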
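The RBF comparison rests on the smoothness of the underlying approximator. The following minimal sketch of a linear Q-function over Gaussian RBF features shows the kind of baseline involved; the centers, width, and learning rate are placeholder assumptions, not values from the paper.

```python
import numpy as np

class RBFQApproximator:
    """Linear Q-function over Gaussian RBF features; a minimal sketch of an
    RBF baseline, with all hyperparameters chosen for illustration."""
    def __init__(self, centers, n_actions, sigma=0.5, lr=0.1):
        self.centers = centers                           # [n_rbf, state_dim]
        self.sigma = sigma
        self.lr = lr
        self.weights = np.zeros((n_actions, centers.shape[0]))

    def features(self, state):
        # Gaussian bumps: nearby states produce nearly identical features.
        dists = np.sum((self.centers - state) ** 2, axis=1)
        return np.exp(-dists / (2 * self.sigma ** 2))

    def q_values(self, state):
        return self.weights @ self.features(state)       # [n_actions]

    def update(self, state, action, target):
        # One semi-gradient Q-learning step toward the bootstrapped target.
        phi = self.features(state)
        td_error = target - self.weights[action] @ phi
        self.weights[action] += self.lr * td_error * phi
```

The robustness credited to this baseline follows from the feature map: a small perturbation of the state changes the Gaussian features only slightly, so the estimated Q-values, and hence the greedy action, are harder to flip than with a sharp deep approximator.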
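Finally, the robust training idea amounts to letting the agent act on perturbed observations while learning from the true transitions. The sketch below shows one plausible step of such a loop, reusing `worst_q_attack` from the attack sketch; the replay buffer, environment API, and hyperparameters are illustrative assumptions rather than the authors' setup.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(q_net, optimizer, replay_buffer, env, state,
                              epsilon=0.01, gamma=0.99, batch_size=64):
    """One step of adversarial training: the agent picks its action from a
    perturbed observation, but the stored transition and TD target use the
    true states. All helper objects here are illustrative assumptions."""
    # Attack the observation the agent sees before it chooses an action.
    perturbed = worst_q_attack(q_net, state, epsilon)
    with torch.no_grad():
        action = q_net(perturbed).argmax(dim=1).item()

    # Step the real environment and store the unperturbed transition.
    next_state, reward, done, _ = env.step(action)
    replay_buffer.push(state, action, reward, next_state, done)

    # Standard DQN regression on a sampled minibatch of true transitions.
    states, actions, rewards, next_states, dones = replay_buffer.sample(batch_size)
    with torch.no_grad():
        targets = rewards + gamma * (1 - dones) * q_net(next_states).max(dim=1).values
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_sa, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return next_state, done
```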
Practical and Theoretical Implications
The findings have substantial implications. Practically, integrating adversarial training into DRL can yield models robust enough for safety-critical applications such as autonomous vehicles and industrial robots. Theoretically, importing techniques from robust control into DRL training offers a new perspective on generalization and robustness, and encourages further research into combining control paradigms with learning algorithms.
The paper also points toward future developments. Researchers might focus on DRL architectures with intrinsic resistance to adversarial perturbations, drawing both on this work and on analogous results in robust optimization. Other promising directions include new loss functions geared toward more effective adversarial defenses and a more thorough extension of adversarial attacks to continuous control domains.
In summary, this paper contributes significant insight into the vulnerabilities of DRL and proposes practical training interventions to bolster its robustness, opening avenues for future work on resilient AI systems.