Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space (2405.11982v1)

Published 20 May 2024 in cs.LG and cs.AI

Abstract: Deep reinforcement learning (DRL) algorithms can suffer from modeling errors between the simulation and the real world. Many studies use adversarial learning to generate perturbations during training to model this discrepancy and improve the robustness of DRL. However, most of these approaches use a fixed parameter to control the intensity of the adversarial perturbation, which can lead to a trade-off between average performance and robustness. In fact, finding the optimal perturbation parameter is challenging: excessive perturbations may destabilize training and compromise agent performance, while insufficient perturbations may not impart enough information to enhance robustness. To keep training stable while improving robustness, we propose a simple but effective method, Adaptive Adversarial Perturbation (A2P), which dynamically selects an appropriate adversarial perturbation for each sample. Specifically, we propose an adaptive adversarial coefficient framework that adjusts the effect of the adversarial perturbation during training. By designing a metric for the current intensity of the perturbation, our method can calculate suitable perturbation levels based on the current relative performance. An appealing feature of our method is that it is simple to deploy in real-world applications and does not require access to the simulator in advance. Experiments in MuJoCo show that our method improves training stability and learns a robust policy that transfers to different test environments. The code is available at https://github.com/Lqm00/A2P-SAC.
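
The sketch below illustrates the general idea described in the abstract: an adversary perturbs the agent's action in the direction that most decreases the critic's value estimate, and the strength of that perturbation is adapted from the agent's recent performance relative to a running baseline. This is a minimal, hypothetical illustration of adaptive action-space perturbation, not the authors' A2P-SAC implementation; the function names, the mixing rule, and the coefficient update are assumptions, and the exact method is in the linked repository.

```python
# Minimal sketch of adaptive adversarial perturbation in action space
# (illustrative only; see https://github.com/Lqm00/A2P-SAC for the real code).
import torch

def adversarial_action(critic, state, action, coeff, step_size=0.01, steps=3):
    """Take a few gradient steps on the action to reduce Q(s, a), then mix the
    perturbed action back in with weight `coeff` (the adaptive coefficient)."""
    adv = action.clone().detach().requires_grad_(True)
    for _ in range(steps):
        q = critic(state, adv)
        grad = torch.autograd.grad(q.sum(), adv)[0]
        adv = (adv - step_size * grad.sign()).detach().requires_grad_(True)
    return (1.0 - coeff) * action + coeff * adv.detach()

def update_coefficient(coeff, recent_return, baseline_return,
                       coeff_min=0.0, coeff_max=0.3, lr=0.05):
    """Hypothetical adaptation rule: strengthen the perturbation when the agent
    performs above its baseline, weaken it when training starts to degrade."""
    relative = (recent_return - baseline_return) / (abs(baseline_return) + 1e-8)
    return float(min(max(coeff + lr * relative, coeff_min), coeff_max))

# Toy usage with a dummy critic, just to exercise the two functions.
critic_net = torch.nn.Sequential(torch.nn.Linear(4 + 2, 32),
                                 torch.nn.ReLU(),
                                 torch.nn.Linear(32, 1))
q_fn = lambda s, a: critic_net(torch.cat([s, a], dim=-1))
state = torch.randn(1, 4)
action = torch.randn(1, 2)
coeff = update_coefficient(0.1, recent_return=120.0, baseline_return=100.0)
perturbed = adversarial_action(q_fn, state, action, coeff)
```

The key design point this sketch tries to capture is that the perturbation intensity is not a fixed hyperparameter: it is recomputed during training from the agent's relative performance, so the adversary backs off when learning becomes unstable and presses harder when the policy can tolerate it.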
