Bounded Robustness in Reinforcement Learning via Lexicographic Objectives (2209.15320v2)

Published 30 Sep 2022 in cs.LG, cs.AI, and cs.RO

Abstract: Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations that robustness requirements impose on otherwise optimal policies should be explainable, quantifiable and formally verifiable. In this work we study how policies can be made maximally robust to arbitrary observational noise by analysing how the noise alters them through a stochastic linear operator interpretation of the disturbances, and we establish connections between robustness and properties of the noise kernel and of the underlying MDPs. We then construct sufficient conditions for policy robustness and propose a robustness-inducing scheme, applicable to any policy gradient algorithm, that formally trades off expected policy utility for robustness through lexicographic optimisation while preserving convergence and sub-optimality guarantees in the policy synthesis.
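The two ingredients the abstract names, observational noise acting on a policy as a stochastic (row-stochastic kernel) operator, and a lexicographic trade-off between utility and robustness, can be illustrated with a small sketch. This is not the authors' code: the tabular setting, the variable names, and the gradient-projection heuristic used to approximate lexicographic priorities are all illustrative assumptions, and the gradients are placeholders rather than genuine policy-gradient estimates.

```python
# Hypothetical sketch (not from the paper): observational noise as a
# row-stochastic kernel K acting on a tabular policy, plus a gradient step
# that pursues a secondary (robustness) objective only in the directions
# that do not conflict with the primary (utility) objective.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3

# Tabular policy: rows are states, columns are action probabilities.
policy = rng.dirichlet(np.ones(n_actions), size=n_states)

# Noise kernel: K[s, s'] = probability of observing s' when the true state is s.
K = 0.9 * np.eye(n_states) + 0.1 * rng.dirichlet(np.ones(n_states), size=n_states)
K /= K.sum(axis=1, keepdims=True)  # keep rows stochastic

# Policy actually executed under observational noise:
# pi_K(a | s) = sum_{s'} K[s, s'] * pi(a | s').
perturbed_policy = K @ policy

def lexicographic_step(grad_primary, grad_secondary, lr=1e-2):
    """One common way to approximate lexicographic priorities: follow the
    primary gradient and add only the component of the secondary gradient
    orthogonal to it, so the secondary objective cannot undo primary progress."""
    g1 = grad_primary.ravel()
    g2 = grad_secondary.ravel()
    if np.dot(g1, g1) > 0:
        g2 = g2 - (np.dot(g2, g1) / np.dot(g1, g1)) * g1  # remove conflicting part
    return lr * (g1 + g2).reshape(grad_primary.shape)

# Placeholder gradients; a real agent would obtain these from policy-gradient
# estimates of the utility and robustness objectives, and would re-project the
# updated policy onto the probability simplex afterwards.
grad_utility = rng.normal(size=policy.shape)
grad_robustness = -(policy - perturbed_policy)  # push policy toward noise-invariance
policy += lexicographic_step(grad_utility, grad_robustness)
```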

