
Enhancing Reinforcement Learning Agents with Local Guides

Published 21 Feb 2024 in cs.LG, cs.SY, and eess.SY (arXiv:2402.13930v1)

Abstract: This paper addresses the problem of integrating local guide policies into a Reinforcement Learning agent. For this, we show how to adapt existing algorithms to this setting before introducing a novel algorithm based on a noisy policy-switching procedure. This approach builds on a proper Approximate Policy Evaluation (APE) scheme to provide a perturbation that carefully leads the local guides towards better actions. We evaluated our method on a set of classical Reinforcement Learning problems, including safety-critical systems where the agent cannot enter some areas at the risk of triggering catastrophic consequences. In all the proposed environments, our agent proved to be efficient at leveraging those policies to improve the performance of any APE-based Reinforcement Learning algorithm, especially in its first learning stages.
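The abstract does not give the algorithm's details. The Python below is a minimal, hypothetical sketch of the general idea it describes: switching between the agent and a local guide, with noise added to the guide's action and a critic filtering the candidates. It is not the authors' method. The names `select_action`, `in_guide_domain`, `noise_scale` and `n_perturbations`, and the 1-D toy setting, are assumptions made for illustration. `q_value` stands in for whatever action-value estimate an APE-based learner maintains.

```python
import random
from typing import Callable, List

# Hypothetical 1-D types for a toy problem (assumption, not from the paper).
State = float
Action = float


def select_action(
    state: State,
    agent_policy: Callable[[State], Action],
    guide_policy: Callable[[State], Action],
    in_guide_domain: Callable[[State], bool],
    q_value: Callable[[State, Action], float],
    rng: random.Random,
    noise_scale: float = 0.1,
    n_perturbations: int = 8,
) -> Action:
    """Illustrative critic-filtered switching between an agent and a noisy local guide.

    Outside the guide's domain, the agent acts alone. Inside it, the guide's
    action is perturbed with Gaussian noise. The critic then ranks the agent's
    action, the guide's action and its perturbations, and the highest-rated
    candidate is returned.
    """
    agent_action = agent_policy(state)
    if not in_guide_domain(state):
        # The local guide is only defined on part of the state space.
        return agent_action

    guide_action = guide_policy(state)
    candidates: List[Action] = [agent_action, guide_action]
    # Noisy perturbations around the guide's action.
    candidates += [guide_action + rng.gauss(0.0, noise_scale) for _ in range(n_perturbations)]
    # Keep whichever candidate the current critic rates highest.
    return max(candidates, key=lambda a: q_value(state, a))


if __name__ == "__main__":
    toy_q = lambda s, a: -(a - 1.0) ** 2  # toy critic: the best action is 1.0
    action = select_action(
        state=0.5,
        agent_policy=lambda s: -2.0,
        guide_policy=lambda s: 0.0,
        in_guide_domain=lambda s: s >= 0.0,
        q_value=toy_q,
        rng=random.Random(0),
    )
    print(action)
```

The guide's unperturbed action is always kept among the candidates. As a result, under the current critic the chosen action is never rated worse than following the guide, and the noise can only help when the critic prefers a nearby action.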
