Agent-Aware Training for Agent-Agnostic Action Advising in Deep Reinforcement Learning (2311.16807v1)

Published 28 Nov 2023 in cs.AI

Abstract: Action advising endeavors to leverage supplementary guidance from expert teachers to alleviate the issue of sampling inefficiency in Deep Reinforcement Learning (DRL). Previous agent-specific action advising methods are hindered by imperfections in the agent itself, while agent-agnostic approaches exhibit limited adaptability to the learning agent. In this study, we propose a novel framework called Agent-Aware trAining yet Agent-Agnostic Action Advising (A7) to strike a balance between the two. The underlying concept of A7 revolves around utilizing the similarity of state features as an indicator for soliciting advice. However, unlike prior methodologies, the measurement of state feature similarity is performed by neither the error-prone learning agent nor the agent-agnostic advisor. Instead, we employ a proxy model to extract state features that are both discriminative (adaptive to the agent) and generally applicable (robust to agent noise). Furthermore, we utilize behavior cloning to train a model for reusing advice and introduce an intrinsic reward for the advised samples to incentivize the utilization of expert guidance. Experiments are conducted on the GridWorld, LunarLander, and six prominent scenarios from Atari games. The results demonstrate that A7 significantly accelerates the learning process and surpasses existing methods (both agent-specific and agent-agnostic) by a substantial margin. Our code will be made publicly available.
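The abstract describes the mechanism only at a high level. The snippet below is a minimal, hypothetical sketch of the two ideas it names: soliciting advice when a state's proxy-model features are dissimilar to previously advised states, and adding an intrinsic reward bonus to advised samples. The class name, cosine-similarity measure, threshold `tau`, and bonus `beta` are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

class AdviceSolicitationSketch:
    """Hypothetical sketch of feature-similarity-based advice solicitation (not the authors' code)."""

    def __init__(self, encoder, tau=0.9, beta=0.1):
        self.encoder = encoder          # proxy model mapping a state to a feature vector
        self.advice_features = []       # features of states where the teacher already advised
        self.tau = tau                  # assumed similarity threshold for reusing past advice
        self.beta = beta                # assumed intrinsic bonus for advised samples

    def should_ask_teacher(self, state):
        """Ask for advice when the state is dissimilar to all previously advised states."""
        z = self.encoder(state)
        if not self.advice_features:
            return True
        sims = [self._cosine(z, z_adv) for z_adv in self.advice_features]
        return max(sims) < self.tau

    def record_advice(self, state):
        """Store the proxy features of a state for which advice was received."""
        self.advice_features.append(self.encoder(state))

    def shaped_reward(self, reward, advised):
        """Add an intrinsic bonus to transitions that followed expert advice."""
        return reward + (self.beta if advised else 0.0)

    @staticmethod
    def _cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
```

In this sketch the encoder stands in for the paper's proxy model; in practice it would be trained so that its features are both discriminative for the learning agent and robust to the agent's own noise, and reuse of past advice would be handled by a separate behavior-cloning model rather than a raw feature buffer.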

Authors (7)
  1. Yaoquan Wei (2 papers)
  2. Shunyu Liu (48 papers)
  3. Jie Song (217 papers)
  4. Tongya Zheng (24 papers)
  5. Kaixuan Chen (37 papers)
  6. Yong Wang (498 papers)
  7. Mingli Song (163 papers)
