Robustifying a Policy in Multi-Agent RL with Diverse Cooperative Behaviors and Adversarial Style Sampling for Assistive Tasks (2403.00344v2)

Published 1 Mar 2024 in cs.RO and cs.LG

Abstract: Autonomous assistance of people with motor impairments is one of the most promising applications of autonomous robotic systems. Recent studies have reported encouraging results using deep reinforcement learning (RL) in the healthcare domain. Previous studies showed that assistive tasks can be formulated as multi-agent RL, wherein there are two agents: a caregiver and a care-receiver. However, policies trained in multi-agent RL are often sensitive to the policies of other agents. In such a case, a trained caregiver's policy may not work for different care-receivers. To alleviate this issue, we propose a framework that learns a robust caregiver's policy by training it against diverse care-receiver responses. In our framework, diverse care-receiver responses are learned autonomously through trial and error. In addition, to robustify the caregiver's policy, we propose a strategy for sampling a care-receiver's response in an adversarial manner during training. We evaluated the proposed method on tasks in Assistive Gym. We demonstrate that policies trained with a popular deep RL method are vulnerable to changes in the policies of other agents, and that the proposed framework improves robustness against such changes.
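The adversarial sampling idea described in the abstract can be illustrated with a minimal sketch: keep a pool of diverse care-receiver policies and, when training the caregiver, preferentially pair it with the care-receiver it currently handles worst. This is a hypothetical illustration, not the authors' implementation; the `care_receiver_pool`, the `rollout_return` evaluator, and the epsilon-greedy mixing between adversarial and uniform partner selection are all assumptions.

```python
import random
from typing import Callable, List

# A "policy" is any callable mapping an observation to an action.
Policy = Callable[[list], list]


def sample_adversarial_partner(
    caregiver: Policy,
    care_receiver_pool: List[Policy],
    rollout_return: Callable[[Policy, Policy], float],
    epsilon: float = 0.2,
) -> Policy:
    """Choose the care-receiver policy to train the caregiver against.

    With probability `epsilon`, draw a partner uniformly so the caregiver
    still sees the full diversity of responses (assumed mixing rule);
    otherwise pick the partner that minimizes the caregiver's estimated
    return, i.e. the adversarial choice.
    """
    if random.random() < epsilon:
        return random.choice(care_receiver_pool)

    # Estimate how well the current caregiver performs with each partner.
    returns = [rollout_return(caregiver, cr) for cr in care_receiver_pool]

    # Adversarial sampling: return the partner with the lowest return.
    worst_idx = min(range(len(returns)), key=returns.__getitem__)
    return care_receiver_pool[worst_idx]
```

The intended effect is that the caregiver is repeatedly exposed to the care-receiver behavior it currently copes with worst, which is the mechanism the abstract credits for improved robustness to changes in the other agent's policy.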

Citations (2)
