Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations (2312.09950v2)
Abstract: Peer learning is a novel high-level reinforcement learning framework for agents learning in groups. While standard reinforcement learning trains a single agent by trial and error in isolation, peer learning addresses a related setting in which a group of agents, i.e., peers, learns to master a task simultaneously and from scratch. Peers may communicate only about their own states and the actions recommended by others: "What would you do in my situation?". Our motivation is to study the learning behavior of these agents. We formalize teacher selection in the action advice setting as a multi-armed bandit problem, which highlights the need for exploration. Finally, we analyze the learning behavior of the peers and observe that they learn to rank the performance of the agents within the study group and to identify which agents give reliable advice. Further, we compare peer learning with single-agent learning and a state-of-the-art action advice baseline. We show that peer learning can outperform single-agent learning and the baseline in several challenging discrete and continuous OpenAI Gym domains. In doing so, we also show that complex policies can emerge from action recommendations in such a framework, including in action spaces that are not discrete.
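To make the bandit view of teacher selection concrete, the following is a minimal sketch in which each peer is one arm and the reward reflects how useful that peer's advice turned out to be. It is not the paper's algorithm: the epsilon-greedy rule, the constant step size, the class name `PeerBandit`, and the simulated `advice_quality` values are all assumptions chosen for illustration.

```python
import random


class PeerBandit:
    """Epsilon-greedy bandit over peers (hypothetical helper, not from the paper)."""

    def __init__(self, n_peers, epsilon=0.1, step_size=0.1, seed=None):
        self.values = [0.0] * n_peers  # running estimate of each peer's advice quality
        self.epsilon = epsilon         # probability of asking a random peer
        self.step_size = step_size     # constant step size, so recent rewards weigh more
        self.rng = random.Random(seed)

    def select(self):
        """Pick the peer to ask: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        best = max(self.values)
        candidates = [i for i, v in enumerate(self.values) if v == best]
        return self.rng.choice(candidates)  # break ties at random

    def update(self, peer, reward):
        """Move the estimate for `peer` toward the observed reward."""
        self.values[peer] += self.step_size * (reward - self.values[peer])

    def ranking(self):
        """Peers ordered from most to least trusted under the current estimates."""
        return sorted(range(len(self.values)),
                      key=lambda i: self.values[i], reverse=True)


def simulate(steps=2000, seed=0):
    """Toy run: advice from peer i pays off with an assumed fixed probability."""
    advice_quality = [0.2, 0.5, 0.8]  # made-up values, for illustration only
    bandit = PeerBandit(len(advice_quality), epsilon=0.1, seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(steps):
        peer = bandit.select()
        reward = 1.0 if env_rng.random() < advice_quality[peer] else 0.0
        bandit.update(peer, reward)
    return bandit


if __name__ == "__main__":
    print(simulate().ranking())
```

Without the exploration step, a peer whose early advice happened to be poor would never be asked again, even if it later became the strongest member of the group. The constant step size is one simple way to track peers whose quality changes as they learn; a sample average would suit a stationary setting better.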