GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems (2404.01131v2)
Abstract: For multi-agent reinforcement learning systems (MARLS), formulating a problem generally requires a large reward engineering effort that is specific to that problem. This effort rarely transfers to other problems, and it is wasted when the system dynamics change drastically. The issue is worse in sparse-reward scenarios, where a meaningful heuristic could help the policies converge. We propose GOVerned Reward Engineering Kernels (GOV-REK), which dynamically assign reward distributions to the agents of a MARLS during their learning stage. We also introduce governance kernels, which exploit the underlying structure of either the state space or the joint action space to assign meaningful reward distributions to agents. During the learning stage, GOV-REK iteratively explores different reward distribution configurations with a Hyperband-like algorithm to learn ideal agent reward models in a problem-agnostic manner. Our experiments show that these meaningful reward priors robustly jumpstart learning across different MARL problems.
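The abstract combines two mechanisms: a governance kernel that turns structure in the state space into a dense reward prior, and a Hyperband-like search over kernel configurations. The sketch below illustrates both under loudly stated assumptions. The abstract does not specify kernel shapes, parameters, or the exact search variant, so it assumes a hypothetical Gaussian kernel over a 2-D state space with two parameters (`scale`, `width`) and uses standard Hyperband (successive-halving brackets), not necessarily the authors' variant. All names (`kernel_reward`, `shaped_reward`, `hyperband`, `toy_evaluate`) are hypothetical. `toy_evaluate` is a toy surrogate standing in for "train the agents for `budget` episodes under this reward configuration and return a score"; it is not a MARL training run.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def kernel_reward(state: Tuple[float, float], center: Tuple[float, float],
                  scale: float, width: float) -> float:
    """Hypothetical governance kernel: a Gaussian bump over a 2-D state space
    that yields a dense reward prior, highest at `center`."""
    dist_sq = (state[0] - center[0]) ** 2 + (state[1] - center[1]) ** 2
    return scale * math.exp(-dist_sq / (2.0 * width ** 2))


def shaped_reward(sparse_reward: float, state: Tuple[float, float],
                  center: Tuple[float, float], config: Config) -> float:
    """Environment's sparse reward plus the kernel prior for this configuration."""
    return sparse_reward + kernel_reward(state, center, config["scale"], config["width"])


def hyperband(sample_config: Callable[[], Config],
              evaluate: Callable[[Config, float], float],
              max_budget: int = 27, eta: int = 3) -> Tuple[Config, float]:
    """Standard Hyperband over reward configurations (higher score is better).

    `evaluate(config, budget)` is assumed to train under `config` for `budget`
    units and return a score. Returns the best (config, score) seen.
    """
    s_max = 0
    while eta ** (s_max + 1) <= max_budget:  # s_max = floor(log_eta(max_budget))
        s_max += 1
    best_config: Config = {}
    best_score = -math.inf
    for s in range(s_max, -1, -1):  # one successive-halving bracket per s
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))  # configs in this bracket
        r = max_budget / eta ** s                         # initial budget per config
        configs = [sample_config() for _ in range(n)]
        for i in range(s + 1):
            budget = r * eta ** i
            scored = sorted(((evaluate(c, budget), k) for k, c in enumerate(configs)),
                            reverse=True)
            if scored[0][0] > best_score:
                best_score, best_config = scored[0][0], configs[scored[0][1]]
            keep = max(1, len(configs) // eta)  # promote the top 1/eta
            configs = [configs[k] for _, k in scored[:keep]]
    return best_config, best_score


# --- Usage with a toy surrogate (hypothetical, for illustration only) ---
rng = random.Random(0)
sampled: List[Config] = []


def sample_config() -> Config:
    cfg = {"scale": rng.uniform(0.0, 1.0), "width": rng.uniform(0.5, 4.0)}
    sampled.append(cfg)
    return cfg


def toy_evaluate(cfg: Config, budget: float) -> float:
    # Toy surrogate, NOT a MARL run: prefers scale near 0.5 and width near 2.0.
    return -abs(cfg["scale"] - 0.5) - abs(cfg["width"] - 2.0)


best, score = hyperband(sample_config, toy_evaluate)
print(best, score)
```

With `max_budget=27` and `eta=3`, the four brackets sample 27, 12, 6, and 4 configurations, so cheap low-budget rungs discard poor reward configurations early and only the survivors receive the full budget. Because the toy surrogate ignores `budget`, the returned score is simply the best over all sampled configurations; a real training-based `evaluate` would be noisy at low budgets.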