DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning (2312.05783v1)

Published 10 Dec 2023 in cs.LG

Abstract: Learning an optimal behavior policy for each agent in a multi-agent system is an essential yet difficult problem. Despite fruitful progress in multi-agent reinforcement learning, the question of when two agents should exhibit consistent behaviors remains under-explored. In this paper, we propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents by utilizing intrinsic rewards to learn the optimal policy for each agent. We begin by defining behavior consistency as the divergence in output actions between two agents when provided with the same observation. We then introduce a dynamic consistency intrinsic reward (DCIR) that stimulates agents to be aware of others' behaviors and to decide whether to be consistent with them. Finally, we devise a dynamic scale network (DSN) that provides learnable scale factors for each agent at every time step, dynamically determining whether to reward consistent behavior and the magnitude of that reward. We evaluate DCIR in multiple environments, including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement, demonstrating its efficacy.
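The abstract's core mechanism can be sketched in a few lines. The sketch below is one plausible reading, not the paper's implementation: it assumes discrete action distributions, uses Jensen-Shannon divergence as the consistency measure (an assumption; the abstract only says "divergence"), and treats the per-timestep scale factors as given inputs standing in for the dynamic scale network's outputs. A positive scale factor rewards low divergence (consistency); a negative one rewards divergence.

```python
import math

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete action distributions.

    Stands in for the paper's behavior-consistency measure: the divergence
    between two agents' output action distributions on the same observation.
    """
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log((ai + eps) / (bi + eps)) for ai, bi in zip(a, b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def dcir_reward(extrinsic, pi_i, pi_others, scales):
    """Total reward for agent i: extrinsic reward plus scaled consistency terms.

    `scales` plays the role of the dynamic scale network (DSN) outputs, one
    learnable factor per other agent at this time step (hypothetical interface).
    Intrinsic term is scale * (-divergence): positive scale => being consistent
    (low divergence) yields higher reward; negative scale => the opposite.
    """
    intrinsic = sum(s * -js_divergence(pi_i, pi_j)
                    for s, pi_j in zip(scales, pi_others))
    return extrinsic + intrinsic
```

For example, with a positive scale factor, an agent whose action distribution matches its teammate's keeps the full extrinsic reward, while a disagreeing agent is penalized; flipping the sign of the scale factor flips that incentive, which is the dynamic part the DSN is meant to learn.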
