Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning (2405.16854v1)

Published 27 May 2024 in cs.MA

Abstract: Multi-agent reinforcement learning (MARL) is employed to develop autonomous agents that can learn to adopt cooperative or competitive strategies within complex environments. However, a linear increase in the number of agents leads to a combinatorial explosion of the joint action space, which may result in algorithmic instability, difficulty in convergence, or entrapment in local optima. While researchers have designed a variety of effective algorithms to compress the action space, these methods also introduce new challenges, such as the need for manually designed prior knowledge or reliance on the structure of the problem, which diminishes the applicability of these techniques. In this paper, we introduce Evolutionary action SPAce Reduction with Knowledge (eSpark), an exploration function generation framework driven by LLMs to boost exploration and prune unnecessary actions in MARL. Using just a basic prompt that outlines the overall task and setting, eSpark is capable of generating exploration functions in a zero-shot manner, identifying and pruning redundant or irrelevant state-action pairs, and then achieving autonomous improvement from policy feedback. In reinforcement learning tasks involving inventory management and traffic light control, encompassing a total of 15 scenarios, eSpark consistently outperforms the combined MARL algorithm in all scenarios, achieving average performance gains of 34.4% and 9.9% in the two types of tasks respectively. Additionally, eSpark has proven capable of managing situations with a large number of agents, securing a 29.7% improvement in scalability challenges that featured over 500 agents. The code can be found at https://github.com/LiuZhihao2022/eSpark.git.
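The core idea — an LLM-generated exploration function that returns an action mask, which the policy then respects when acting — can be sketched as follows. This is an illustrative toy for a single inventory-management agent with discrete order quantities; the function body, state fields, and helper names are assumptions for the example, not the paper's released code.

```python
import numpy as np

def exploration_func(state):
    """Hypothetical LLM-generated exploration function (illustrative,
    not from the eSpark repository).

    state: dict with 'inventory' (current stock) and 'capacity'
    (warehouse limit). Returns a boolean mask over a discrete action
    space of order quantities 0..9: any order that would overflow the
    warehouse is pruned (mask = False).
    """
    order_quantities = np.arange(10)                # discrete actions
    room_left = state["capacity"] - state["inventory"]
    return order_quantities <= room_left            # True = action kept

def masked_greedy_action(q_values, mask):
    """Act greedily among unpruned actions by setting pruned
    actions' Q-values to -inf before the argmax."""
    masked_q = np.where(mask, q_values, -np.inf)
    return int(np.argmax(masked_q))

state = {"inventory": 7, "capacity": 10}
mask = exploration_func(state)                      # only orders 0..3 remain
q = np.linspace(0.0, 1.0, 10)                       # Q naively prefers large orders
print(masked_greedy_action(q, mask))                # prints 3 (largest feasible order)
```

Pruning infeasible actions this way shrinks the space the learner must explore; in eSpark the mask function itself is proposed by the LLM from a task prompt and then refined from policy feedback, rather than hand-written as above.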
