Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation (1804.06311v2)

Published 17 Apr 2018 in cs.MA

Abstract: Decision making in distributed autonomous environments is challenging due to enormous state spaces and uncertainty. Many online planning algorithms rely on statistical sampling to avoid searching the whole state space while still making acceptable decisions. However, planning often has to be performed under strict computational constraints, which severely limits online planning in multi-agent systems and can lead to poor system performance, especially in stochastic domains. In this paper, we propose Emergent Value function Approximation for Distributed Environments (EVADE), an approach that integrates global experience into multi-agent online planning in stochastic domains so that global effects are considered during local planning. For this purpose, a value function is approximated online from the emergent system behaviour using reinforcement learning methods. We empirically evaluated EVADE with two statistical multi-agent online planning algorithms in a highly complex and stochastic smart factory environment, where multiple agents need to process various items at a shared set of machines. Our experiments show that EVADE can effectively improve the performance of multi-agent online planning while offering efficiency with respect to the breadth and depth of the planning process.
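
To make the core mechanism concrete, below is a minimal Python sketch of the idea as described in the abstract: each agent plans locally with shallow sampled rollouts, and the truncated returns are completed by a value function learned online from executed experience. The simulator interface (`sample_transition`, `features`), the linear TD(0) approximator, and all hyperparameters are illustrative assumptions, not the paper's implementation (the paper does not fix these details in the abstract and may use a different function approximator).

```python
import random

class LinearValueFunction:
    """Value function V(s) ~= w . phi(s), fitted online by TD(0).
    A stand-in assumption; EVADE only requires *some* online-learnable V."""

    def __init__(self, num_features, alpha=0.01, gamma=0.95):
        self.w = [0.0] * num_features
        self.alpha = alpha
        self.gamma = gamma

    def value(self, features):
        return sum(wi * fi for wi, fi in zip(self.w, features))

    def td_update(self, feats, reward, next_feats, done):
        # TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
        target = reward + (0.0 if done else self.gamma * self.value(next_feats))
        delta = target - self.value(feats)
        for i, fi in enumerate(feats):
            self.w[i] += self.alpha * delta * fi


def plan_local_action(state, agent, actions, simulator, V,
                      num_samples=32, horizon=4, gamma=0.95):
    """Pick one agent's action via shallow Monte Carlo rollouts whose
    truncated returns are bootstrapped with the learned value function V,
    so long-term (global) effects still inform the local choice."""
    best_action, best_mean = None, float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(num_samples):
            next_state, reward = simulator.sample_transition(state, agent, a)
            ret, discount = reward, 1.0
            for _ in range(horizon - 1):          # shallow stochastic rollout
                roll = random.choice(actions)
                next_state, reward = simulator.sample_transition(
                    next_state, agent, roll)
                discount *= gamma
                ret += discount * reward
            # complete the cut-off rollout with the learned value estimate
            ret += discount * gamma * V.value(simulator.features(next_state))
            total += ret
        mean = total / num_samples
        if mean > best_mean:
            best_action, best_mean = a, mean
    return best_action
```

After the jointly selected actions are executed in the real environment, each observed transition can be fed to `V.td_update(...)`, so the value function tracks the emergent behaviour of the whole system and improves subsequent breadth- and depth-limited planning steps, which is the efficiency effect the abstract reports.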
