Assume-Guarantee Reinforcement Learning (2312.09938v1)

Published 15 Dec 2023 in cs.LG, cs.AI, and cs.MA

Abstract: We present a modular approach to reinforcement learning (RL) in environments consisting of simpler components evolving in parallel. A monolithic view of such modular environments may be prohibitively large to learn, or may require unrealizable communication between the components in the form of a centralized controller. Our proposed approach is based on the assume-guarantee paradigm, in which the optimal controller for each component is synthesized in isolation by making assumptions about the behavior of neighboring components and providing guarantees about its own behavior. We express these assume-guarantee contracts as regular languages and provide automatic translations to scalar rewards to be used in RL. By combining the local probabilities of satisfaction for each component, we obtain a lower bound on the probability of satisfaction of the complete system. By solving a Markov game for each component, RL can produce a controller for each component that maximizes this lower bound. Each controller utilizes the information it receives through communication, observations, and any knowledge of a coarse model of the other agents. We experimentally demonstrate the efficiency of the proposed approach on a variety of case studies.
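
To make the contract-to-reward translation concrete, here is a minimal sketch (not the paper's implementation) of how a regular-language contract, represented as a DFA, can be run alongside an episode and converted into a scalar reward for an RL learner. The automaton, the label alphabet ("safe"/"collision"), and the sparse terminal reward scheme are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: monitor a regular-language contract with a DFA and emit a
# scalar reward at episode end. All concrete details here are assumptions.

from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple


@dataclass
class ContractDFA:
    """Deterministic finite automaton over a finite alphabet of trace labels."""
    initial: int
    accepting: FrozenSet[int]
    transitions: Dict[Tuple[int, str], int]  # (state, label) -> next state
    state: int = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.initial

    def reset(self) -> None:
        self.state = self.initial

    def step(self, label: str) -> None:
        # Missing transitions fall into an implicit rejecting sink (-1).
        self.state = self.transitions.get((self.state, label), -1)

    def reward(self) -> float:
        # Sparse terminal reward: 1.0 iff the observed trace is accepted,
        # i.e. the component delivered its guarantee on this rollout.
        return 1.0 if self.state in self.accepting else 0.0


# Toy contract for one component: "never emit the label 'collision'".
dfa = ContractDFA(
    initial=0,
    accepting=frozenset({0}),
    transitions={(0, "safe"): 0},  # 'collision' leads to the rejecting sink
)

dfa.reset()
for label in ["safe", "safe", "safe"]:  # labels produced during one episode
    dfa.step(label)
print(dfa.reward())  # 1.0: the trace satisfies the contract
```

Averaging this terminal reward over rollouts estimates a component's local probability of satisfying its contract, which is the per-component quantity the paper combines into a lower bound for the complete system.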
