Assume-Guarantee Reinforcement Learning (2312.09938v1)
Abstract: We present a modular approach to \emph{reinforcement learning} (RL) in environments consisting of simpler components evolving in parallel. A monolithic view of such modular environments may be prohibitively large to learn, or may require unrealizable communication between the components in the form of a centralized controller. Our proposed approach is based on the assume-guarantee paradigm where the optimal control for the individual components is synthesized in isolation by making \emph{assumptions} about the behaviors of neighboring components, and providing \emph{guarantees} about their own behavior. We express these \emph{assume-guarantee contracts} as regular languages and provide automatic translations to scalar rewards to be used in RL. By combining local probabilities of satisfaction for each component, we provide a lower bound on the probability of satisfaction of the complete system. By solving a Markov game for each component, RL can produce a controller for each component that maximizes this lower bound. The controller utilizes the information it receives through communication, observations, and any knowledge of a coarse model of other agents. We experimentally demonstrate the efficiency of the proposed approach on a variety of case studies.
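The two mechanisms the abstract mentions can be sketched concretely: monitoring a regular-language contract with a DFA whose acceptance is turned into a scalar reward, and combining per-component satisfaction probabilities into a bound for the composed system. The sketch below is illustrative only; the class and function names are hypothetical, and the product-form combination assumes independent components, which may differ from the paper's exact bound.

```python
# Hypothetical sketch: turning a regular assume-guarantee contract into a
# scalar RL reward via a DFA monitor. All names are illustrative; this is
# not the paper's actual construction.

class DFAMonitor:
    """Tracks a component's trace against a regular contract."""

    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # dict: (state, symbol) -> state
        self.accepting = accepting
        self.state = start

    def step(self, symbol):
        # Missing transitions are treated as self-loops for simplicity.
        self.state = self.transitions.get((self.state, self.state and symbol) if False else (self.state, symbol), self.state)
        # Sparse reward: 1.0 while the contract is satisfied, else 0.0.
        return 1.0 if self.state in self.accepting else 0.0


# Toy contract over {"safe", "unsafe"}: accepting until "unsafe" occurs.
monitor = DFAMonitor(
    transitions={("ok", "safe"): "ok", ("ok", "unsafe"): "bad",
                 ("bad", "safe"): "bad", ("bad", "unsafe"): "bad"},
    start="ok",
    accepting={"ok"},
)
rewards = [monitor.step(s) for s in ["safe", "safe", "unsafe", "safe"]]
# rewards == [1.0, 1.0, 0.0, 0.0]


def system_lower_bound(local_probs):
    """Combine per-component satisfaction probabilities into a bound for
    the whole system. Product form assumes independence; the paper's
    actual composition rule may be different."""
    bound = 1.0
    for p in local_probs:
        bound *= p
    return bound


# Three components, each satisfying its local contract w.p. 0.9.
overall = system_lower_bound([0.9, 0.9, 0.9])
```

Each component's RL agent would then maximize the expected discounted sum of the monitor's rewards, which (under the labeling above) correlates with the probability its local contract is satisfied.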