Multi-Agent Reinforcement Learning with a Hierarchy of Reward Machines (2403.07005v1)
Abstract: In this paper, we study cooperative Multi-Agent Reinforcement Learning (MARL) problems in which Reward Machines (RMs) specify the reward functions, so that prior knowledge of the high-level events in a task can be leveraged to improve learning efficiency. Existing work has incorporated RMs into MARL for task decomposition and policy learning only in relatively simple domains or under the assumption that the agents are independent. In contrast, we present Multi-Agent Reinforcement Learning with a Hierarchy of RMs (MAHRM), which can handle more complex scenarios in which events among agents occur concurrently and the agents are highly interdependent. MAHRM exploits the relationships among high-level events to decompose a task into a hierarchy of simpler subtasks, each assigned to a small group of agents, so as to reduce the overall computational complexity. Experimental results in three cooperative MARL domains show that MAHRM outperforms other MARL methods that use the same prior knowledge of high-level events.
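For readers unfamiliar with reward machines, the following is a minimal sketch of a single flat RM: a finite-state machine whose transitions fire on high-level events and emit rewards. It is not MAHRM and does not model the hierarchy described in the abstract. The class name `RewardMachine`, the state names, and the events `button_a`, `button_b`, and `door_opened` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Set, Tuple


@dataclass
class RewardMachine:
    """Illustrative flat reward machine (hypothetical, not the paper's code)."""
    initial_state: str
    terminal_states: FrozenSet[str]
    # (current RM state, event) -> (next RM state, reward)
    transitions: Dict[Tuple[str, str], Tuple[str, float]] = field(default_factory=dict)

    def step(self, state: str, events: Set[str]) -> Tuple[str, float]:
        # Events may occur concurrently; take the first one (in sorted order)
        # that has a transition from the current state.
        for event in sorted(events):
            if (state, event) in self.transitions:
                return self.transitions[(state, event)]
        # No matching event: stay in place with zero reward.
        return state, 0.0

    def run(self, trace: Iterable[Set[str]]) -> Tuple[str, float]:
        state, total = self.initial_state, 0.0
        for events in trace:
            if state in self.terminal_states:
                break
            state, reward = self.step(state, events)
            total += reward
        return state, total


# Hypothetical task: button_a must be pressed before the door is opened.
rm = RewardMachine(
    initial_state="u0",
    terminal_states=frozenset({"u_goal"}),
    transitions={
        ("u0", "button_a"): ("u1", 0.0),
        ("u1", "door_opened"): ("u_goal", 1.0),
    },
)

# Events in the required order reach the goal and earn the reward.
final_state, total_reward = rm.run([set(), {"button_a"}, {"button_b"}, {"door_opened"}])

# The same events out of order earn nothing: the reward depends on history.
wrong_state, wrong_reward = rm.run([{"door_opened"}, {"button_a"}])
```

The second trace shows why an RM is useful for specifying rewards: the reward depends on the order of past events, which the RM state records, rather than on the current observation alone.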
Authors: Xuejing Zheng, Chao Yu