Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping (2411.01184v1)
Abstract: Multi-agent hierarchical reinforcement learning (MAHRL) has been studied as an effective means of solving intelligent decision problems in complex, large-scale environments. However, most current MAHRL algorithms use reward functions in the traditional reinforcement learning way, which limits them to a single task. This study designs a multi-agent cooperative algorithm with logical reward shaping (LRS), which sets rewards more flexibly so that multiple tasks can be completed effectively. LRS uses Linear Temporal Logic (LTL) to express the internal logical relations among the subtasks of a complex task. It then evaluates, through a designed reward structure, whether the subformulae of the LTL expressions are satisfied. This helps agents learn to complete tasks effectively by adhering to the LTL expressions, which enhances the interpretability and credibility of their decisions. To enhance coordination and cooperation among agents, a value iteration technique is designed to evaluate the actions taken by each agent. Based on this evaluation, a coordination reward function is shaped, enabling each agent to evaluate its own status and complete the remaining subtasks through experiential learning. Experiments were conducted on various types of tasks in a Minecraft-like environment. The results demonstrate that the proposed algorithm can improve the performance of multiple agents learning to complete multiple tasks.
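The abstract describes LRS only at a high level. To illustrate how the satisfaction of LTL subformulae can drive a reward signal, here is a minimal sketch built on the standard technique of LTL formula progression over a small co-safe fragment (atomic propositions, conjunction, disjunction, and "eventually"). Everything in it is an assumption for illustration: the shaping rule (a bonus of 0.1 whenever a subformula is satisfied), the proposition names `wood` and `factory`, and the helpers `progress`, `shaped_reward`, and `run` are hypothetical. It is not the paper's reward structure, and it leaves out the multi-agent value-iteration component entirely.

```python
from typing import FrozenSet, List, Tuple, Union

# A formula is True/False, an atomic proposition (str), or a tuple
# ("and", f, g), ("or", f, g), or ("ev", f) meaning "eventually f".
Formula = Union[bool, str, Tuple]


def _and(a: Formula, b: Formula) -> Formula:
    # Simplify conjunctions that have a constant operand.
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)


def _or(a: Formula, b: Formula) -> Formula:
    # Simplify disjunctions that have a constant operand.
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return ("or", a, b)


def progress(f: Formula, label: FrozenSet[str]) -> Formula:
    """Return the formula that must still hold after observing `label`."""
    if isinstance(f, bool):
        return f
    if isinstance(f, str):
        return f in label  # an atom holds iff it is in the current label
    op = f[0]
    if op == "and":
        return _and(progress(f[1], label), progress(f[2], label))
    if op == "or":
        return _or(progress(f[1], label), progress(f[2], label))
    if op == "ev":
        # prog(F phi) = prog(phi) OR F phi
        return _or(progress(f[1], label), f)
    raise ValueError(f"unknown operator: {op!r}")


def shaped_reward(f: Formula, label: FrozenSet[str],
                  bonus: float = 0.1) -> Tuple[Formula, float]:
    """Progress f and return (next formula, reward).

    Hypothetical shaping rule: 1.0 when the task is satisfied, -1.0 when it
    can no longer be satisfied, `bonus` when the formula changed (some
    subformula was satisfied), and 0.0 otherwise.
    """
    nxt = progress(f, label)
    if nxt is True:
        return nxt, 1.0
    if nxt is False:
        return nxt, -1.0
    return nxt, (bonus if nxt != f else 0.0)


def run(task: Formula,
        trace: List[FrozenSet[str]]) -> Tuple[Formula, List[float]]:
    """Feed a sequence of labels to the task and collect shaped rewards."""
    rewards = []
    for label in trace:
        task, r = shaped_reward(task, label)
        rewards.append(r)
        if isinstance(task, bool):
            break  # the episode ends once the task is decided
    return task, rewards


# Hypothetical task: "eventually get wood, and after that eventually use the factory".
task = ("ev", ("and", "wood", ("ev", "factory")))
trace = [frozenset(), frozenset({"wood"}), frozenset(), frozenset({"factory"})]
final, rewards = run(task, trace)
print(final, rewards)  # True [0.0, 0.1, 0.0, 1.0]
```

Progression rewrites the task after every step, so the current formula also records which subtasks remain. This is what allows a reward to reflect partial completion of a task rather than only its final outcome.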