Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation (2305.06446v3)
Abstract: We study multi-agent reinforcement learning in the setting of episodic Markov decision processes, where multiple agents cooperate via communication through a central server. We propose a provably efficient algorithm based on value iteration that enables asynchronous communication while ensuring the advantage of cooperation with low communication overhead. With linear function approximation, we prove that our algorithm enjoys an $\tilde{\mathcal{O}}(d^{3/2}H^{2}\sqrt{K})$ regret with $\tilde{\mathcal{O}}(dHM^{2})$ communication complexity, where $d$ is the feature dimension, $H$ is the horizon length, $M$ is the total number of agents, and $K$ is the total number of episodes. We also provide a lower bound showing that a minimal $\Omega(dM)$ communication complexity is required to improve the performance through collaboration.
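To give a feel for how the stated bounds scale, the following is a minimal sketch that evaluates only the leading terms $d^{3/2}H^{2}\sqrt{K}$, $dHM^{2}$, and $dM$. It assumes all constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$ and $\Omega$ are equal to one, so the numbers are illustrative growth rates and not actual regret or message counts. The function names and the sample values of $d$, $H$, $M$, and $K$ are hypothetical and do not come from the paper.

```python
import math


def regret_scale(d: int, H: int, K: int) -> float:
    """Leading term d^{3/2} * H^2 * sqrt(K) of the regret bound.

    Assumption: constants and log factors are dropped (set to 1).
    """
    return d ** 1.5 * H ** 2 * math.sqrt(K)


def comm_upper_scale(d: int, H: int, M: int) -> int:
    """Leading term d * H * M^2 of the communication upper bound."""
    return d * H * M ** 2


def comm_lower_scale(d: int, M: int) -> int:
    """Leading term d * M of the communication lower bound."""
    return d * M


if __name__ == "__main__":
    # Hypothetical problem sizes chosen only for illustration.
    d, H, M = 10, 5, 4

    for K in (1_000, 4_000, 16_000):
        total = regret_scale(d, H, K)
        # Per-episode regret shrinks like 1 / sqrt(K).
        print(f"K={K:>6}  regret scale={total:12.1f}  per episode={total / K:8.3f}")

    print("communication upper scale:", comm_upper_scale(d, H, M))
    print("communication lower scale:", comm_lower_scale(d, M))
```

Under these assumptions, quadrupling $K$ doubles the regret term, while the communication term has no polynomial dependence on $K$. The ratio between the upper and lower communication terms is $HM$.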