A finite time analysis of distributed Q-learning

Published 23 May 2024 in cs.AI, cs.LG, and cs.MA | (2405.14078v1)

Abstract: Multi-agent reinforcement learning (MARL) has witnessed a remarkable surge in interest, fueled by the empirical success of single-agent reinforcement learning (RL) in applications. In this study, we consider a distributed Q-learning scenario in which a number of agents cooperatively solve a sequential decision-making problem without access to the central reward function, which is the average of the local rewards. In particular, we present a finite-time analysis of a distributed Q-learning algorithm and provide a new sample complexity result of $\tilde{\mathcal{O}}\left( \min\left\{\frac{1}{\epsilon^{2}}\frac{t_{\text{mix}}}{(1-\gamma)^{6} d_{\min}^{4}}, \frac{1}{\epsilon}\frac{\sqrt{|\mathcal{S}||\mathcal{A}|}}{(1-\sigma_2(\boldsymbol{W}))(1-\gamma)^{4} d_{\min}^{3}} \right\}\right)$ under tabular lookup
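The abstract does not spell out the update rule. As a hedged illustration, the sketch below assumes one common consensus-based form. Each agent first averages its neighbours' Q-tables using a doubly stochastic mixing matrix W. It then applies a local temporal-difference correction that uses only its own reward. This form is an assumption and is not necessarily the paper's algorithm. The function names `consensus_q_step` and `second_singular_value` and the 4-agent ring topology are hypothetical. The sketch also estimates the spectral gap 1 - σ_2(W), the network-dependent factor in the second term of the bound.

```python
import math
import random

Table = list[list[float]]  # Q[s][a] for a single agent


def second_singular_value(W: list[list[float]], iters: int = 200, seed: int = 0) -> float:
    """Estimate sigma_2(W) for a doubly stochastic mixing matrix W.

    For doubly stochastic W, the all-ones vector is a singular vector with
    singular value 1, so sigma_2(W) equals the spectral norm of W - (1/n) 11^T.
    That norm is estimated here by power iteration on M^T M.
    """
    n = len(W)
    M = [[W[i][j] - 1.0 / n for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]  # u = M v
        w = [sum(M[i][j] * u[i] for i in range(n)) for j in range(n)]  # w = M^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # W - (1/n) 11^T vanishes on v: perfect one-step averaging
            return 0.0
        v = [x / norm for x in w]
    u = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in u))


def consensus_q_step(Q: list[Table], W: list[list[float]], s: int, a: int, s_next: int,
                     local_rewards: list[float], alpha: float, gamma: float) -> list[Table]:
    """One synchronous step of an ASSUMED consensus-plus-local-TD update.

    Each agent i sees only its own reward local_rewards[i]; the central reward
    (their average) is never observed by any single agent.
    """
    n = len(Q)
    num_states, num_actions = len(Q[0]), len(Q[0][0])
    # 1) Consensus: agent i replaces its table by a W-weighted average of all tables
    #    (W[i][j] > 0 only for neighbours j in the communication graph).
    new_Q = [
        [[sum(W[i][j] * Q[j][x][u] for j in range(n)) for u in range(num_actions)]
         for x in range(num_states)]
        for i in range(n)
    ]
    # 2) Local TD correction at the visited pair (s, a), using agent i's own reward
    #    and its own pre-mixing Q-table.
    for i in range(n):
        td_error = local_rewards[i] + gamma * max(Q[i][s_next]) - Q[i][s][a]
        new_Q[i][s][a] += alpha * td_error
    return new_Q


if __name__ == "__main__":
    # Hypothetical 4-agent ring: keep 1/2 of your own value, take 1/4 from each neighbour.
    W = [[0.5, 0.25, 0.0, 0.25],
         [0.25, 0.5, 0.25, 0.0],
         [0.0, 0.25, 0.5, 0.25],
         [0.25, 0.0, 0.25, 0.5]]
    print("spectral gap 1 - sigma_2(W):", 1.0 - second_singular_value(W))
    Q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(4)]
    Q = consensus_q_step(Q, W, s=0, a=1, s_next=1,
                         local_rewards=[1.0, 2.0, 3.0, 6.0], alpha=0.5, gamma=0.9)
    print([q[0][1] for q in Q])  # [0.5, 1.0, 1.5, 3.0]
```

W is doubly stochastic, so the mixing step leaves the network-wide average of the Q-tables unchanged. In this sketch, the local rewards therefore enter the averaged update only through their mean, which is the central reward that no agent observes directly. A larger spectral gap 1 - σ_2(W) makes the agents' tables agree faster.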