Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual Optimization (1908.02805v4)

Published 7 Aug 2019 in math.OC, cs.LG, and cs.MA

Abstract: We study the policy evaluation problem in multi-agent reinforcement learning, where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local computation and communication over a connected undirected network. This problem arises in various large-scale multi-agent systems, including power grids, intelligent transportation systems, wireless sensor networks, and multi-agent robotics. When the dimension of the state-action space is large, temporal-difference learning with linear function approximation is widely used. In this paper, we develop a new distributed temporal-difference learning algorithm and quantify its finite-time performance. Our algorithm combines a distributed stochastic primal-dual method with a homotopy-based approach that adaptively adjusts the learning rate in order to minimize the mean-square projected Bellman error, taking fresh online samples from a causal on-policy trajectory. We explicitly take into account the Markovian nature of sampling and improve the best-known finite-time error bound from $O(1/\sqrt{T})$ to $O(1/T)$, where $T$ is the total number of iterations.
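The algorithm is easiest to picture through the standard saddle-point reformulation of the mean-square projected Bellman error. With $A = \mathbb{E}[\varphi(s)(\varphi(s) - \gamma\varphi(s'))^\top]$, $b = \mathbb{E}[r\,\varphi(s)]$, and $C = \mathbb{E}[\varphi(s)\varphi(s)^\top]$, minimizing the MSPBE is equivalent to $\min_\theta \max_w \, \langle b - A\theta, w\rangle - \tfrac{1}{2} w^\top C w$, which each agent can attack with stochastic primal-dual updates on fresh on-policy samples, followed by a consensus (mixing) step over the network; the homotopy ingredient restarts the method in stages with geometrically shrinking step sizes. The sketch below is a minimal illustration of this recipe, not the authors' implementation: the random Markov chain, feature matrix, ring-network mixing weights, stage schedule, and step sizes are all assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_states, d = 4, 10, 5
gamma = 0.95

# Hypothetical environment: jointly observed states with shared linear features,
# a fixed on-policy transition matrix, and private per-agent rewards.
Phi = rng.normal(size=(n_states, d))
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
R = rng.random((n_agents, n_states))

# Doubly stochastic mixing matrix for a ring network (illustrative choice).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

theta = np.zeros((n_agents, d))  # primal variables: value-function weights
w = np.zeros((n_agents, d))      # dual variables of the saddle-point problem

s = 0
alpha, beta = 0.1, 0.1           # placeholder step sizes
for stage in range(6):
    for t in range(2000):
        s_next = rng.choice(n_states, p=P[s])        # fresh on-policy Markovian sample
        phi, phi_next = Phi[s], Phi[s_next]
        A_t = np.outer(phi, phi - gamma * phi_next)  # stochastic estimate of A
        for i in range(n_agents):
            b_t = R[i, s] * phi                      # local stochastic estimate of b
            # Dual ascent on <b - A theta, w> - (1/2) w^T C w, with C_t = phi phi^T.
            w[i] += beta * (b_t - A_t @ theta[i] - phi * (phi @ w[i]))
            # Primal descent: the theta-gradient of the objective is -A^T w.
            theta[i] += alpha * (A_t.T @ w[i])
        # Consensus step: mix primal and dual iterates over the network.
        theta, w = W @ theta, W @ w
        s = s_next
    # Homotopy restart: shrink step sizes geometrically between stages.
    alpha *= 0.5
    beta *= 0.5

print("network disagreement:", np.linalg.norm(theta - theta.mean(axis=0)))
```

Under the mixing step, each local iterate tracks the network average, and the staged halving of the step sizes mimics the homotopy schedule that the paper credits for improving the rate from $O(1/\sqrt{T})$ to $O(1/T)$ under Markovian sampling.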

Authors (5)
  1. Dongsheng Ding (12 papers)
  2. Xiaohan Wei (37 papers)
  3. Zhuoran Yang (155 papers)
  4. Zhaoran Wang (164 papers)
  5. Mihailo R. Jovanović (50 papers)
Citations (15)
