Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual Optimization (1908.02805v4)

Published 7 Aug 2019 in math.OC, cs.LG, and cs.MA

Abstract: We study the policy evaluation problem in multi-agent reinforcement learning, where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local computation and communication over a connected undirected network. This problem arises in various large-scale multi-agent systems, including power grids, intelligent transportation systems, wireless sensor networks, and multi-agent robotics. When the dimension of the state-action space is large, temporal-difference learning with linear function approximation is widely used. In this paper, we develop a new distributed temporal-difference learning algorithm and quantify its finite-time performance. Our algorithm combines a distributed stochastic primal-dual method with a homotopy-based approach that adaptively adjusts the learning rate in order to minimize the mean-square projected Bellman error, taking fresh online samples from a causal on-policy trajectory. We explicitly take into account the Markovian nature of sampling and improve the best-known finite-time error bound from $O(1/\sqrt{T})$ to $O(1/T)$, where $T$ is the total number of iterations.
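The algorithm is easiest to picture through the standard saddle-point reformulation of the mean-square projected Bellman error. With $A = \mathbb{E}[\varphi(s)(\varphi(s) - \gamma\varphi(s'))^\top]$, $b = \mathbb{E}[r\,\varphi(s)]$, and $C = \mathbb{E}[\varphi(s)\varphi(s)^\top]$, minimizing the MSPBE is equivalent to $\min_\theta \max_w \, \langle b - A\theta, w\rangle - \tfrac{1}{2} w^\top C w$, which each agent can attack with stochastic primal-dual updates on fresh on-policy samples, followed by a consensus (mixing) step over the network; the homotopy ingredient restarts the method in stages with geometrically shrinking step sizes. The sketch below is a minimal illustration of this recipe, not the authors' implementation: the random Markov chain, feature matrix, ring-network mixing weights, stage schedule, and step sizes are all assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_states, d = 4, 10, 5
gamma = 0.95

# Hypothetical environment: jointly observed states with shared linear features,
# a fixed on-policy transition matrix, and private per-agent rewards.
Phi = rng.normal(size=(n_states, d))
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
R = rng.random((n_agents, n_states))

# Doubly stochastic mixing matrix for a ring network (illustrative choice).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

theta = np.zeros((n_agents, d))  # primal variables: value-function weights
w = np.zeros((n_agents, d))      # dual variables of the saddle-point problem

s = 0
alpha, beta = 0.1, 0.1           # placeholder step sizes
for stage in range(6):
    for t in range(2000):
        s_next = rng.choice(n_states, p=P[s])        # fresh on-policy Markovian sample
        phi, phi_next = Phi[s], Phi[s_next]
        A_t = np.outer(phi, phi - gamma * phi_next)  # stochastic estimate of A
        for i in range(n_agents):
            b_t = R[i, s] * phi                      # local stochastic estimate of b
            # Dual ascent on <b - A theta, w> - (1/2) w^T C w, with C_t = phi phi^T.
            w[i] += beta * (b_t - A_t @ theta[i] - phi * (phi @ w[i]))
            # Primal descent: the theta-gradient of the objective is -A^T w.
            theta[i] += alpha * (A_t.T @ w[i])
        # Consensus step: mix primal and dual iterates over the network.
        theta, w = W @ theta, W @ w
        s = s_next
    # Homotopy restart: shrink step sizes geometrically between stages.
    alpha *= 0.5
    beta *= 0.5

print("network disagreement:", np.linalg.norm(theta - theta.mean(axis=0)))
```

Under the mixing step, each local iterate tracks the network average, and the staged halving of the step sizes mimics the homotopy schedule that the paper credits for improving the rate from $O(1/\sqrt{T})$ to $O(1/T)$ under Markovian sampling.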

Authors (5)
  1. Dongsheng Ding (12 papers)
  2. Xiaohan Wei (37 papers)
  3. Zhuoran Yang (155 papers)
  4. Zhaoran Wang (164 papers)
  5. Mihailo R. Jovanović (50 papers)
Citations (15)
