
Distributed Policy Evaluation Under Multiple Behavior Strategies (1312.7606v2)

Published 30 Dec 2013 in cs.MA, cs.AI, cs.DC, and cs.LG

Abstract: We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces bias and variance of the prediction error; but, more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sample a small portion of the state space).
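The abstract describes an adapt-then-combine diffusion pattern: each agent performs a local learning update, then averages its estimate with those of its immediate neighbors. The sketch below is illustrative only, not the paper's actual algorithm (which is a diffusion gradient-TD variant with off-policy corrections); it uses plain TD(0) updates on a small hypothetical Markov reward process with one-hot features, a fully connected network, and a uniform combination matrix, just to show the communication structure and the constant step-size updates.

```python
import numpy as np

# Illustrative adapt-then-combine (ATC) diffusion sketch for distributed
# policy evaluation. All quantities (the 3-state chain, step size, network
# topology) are hypothetical and chosen only for demonstration.

rng = np.random.default_rng(0)

# Tiny 3-state Markov reward process under a fixed target policy.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9
# True value function: v = (I - gamma * P)^{-1} r
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)

N = 4                        # number of agents in the network
A = np.full((N, N), 1.0 / N) # doubly stochastic combination matrix
w = np.zeros((N, 3))         # one-hot features: weights = value estimates
mu = 0.05                    # constant step size (continuous learning)

state = np.zeros(N, dtype=int)
for t in range(50000):
    psi = w.copy()
    for k in range(N):
        s = state[k]
        s_next = rng.choice(3, p=P[s])
        delta = r[s] + gamma * w[k, s_next] - w[k, s]  # TD error
        psi[k, s] += mu * delta                        # adapt: local update
        state[k] = s_next
    w = A @ psi                                        # combine: average with neighbors
```

After the loop, each row of `w` (one agent's estimate) hovers near `v_true`; cooperation averages out the gradient noise that a constant step size would otherwise leave in each individual agent's estimate.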

Authors (4)
  1. Sergio Valcarcel Macua (13 papers)
  2. Jianshu Chen (66 papers)
  3. Santiago Zazo (17 papers)
  4. Ali H. Sayed (151 papers)
Citations (101)
