Off-policy Learning with Eligibility Traces: A Survey (1304.3999v1)

Published 15 Apr 2013 in cs.AI and cs.RO

Abstract: In the framework of Markov Decision Processes, we consider off-policy learning, that is, the problem of learning a linear approximation of the value function of some fixed policy from a single trajectory possibly generated by some other policy. We briefly review the on-policy learning algorithms of the literature (gradient-based and least-squares-based), adopting a unified algorithmic view. Then, we highlight a systematic approach for adapting them to off-policy learning with eligibility traces. This leads to some known algorithms - off-policy LSTD(\lambda), LSPE(\lambda), TD(\lambda), TDC/GQ(\lambda) - and suggests new extensions - off-policy FPKF(\lambda), BRM(\lambda), gBRM(\lambda), GTD2(\lambda). We describe a comprehensive algorithmic derivation of all algorithms in a recursive and memory-efficient form, discuss their known convergence properties, and illustrate their relative empirical behavior on Garnet problems. Our experiments suggest that the most standard algorithms, on- and off-policy LSTD(\lambda)/LSPE(\lambda) - and TD(\lambda) if the feature space dimension is too large for a least-squares approach - perform the best.
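To make the setting concrete, here is a minimal sketch of one common formulation of off-policy TD(\lambda) with importance-sampling-corrected eligibility traces for a linear value approximation V(s) ≈ θᵀφ(s). It is not the paper's exact derivation; the names (`phi`, `rho`, `alpha`) and the trajectory format are illustrative assumptions.

```python
# Minimal sketch (assumed formulation, not the paper's exact algorithm) of
# off-policy TD(lambda) with importance-weighted eligibility traces for a
# linear value function V(s) ~= theta . phi(s).

import numpy as np

def off_policy_td_lambda(trajectory, phi, n_features,
                         gamma=0.95, lam=0.8, alpha=0.01):
    """One pass of off-policy TD(lambda) over a single trajectory.

    trajectory: iterable of (s, r, s_next, rho) tuples, where
                rho = pi(a|s) / mu(a|s) is the importance ratio between
                the target policy pi and the behaviour policy mu.
    phi:        feature map, phi(s) -> np.ndarray of shape (n_features,)
    """
    theta = np.zeros(n_features)   # linear value-function weights
    z = np.zeros(n_features)       # eligibility trace

    for (s, r, s_next, rho) in trajectory:
        phi_s, phi_next = phi(s), phi(s_next)
        # Temporal-difference error under the current weights.
        delta = r + gamma * theta @ phi_next - theta @ phi_s
        # Accumulating trace, corrected by the importance ratio.
        z = rho * (gamma * lam * z + phi_s)
        # Stochastic correction of the weights along the trace.
        theta = theta + alpha * delta * z

    return theta
```

The least-squares counterparts reviewed in the survey (LSTD(\lambda), LSPE(\lambda)) replace the single stochastic step per sample with recursively maintained matrix statistics, which is why the abstract distinguishes them from TD(\lambda) when the feature dimension is large.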

Citations (92)
