When is Realizability Sufficient for Off-Policy Reinforcement Learning? (2211.05311v2)

Published 10 Nov 2022 in cs.LG

Abstract: Model-free algorithms for reinforcement learning typically require a condition called Bellman completeness in order to operate successfully off-policy with function approximation, unless additional conditions are met. However, Bellman completeness is a requirement that is much stronger than realizability and that is deemed too strong to hold in practice. In this work, we relax this structural assumption and analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class. We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error, and that depend on the interplay of three factors. The first two are well known: the metric entropy of the function class and the concentrability coefficient that represents the cost of learning off-policy. The third factor is new, and it measures the violation of Bellman completeness, namely the misalignment between the chosen function class and its image through the Bellman operator. In essence, these error bounds establish that off-policy reinforcement learning remains statistically viable even in the absence of Bellman completeness, and they characterize the intermediate situation between the favorable Bellman-complete setting and the worst-case scenario where exponential lower bounds are in force. Our analysis applies directly to the solution found by temporal difference algorithms when they converge.
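For orientation, the standard notions referenced in the abstract can be written as follows. The notation (function class $\mathcal{F}$, Bellman operator $\mathcal{T}^{\pi}$, discount $\gamma$) is illustrative and follows common usage rather than the paper's exact definitions, and the paper's new completeness-violation measure may differ from the classical inherent Bellman error shown last.

\[
(\mathcal{T}^{\pi} f)(s,a) \;=\; r(s,a) \;+\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a),\, a' \sim \pi(\cdot \mid s')}\!\big[f(s',a')\big]
\qquad \text{(Bellman operator)}
\]
\[
Q^{\pi} \in \mathcal{F} \quad \text{(realizability)}, \qquad
\mathcal{T}^{\pi} f \in \mathcal{F} \ \ \text{for all } f \in \mathcal{F} \quad \text{(Bellman completeness)}
\]
\[
\sup_{f \in \mathcal{F}} \, \inf_{g \in \mathcal{F}} \big\| g - \mathcal{T}^{\pi} f \big\|
\qquad \text{(inherent Bellman error)}
\]

Realizability only asks that the target value function lie in $\mathcal{F}$, whereas Bellman completeness asks that $\mathcal{F}$ be closed under the Bellman operator, which is the much stronger assumption the paper relaxes.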

Citations (10)
