Actor-Critic or Critic-Actor? A Tale of Two Time Scales (2210.04470v6)
Published 10 Oct 2022 in cs.LG
Abstract: We revisit the standard formulation of the tabular actor-critic algorithm as a two time-scale stochastic approximation, with the value function computed on a faster time scale and the policy computed on a slower time scale. This emulates policy iteration. We observe that reversing the time scales in fact emulates value iteration and yields a legitimate algorithm. We provide a proof of convergence and compare the two empirically, with and without function approximation (using both linear and nonlinear function approximators), and observe that our proposed critic-actor algorithm performs on par with actor-critic in terms of both accuracy and computational effort.
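To make the two time-scale structure concrete, below is a minimal sketch in Python of a tabular actor-critic loop on a toy MDP. The toy MDP, the step-size exponents, and the softmax policy parameterization are illustrative assumptions, not the paper's exact construction; the point is only that the critic and actor iterates use different step-size schedules, and that swapping the two schedules (so the policy moves on the faster time scale and the value function on the slower one) gives the critic-actor variant the abstract describes.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's exact algorithm): tabular
# two time-scale actor-critic on a randomly generated 2-state, 2-action MDP.
rng = np.random.default_rng(0)
nS, nA, gamma = 2, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] = next-state distribution
R = rng.random((nS, nA))                        # R[s, a] = expected reward

V = np.zeros(nS)            # critic: state values
theta = np.zeros((nS, nA))  # actor: softmax policy preferences

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

# Step-size schedules (illustrative exponents): a(n)/b(n) -> 0, so the iterate
# driven by a(n) evolves on the slower time scale. Here the actor uses a(n)
# and the critic uses b(n), i.e. the standard actor-critic arrangement.
# Exchanging the two schedules makes the actor the faster iterate, which is
# the critic-actor ordering studied in the paper.
def a(n): return 1.0 / (1 + n) ** 0.6   # slow time scale (actor)
def b(n): return 1.0 / (1 + n) ** 0.51  # fast time scale (critic)

s = 0
for n in range(1, 50_000):
    pi = policy(s)
    act = rng.choice(nA, p=pi)
    s_next = rng.choice(nS, p=P[s, act])
    r = R[s, act]

    delta = r + gamma * V[s_next] - V[s]   # TD error
    V[s] += b(n) * delta                   # critic update (fast time scale)

    grad = -pi
    grad[act] += 1.0                       # d log pi(act|s) / d theta[s, :]
    theta[s] += a(n) * delta * grad        # actor update (slow time scale)
    s = s_next

print("Estimated values:", V)
print("Greedy actions:", theta.argmax(axis=1))
```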