Actor-Critic or Critic-Actor? A Tale of Two Time Scales

Published 10 Oct 2022 in cs.LG (arXiv:2210.04470v6)

Abstract: We revisit the standard formulation of the tabular actor-critic algorithm as a two-time-scale stochastic approximation, with the value function computed on a faster time scale and the policy on a slower one. This emulates policy iteration. We observe that reversing the time scales in fact emulates value iteration and yields a legitimate algorithm. We provide a proof of convergence and compare the two empirically, both with and without function approximation (using linear as well as nonlinear function approximators), and observe that the proposed critic-actor algorithm performs on par with actor-critic in terms of both accuracy and computational effort.
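The two-time-scale scheme described in the abstract can be sketched as follows. This is an illustrative toy implementation, not the paper's exact algorithm: the two-state MDP, the step-size exponents, and the preference-based softmax actor are all assumptions made for the sketch. The only structural point it demonstrates is the one the abstract makes: which of the two coupled updates (critic or actor) gets the faster step size.

```python
import numpy as np

# Toy 2-state, 2-action MDP (illustrative only, not from the paper).
# P[s, a] -> next-state distribution; R[s, a] -> expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
GAMMA = 0.9


def run(num_iters=50_000, swap_timescales=False, seed=0):
    """Tabular two-time-scale actor-critic on the toy MDP.

    With swap_timescales=False the critic (value function) moves on the
    faster step size and the actor on the slower one, emulating policy
    iteration (actor-critic).  With swap_timescales=True the roles are
    reversed, emulating value iteration (the critic-actor variant).
    """
    rng = np.random.default_rng(seed)
    V = np.zeros(2)            # critic: tabular state values
    theta = np.zeros((2, 2))   # actor: softmax action preferences
    s = 0
    for n in range(1, num_iters + 1):
        fast = 1.0 / n ** 0.6  # faster (larger) step size
        slow = 1.0 / n         # slower step size
        actor_step, critic_step = (fast, slow) if swap_timescales else (slow, fast)

        # Sample an action from the softmax policy at state s.
        logits = theta[s] - theta[s].max()
        pi = np.exp(logits) / np.exp(logits).sum()
        a = rng.choice(2, p=pi)
        s_next = rng.choice(2, p=P[s, a])

        delta = R[s, a] + GAMMA * V[s_next] - V[s]  # TD error
        V[s] += critic_step * delta                 # critic update
        theta[s, a] += actor_step * delta           # actor update
        s = s_next
    return V, theta
```

Running `run()` and `run(swap_timescales=True)` on the same toy MDP gives a side-by-side comparison of the two time-scale orderings; the abstract's empirical claim is that the two variants end up comparable in accuracy and computational effort.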
