Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis (2403.08955v2)

Published 13 Mar 2024 in cs.LG, cs.AI, and math.OC

Abstract: Reinforcement Learning (RL) has shown exceptional performance across various applications, enabling autonomous agents to learn optimal policies through interaction with their environments. However, traditional RL frameworks often face challenges in terms of iteration complexity and robustness. Risk-sensitive RL, which balances expected return and risk, has been explored for its potential to yield probabilistically robust policies, yet its iteration complexity analysis remains underexplored. In this study, we conduct a thorough iteration complexity analysis for the risk-sensitive policy gradient method, focusing on the REINFORCE algorithm and employing the exponential utility function. We obtain an iteration complexity of $\mathcal{O}(\epsilon^{-2})$ to reach an $\epsilon$-approximate first-order stationary point (FOSP). We investigate whether risk-sensitive algorithms can potentially achieve better iteration complexity compared to their risk-neutral counterparts. Our theoretical analysis demonstrates that risk-sensitive REINFORCE can potentially have a reduced number of iterations required for convergence. This leads to improved iteration complexity, as employing the exponential utility does not entail additional computation per iteration. We characterize the conditions under which risk-sensitive algorithms can potentially achieve better iteration complexity. Our simulation results also validate that risk-averse cases can converge and stabilize more quickly after $41\%$ of the episodes compared to their risk-neutral counterparts.
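
With the exponential (entropic) utility, the risk-sensitive objective is commonly written as $J_\beta(\theta) = \frac{1}{\beta}\log \mathbb{E}_{\tau\sim\pi_\theta}\left[\exp\big(\beta R(\tau)\big)\right]$, where $\beta<0$ corresponds to risk aversion and $\beta\to 0$ recovers the risk-neutral expected return; an $\epsilon$-approximate first-order stationary point is, roughly, a parameter $\theta$ with $\|\nabla J_\beta(\theta)\|\le\epsilon$. As a rough illustration of how such a criterion changes the REINFORCE update, the sketch below weights each sampled episode's score function by its exponential utility on a toy two-armed bandit. The bandit, step size, batch size, and value of $\beta$ are hypothetical choices made for illustration, not details taken from the paper.

```python
import numpy as np

# Minimal, illustrative sketch of a REINFORCE-style Monte Carlo update for the
# exponential utility criterion on a toy two-armed bandit with single-step
# episodes. Environment and hyperparameters are placeholders, not the paper's.

rng = np.random.default_rng(0)

def pull(arm):
    # Arm 0: safe, always pays 1.0.
    # Arm 1: risky, pays 3.0 with prob 0.6, else -1.0 (higher mean, higher variance).
    if arm == 0:
        return 1.0
    return 3.0 if rng.random() < 0.6 else -1.0

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

theta = np.zeros(2)   # one logit per arm
beta = -0.5           # risk parameter: beta < 0 is risk-averse, beta -> 0 is risk-neutral
lr = 0.1              # step size
batch = 64            # sampled episodes per gradient estimate

for _ in range(500):
    probs = softmax(theta)
    arms = rng.choice(2, size=batch, p=probs)
    returns = np.array([pull(a) for a in arms])

    # Monte Carlo estimate of the gradient of (1/beta) * E[exp(beta * R)]:
    # each episode's score function is weighted by exp(beta * R) / beta.
    grad = np.zeros_like(theta)
    for arm, ret in zip(arms, returns):
        score = -probs.copy()
        score[arm] += 1.0          # grad_theta of log softmax(theta)[arm]
        grad += (np.exp(beta * ret) / beta) * score
    grad /= batch

    theta += lr * grad             # plain gradient ascent on the utility

print("learned arm probabilities:", softmax(theta))
```

Replacing the exponential weight with the plain return recovers the standard risk-neutral REINFORCE update; in this toy example the risk-averse run ($\beta=-0.5$) should settle on the safe arm, while the risk-neutral update favors the riskier arm with the higher mean payoff.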

