Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning (2307.01708v2)

Published 4 Jul 2023 in cs.LG and cs.AI

Abstract: We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method for learning models that suffices for optimal planning in the risk-neutral setting, is not sufficient for optimal planning in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equivalence: one that is general and can be used to plan for any risk measure, but is intractable; and a practical variation that allows one to choose the risk measures for which one can plan optimally. We demonstrate how our framework can be used to augment any model-free risk-sensitive algorithm, and provide both tabular and large-scale experiments to demonstrate its effectiveness.
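
To make the distinction in the abstract concrete, here is a minimal, hypothetical sketch in Python of the practical equivalence notion it describes: rather than requiring a learned model to match the environment's expected return for every policy (value equivalence), one requires it to match a chosen risk measure of the return distribution, such as conditional value-at-risk (CVaR). The function names, the tabular MDP representation, and the Monte Carlo checker below are all illustrative assumptions, not the authors' implementation.

    import numpy as np

    def cvar(returns, alpha=0.1):
        # Conditional value-at-risk: the mean of the worst alpha-fraction of returns.
        cutoff = np.quantile(returns, alpha)
        return returns[returns <= cutoff].mean()

    def return_samples(P, R, policy, s0=0, horizon=50, gamma=0.99, n=2000, seed=0):
        # Monte Carlo samples of the discounted return of a deterministic `policy`
        # in a tabular MDP: P[s, a] is a next-state distribution, R[s, a] a reward.
        rng = np.random.default_rng(seed)
        n_states = P.shape[0]
        returns = np.zeros(n)
        for i in range(n):
            s, discount = s0, 1.0
            for _ in range(horizon):
                a = policy[s]
                returns[i] += discount * R[s, a]
                s = rng.choice(n_states, p=P[s, a])
                discount *= gamma
        return returns

    def cvar_equivalent(P_env, P_model, R, policies, alpha=0.1, tol=0.05):
        # A model is (approximately) equivalent for CVaR_alpha if it reproduces
        # the environment's CVaR of the return for every policy in the given
        # set -- the spirit of the paper's practical, risk-measure-specific notion.
        for policy in policies:
            g_env = cvar(return_samples(P_env, R, policy), alpha)
            g_model = cvar(return_samples(P_model, R, policy), alpha)
            if abs(g_env - g_model) > tol:
                return False
        return True

The point the abstract makes is visible in this sketch: two models can agree on expected returns for every policy, and so pass a value-equivalence check, while placing different mass in the lower tail of the return distribution, in which case a check like cvar_equivalent would reject the model even though a risk-neutral planner could not tell the two apart.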

