Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning (2307.01708v2)
Abstract: We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method of learning models sufficient for planning optimally in the risk-neutral setting, is not sufficient for planning optimally in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equivalence: one that is general and can be used to plan for any risk measure, but is intractable; and a practical variation that lets the practitioner choose which risk measures to plan optimally for. We demonstrate how our framework can be used to augment any model-free risk-sensitive algorithm, and provide both tabular and large-scale experiments demonstrating its effectiveness.
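To make the practical variant concrete, here is a minimal, hedged sketch of the underlying idea: learn a model whose return distributions agree with the environment's on a chosen set of risk measures (here, CVaR at a few levels), rather than only on expected values as in value equivalence. This is an illustration under our own assumptions, not the authors' exact algorithm; all function and variable names (`cvar`, `risk_equivalence_loss`, the stand-in return samples) are hypothetical.

```python
# Illustrative sketch: compare environment and model return distributions
# on a chosen set of risk measures (CVaR at several levels alpha).
# Names and sampling setup are hypothetical, not the paper's exact method.

import numpy as np

def cvar(returns: np.ndarray, alpha: float) -> float:
    """Conditional value-at-risk: mean of the worst alpha-fraction of returns
    (lowest returns, since higher return is better)."""
    sorted_returns = np.sort(returns)
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return float(sorted_returns[:k].mean())

def risk_equivalence_loss(env_returns: np.ndarray,
                          model_returns: np.ndarray,
                          alphas=(0.05, 0.25, 1.0)) -> float:
    """Squared mismatch of the chosen risk measures between the environment's
    and the model's return distributions. Note alpha = 1.0 recovers the
    risk-neutral mean, so matching expected values is a special case."""
    return sum((cvar(env_returns, a) - cvar(model_returns, a)) ** 2
               for a in alphas)

# Stand-in Monte Carlo return samples: the model matches the environment's
# mean but underestimates its spread, so it looks fine risk-neutrally while
# mispricing tail risk; the loss below picks this up through the CVaR terms.
rng = np.random.default_rng(0)
env_returns = rng.normal(loc=1.0, scale=2.0, size=10_000)
model_returns = rng.normal(loc=1.0, scale=1.5, size=10_000)
print(risk_equivalence_loss(env_returns, model_returns))
```

In a full pipeline one would evaluate such a loss over a set of reference policies, sampling returns from both the real environment and the learned model, and minimize it with respect to the model's parameters; choosing the set of alphas fixes in advance which risk measures the model supports planning for.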