
DRL-ORA: Distributional Reinforcement Learning with Online Risk Adaption (2310.05179v3)

Published 8 Oct 2023 in cs.LG

Abstract: One of the main challenges in reinforcement learning (RL) is that the agent has to make decisions that would influence the future performance without having complete knowledge of the environment. Dynamically adjusting the level of epistemic risk during the learning process can help to achieve reliable policies in safety-critical settings with better efficiency. In this work, we propose a new framework, Distributional RL with Online Risk Adaptation (DRL-ORA). This framework quantifies both epistemic and implicit aleatory uncertainties in a unified manner and dynamically adjusts the epistemic risk levels by solving a total variation minimization problem online. The selection of risk levels is performed efficiently via a grid search using a Follow-The-Leader-type algorithm, where the offline oracle corresponds to a "satisficing measure" under a specially modified loss function. We show that DRL-ORA outperforms existing methods that rely on fixed risk levels or manually designed risk level adaptation in multiple classes of tasks.
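The abstract describes selecting the epistemic risk level online via a Follow-The-Leader-type grid search. The sketch below is a minimal, hypothetical illustration of that idea only (it is not the paper's implementation): a fixed grid of candidate risk levels, a running cumulative loss per candidate, and a Follow-The-Leader rule that plays the candidate with the lowest loss so far. The class name, the grid values, and the placeholder losses are assumptions for illustration; in the paper the per-candidate losses come from a specially modified loss tied to a satisficing measure, which is not reproduced here.

```python
import numpy as np


class FTLRiskSelector:
    """Follow-The-Leader selection of a risk level from a fixed grid (illustrative sketch)."""

    def __init__(self, risk_grid):
        self.risk_grid = np.asarray(risk_grid, dtype=float)
        # Cumulative loss each candidate risk level has accrued so far.
        self.cum_loss = np.zeros(len(self.risk_grid))

    def select(self):
        # Follow-The-Leader: play the candidate with the smallest
        # cumulative loss observed so far (ties broken by lowest index).
        return self.risk_grid[int(np.argmin(self.cum_loss))]

    def update(self, losses):
        # `losses` holds the loss each candidate would have incurred
        # in the latest round; accumulate it for future selections.
        self.cum_loss += np.asarray(losses, dtype=float)


if __name__ == "__main__":
    selector = FTLRiskSelector(risk_grid=np.linspace(0.1, 1.0, 10))
    rng = np.random.default_rng(0)
    for _ in range(100):
        alpha = selector.select()  # risk level used for this round
        # Placeholder losses for demonstration only; the paper derives
        # these from its modified, satisficing-measure-based loss.
        selector.update(rng.normal(loc=selector.risk_grid, scale=0.1))
    print("selected risk level:", selector.select())
```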

