On the Foundation of Distributionally Robust Reinforcement Learning (2311.09018v3)

Published 15 Nov 2023 in cs.LG, cs.SY, eess.SY, math.OC, and stat.ML

Abstract: Motivated by the need for a robust policy in the face of environment shifts between training and deployment, we contribute to the theoretical foundation of distributionally robust reinforcement learning (DRRL). This is accomplished through a comprehensive modeling framework centered around distributionally robust Markov decision processes (DRMDPs). This framework obliges the decision maker to choose an optimal policy under the worst-case distributional shift orchestrated by an adversary. By unifying and extending existing formulations, we rigorously construct DRMDPs that embrace various modeling attributes for both the decision maker and the adversary. These attributes include the granularity of adaptability, covering history-dependent, Markov, and Markov time-homogeneous dynamics for both the decision maker and the adversary. Additionally, we examine the flexibility of the shifts the adversary can induce, considering SA- and S-rectangularity. Within this DRMDP framework, we investigate conditions for the existence or absence of the dynamic programming principle (DPP). From an algorithmic standpoint, the existence of the DPP has significant implications, as the vast majority of existing data- and computation-efficient RL algorithms rely on it. To study its existence, we comprehensively examine combinations of controller and adversary attributes, providing streamlined proofs grounded in a unified methodology. We also offer counterexamples for settings in which a DPP with full generality is absent.
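For orientation, here is a minimal sketch of the objects the abstract refers to; the notation (value function V*, ambiguity sets P_{s,a}) is introduced here for illustration and the paper's actual construction, which also covers history-dependent policies and adversaries, is more general. In a discounted, infinite-horizon DRMDP with states s, actions a, reward r, discount factor \gamma, and an ambiguity set \mathcal{P} of transition kernels, the decision maker solves the worst-case objective

\sup_{\pi} \; \inf_{P \in \mathcal{P}} \; \mathbb{E}^{\pi, P}\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big].

Under SA-rectangularity, i.e. \mathcal{P} = \bigotimes_{(s,a)} \mathcal{P}_{s,a} so the adversary may pick the transition distribution independently for each state-action pair, a dynamic programming principle of the following form is typically available:

V^{*}(s) \;=\; \max_{a} \Big\{\, r(s,a) \;+\; \gamma \inf_{p \,\in\, \mathcal{P}_{s,a}} \mathbb{E}_{s' \sim p}\big[ V^{*}(s') \big] \Big\}.

The paper studies which combinations of decision-maker and adversary attributes admit (or fail to admit) such a recursion.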
