The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning (2306.11208v1)

Published 20 Jun 2023 in cs.LG, cs.AI, and stat.ML

Abstract: Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. We demonstrate that planning under a lower discount factor produces an identical optimal policy to planning using any prior on the transition matrix that has the same distribution for all states and actions. In fact, it functions like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. Our equivalence theorem leads to an explicit formula to set regularization parameters locally for individual state-action pairs rather than globally. We demonstrate the failures of discount regularization and how we remedy them using our state-action-specific method across simple empirical examples as well as a medical cancer simulator.
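The equivalence described in the abstract is easy to make concrete in a small tabular setting. Below is a minimal NumPy sketch (all function and variable names are illustrative, not from the paper) contrasting two certainty-equivalence planners: one that plans on the maximum-likelihood transition estimate with a lowered discount factor, and one that keeps the original discount but shrinks each estimated transition row toward a shared prior distribution mu with pseudo-count kappa. The paper's point is that the first implicitly behaves like the second, but with a prior strength that grows with the amount of data at each state-action pair; its proposed remedy sets the regularization strength per state-action pair instead.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8):
    """Tabular value iteration. P: (S, A, S) transition probabilities, R: (S, A) rewards."""
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * (P @ V)        # (S, A): one-step reward plus discounted next-state value
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return Q.argmax(axis=1)    # greedy policy
        V = V_new

def plan_discount_regularized(counts, R, gamma_reg):
    """Certainty-equivalence planning on the MLE model with a lowered discount factor."""
    P_mle = counts / counts.sum(axis=2, keepdims=True)   # assumes every (s, a) has at least one transition
    return value_iteration(P_mle, R, gamma_reg)

def plan_with_uniform_prior(counts, R, gamma, mu, kappa):
    """Planning at the original discount, shrinking each row toward a shared
    next-state distribution mu (shape (S,)) with pseudo-count kappa."""
    n = counts.sum(axis=2, keepdims=True)                # transition data per (s, a)
    P_reg = (counts + kappa * mu) / (n + kappa)          # convex combination of MLE row and mu
    return value_iteration(P_reg, R, gamma)
```

With uneven data, the explicit-prior planner shrinks sparsely observed state-action pairs toward mu relatively more strongly (weight kappa / (n + kappa)), whereas, per the abstract, a lowered discount factor acts like a prior that regularizes more where there is more data; that mismatch is the unintended consequence the paper corrects with state-action-specific regularization parameters.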

References (39)
  1. Reinforcement learning: Theory and algorithms. Technical report, CS Dept., UW Seattle, Seattle, WA, USA, 2019.
  2. Discount factor as a regularizer in reinforcement learning. In International Conference on Machine Learning, pp. 269–278. PMLR, 2020.
  3. Mitigating planner overfitting in model-based reinforcement learning. arXiv preprint arXiv:1812.01129, 2018.
  4. A Bayesian sampling approach to exploration in reinforcement learning. arXiv preprint arXiv:1205.2664, 2012.
  5. VacSim: Learning effective strategies for COVID-19 vaccine distribution using reinforcement learning. Intelligence-Based Medicine, 100060, 2022.
  6. Bandit algorithms to personalize educational chatbots. Machine Learning, 110(9):2389–2418, 2021.
  7. Duff, M. O. Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes. PhD thesis, University of Massachusetts Amherst, 2002.
  8. Contextual bandits for adapting treatment in a mouse model of de novo carcinogenesis. In Machine Learning for Healthcare Conference, pp. 67–82. PMLR, 2018.
  9. Bayesian reinforcement learning: A survey. Foundations and Trends® in Machine Learning, 8(5-6):359–483, 2015.
  10. Adaptive Filtering Prediction and Control. Prentice-Hall, Englewood Cliffs, 1984.
  11. Interpretable off-policy evaluation in reinforcement learning by highlighting influential transitions. In International Conference on Machine Learning, pp. 3658–3667. PMLR, 2020.
  12. The value equivalence principle for model-based reinforcement learning. Advances in Neural Information Processing Systems, 33:5541–5552, 2020.
  13. Proper value equivalence. Advances in Neural Information Processing Systems, 34:7773–7786, 2021.
  14. Approximate value equivalence. Advances in Neural Information Processing Systems, 35:33029–33040, 2022.
  15. The dependence of effective planning horizon on model accuracy. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pp. 1181–1189. Citeseer, 2015.
  16. Near-Bayesian exploration in polynomial time. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 513–520, 2009.
  17. Personalized HeartSteps: A reinforcement learning algorithm for optimizing physical activity. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 4(1):1–22, 2020.
  18. Murphy, K. P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
  19. Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, pp. 278–287, 1999.
  20. Making sense of reinforcement learning and probabilistic inference. arXiv preprint arXiv:2001.00805, 2020.
  21. Reinforcement learning-based expanded personalized diabetes treatment recommendation using South Korean electronic health records. Expert Systems with Applications, 206:117932, 2022.
  22. (More) efficient reinforcement learning via posterior sampling. In Burges, C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K. (eds.), Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013. URL https://proceedings.neurips.cc/paper/2013/file/6a5889bb0190d0211a991f47bb19a777-Paper.pdf.
  23. Pitis, S. Rethinking the discount factor in reinforcement learning: A decision theoretic approach. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 7949–7956, 2019.
  24. Networks for approximation and learning. Proceedings of the IEEE, 78(9):1481–1497, 1990.
  25. An analytic solution to discrete Bayesian reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, pp. 697–704, 2006.
  26. Bandit learning with implicit feedback. Advances in Neural Information Processing Systems, 31, 2018.
  27. A tumor growth inhibition model for low-grade glioma treated with chemotherapy or radiotherapy. Clinical Cancer Research, 18(18):5071–5080, 2012.
  28. Bayes-adaptive POMDPs. Advances in Neural Information Processing Systems, 20, 2007.
  29. A Bayesian approach for learning and planning in partially observable Markov decision processes. Journal of Machine Learning Research, 12(5), 2011.
  30. Variance-based rewards for approximate Bayesian reinforcement learning. arXiv preprint arXiv:1203.3518, 2012.
  31. Scaling reinforcement learning toward RoboCup soccer. In ICML, volume 1, pp. 537–544, 2001.
  32. Strens, M. A Bayesian framework for reinforcement learning. In ICML, volume 2000, pp. 943–950, 2000.
  33. Reinforcement Learning: An Introduction, p. 113. MIT Press, 2018.
  34. Designing reinforcement learning algorithms for digital interventions: Pre-implementation guidelines. Algorithms, 15(8):255, 2022.
  35. Bayesian reinforcement learning. In Reinforcement Learning, pp. 359–386, 2012.
  36. Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Operations Research Letters, 39(5):369–374, 2011.
  37. White, M. Unifying task specification in reinforcement learning. In International Conference on Machine Learning, pp. 3742–3750. PMLR, 2017.
  38. Reinforcement learning with state-dependent discount factor. In 2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL), pp. 1–6. IEEE, 2013.
  39. Problem dependent reinforcement learning bounds which can identify bandit structure in MDPs. In International Conference on Machine Learning, pp. 5747–5755. PMLR, 2018.
Citations (3)
