
Towards a Unified Framework for Sequential Decision Making (2310.02167v1)

Published 3 Oct 2023 in cs.AI

Abstract: In recent years, the integration of Automated Planning (AP) and Reinforcement Learning (RL) has seen a surge of interest. To perform this integration, a general framework for Sequential Decision Making (SDM) would prove immensely useful, as it would help us understand how AP and RL fit together. In this preliminary work, we attempt to provide such a framework, suitable for any method ranging from Classical Planning to Deep RL, by drawing on concepts from Probability Theory and Bayesian inference. We formulate an SDM task as a set of training and test Markov Decision Processes (MDPs), to account for generalization. We provide a general algorithm for SDM which we hypothesize every SDM method is based on. According to it, every SDM algorithm can be seen as a procedure that iteratively improves its solution estimate by leveraging the task knowledge available. Finally, we derive a set of formulas and algorithms for calculating interesting properties of SDM tasks and methods, which make possible their empirical evaluation and comparison.
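To make the abstract's framing concrete, below is a minimal sketch (not taken from the paper) of how an SDM task could be represented as a set of training and test MDPs, together with the kind of generic loop the authors hypothesize underlies every SDM method: iteratively improving a solution estimate using the task knowledge available. All class and function names are illustrative assumptions, and the tabular Q-learning update is only a stand-in for whatever update rule a concrete planner or RL agent would apply.

```python
# Illustrative sketch of the paper's framing; names and update rule are assumptions.
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = int
Action = int


@dataclass
class MDP:
    """A finite MDP: states, actions, transition and reward functions, discount factor."""
    states: List[State]
    actions: List[Action]
    transition: Callable[[State, Action], State]   # deterministic here for brevity
    reward: Callable[[State, Action], float]
    gamma: float = 0.95
    initial_state: State = 0


@dataclass
class SDMTask:
    """An SDM task as described in the abstract: training MDPs to learn from,
    held-out test MDPs to measure generalization on."""
    train_mdps: List[MDP]
    test_mdps: List[MDP]


def iterative_sdm_solver(task: SDMTask, iterations: int = 100) -> Dict[Tuple[State, Action], float]:
    """Generic SDM loop: start from an initial solution estimate (a tabular Q-function)
    and repeatedly improve it with knowledge extracted from the training MDPs."""
    q: Dict[Tuple[State, Action], float] = {}   # current solution estimate
    alpha = 0.1                                  # learning rate (illustrative)

    for _ in range(iterations):
        for mdp in task.train_mdps:              # leverage the available task knowledge
            s = mdp.initial_state
            for _ in range(50):                  # short rollout per iteration
                a = random.choice(mdp.actions)
                s_next = mdp.transition(s, a)
                r = mdp.reward(s, a)
                best_next = max(q.get((s_next, a2), 0.0) for a2 in mdp.actions)
                q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + mdp.gamma * best_next - q.get((s, a), 0.0))
                s = s_next
    return q
```

In this sketch, generalization would be assessed by evaluating the returned estimate on `task.test_mdps` rather than the training MDPs, which is the role the train/test split plays in the paper's formulation.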

