Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains (2404.02499v2)

Published 3 Apr 2024 in cs.AI and cs.LG

Abstract: General policies represent reactive strategies for solving large families of planning problems like the infinite collection of solvable instances from a given domain. Methods for learning such policies from a collection of small training instances have been developed successfully for classical domains. In this work, we extend the formulations and the resulting combinatorial methods for learning general policies over fully observable, non-deterministic (FOND) domains. We also evaluate the resulting approach experimentally over a number of benchmark domains in FOND planning, present the general policies that result in some of these domains, and prove their correctness. The method for learning general policies for FOND planning can actually be seen as an alternative FOND planning method that searches for solutions, not in the given state space but in an abstract space defined by features that must be learned as well.
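To make the idea of a reactive general policy over learned features concrete, below is a minimal sketch of how such a policy executes in a non-deterministic setting. The toy domain, the single numerical feature `n`, and the rule format (a condition over features mapped to a desired qualitative change) are illustrative assumptions, not the paper's exact formalism or learning method.

```python
import random

# Hypothetical feature-based general policy: rules map a condition over
# feature values to a desired qualitative change of the features.
# Single rule here: while n > 0, prefer actions that may decrement n.
POLICY = [
    (lambda f: f["n"] > 0, {"n": "dec"}),
]

def goal(state):
    return state["n"] == 0

def applicable_actions(state):
    # Toy FOND domain: one action "work" that non-deterministically either
    # removes one remaining item or leaves the state unchanged.
    return ["work"] if state["n"] > 0 else []

def possible_effects(state, action):
    # Non-deterministic outcomes of an action.
    if action == "work":
        return [{"n": state["n"] - 1}, {"n": state["n"]}]
    return [dict(state)]

def qualitative_change(state, outcome):
    if outcome["n"] < state["n"]:
        return "dec"
    if outcome["n"] > state["n"]:
        return "inc"
    return "same"

def select_action(state):
    # Reactive execution: pick an action some outcome of which matches the
    # qualitative change requested by a rule whose condition holds.
    for cond, change in POLICY:
        if cond(state):
            for a in applicable_actions(state):
                if any(qualitative_change(state, o) == change["n"]
                       for o in possible_effects(state, a)):
                    return a
    return None

def run(state, max_steps=100):
    for _ in range(max_steps):
        if goal(state):
            return True
        a = select_action(state)
        if a is None:
            return False
        # The environment (adversary/chance) chooses the actual outcome.
        state = random.choice(possible_effects(state, a))
    return False

print(run({"n": 5}))
```

Under a fairness assumption on the non-deterministic outcomes, repeatedly applying the rule eventually reaches the goal, which is the flavor of strong cyclic solutions that general policies for FOND domains aim to capture; the policy itself references only the feature `n`, not any particular instance size.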
