Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach (2402.19265v1)
Abstract: Partially Observable Markov Decision Processes (POMDPs) are a powerful framework for planning under uncertainty, modeling state uncertainty as a belief probability distribution. Approximate solvers based on Monte Carlo sampling have shown great success in relaxing the computational demand and performing online planning. However, scaling to complex realistic domains with many actions and long planning horizons remains a major challenge, and a key to good performance is guiding the action-selection process with domain-dependent policy heuristics tailored to the specific application domain. We propose to learn high-quality heuristics from POMDP traces of executions generated by any solver. We convert the belief-action pairs to a logical semantics, and exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications, which are then used as online heuristics. We thoroughly evaluate our methodology on two notoriously challenging POMDP problems involving large action spaces and long planning horizons, namely rocksample and pocman. Considering different state-of-the-art online POMDP solvers, including POMCP, DESPOT and AdaOPS, we show that learned heuristics expressed in Answer Set Programming (ASP) outperform neural networks and match optimal handcrafted task-specific heuristics at lower computational cost. Moreover, they generalize well to more challenging scenarios not experienced in the training phase (e.g., more rocks and larger grids in rocksample, larger maps and more aggressive ghosts in pocman).
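The conversion step sketched in the abstract — turning belief-action pairs from solver traces into logical examples for ILP — can be illustrated with a minimal Python sketch for a rocksample-like domain. All predicate names (`guess/2`, `dist/2`), the function names, and the discretization step are illustrative assumptions for this sketch, not the paper's exact vocabulary.

```python
# Hedged sketch: encode one POMDP belief-action pair from a rocksample-like
# trace as ASP facts, suitable as an ILP training example. Predicate names
# and the probability discretization are assumptions, not the paper's own.

def discretize(p, step=10):
    """Map a probability in [0, 1] to an integer percentage bucket."""
    return int(round(p * 100 / step) * step)

def belief_to_asp(belief, distances, action):
    """Encode one belief-action pair as a list of ASP fact strings.

    belief:    {rock_id: P(rock is valuable)} marginals, e.g. from a
               particle-filter belief approximation
    distances: {rock_id: distance of the rock from the agent}
    action:    the action taken by the solver, e.g. ("sample", rock_id)
    """
    facts = []
    for rock, p in sorted(belief.items()):
        facts.append(f"guess({rock},{discretize(p)}).")  # belief bucket
        facts.append(f"dist({rock},{distances[rock]}).")  # observable feature
    facts.append(f"{action[0]}({action[1]}).")            # labelled action
    return facts

example = belief_to_asp(
    belief={1: 0.92, 2: 0.15},
    distances={1: 0, 2: 4},
    action=("sample", 1),
)
print("\n".join(example))
```

An ILP system could then generalize many such examples into an interpretable rule such as `sample(R) :- guess(R,P), P >= 90, dist(R,0).`, which an online solver can use to bias action selection toward rule-satisfying actions.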
- POMCP-based decentralized spatial task allocation algorithms for partially observable environments. Applied Intelligence, 53(10), 12613–12631.
- Inductive synthesis of finite-state controllers for POMDPs. In Uncertainty in Artificial Intelligence, pp. 85–95. PMLR.
- Balduccini, M. (2007). Learning Action Descriptions with A-Prolog: Action Language C. In AAAI Spring Symposium on Logical Formalizations of Commonsense Reasoning.
- Parallelizing POMCP to solve complex POMDPs. In RSS Workshop on Software Tools for Real-Time Optimal Control.
- Multiagent rollout and policy iteration for POMDP with application to multi-robot repair problems. In Conference on Robot Learning, pp. 1814–1828. PMLR.
- Learning First-Order Symbolic Representations for Planning from the Structure of the State Space. In ECAI 2020, pp. 2322–2329. IOS Press.
- The use of UAVs in humanitarian relief: An application of POMDP-based methodology for finding victims. Production and Operations Management, 28(2), 421–440.
- Closing the Planning–Learning Loop With Application to Autonomous Driving. IEEE Transactions on Robotics, 39(2), 998–1011.
- ASP-Core-2 input language format. Theory and Practice of Logic Programming, 20(2), 294–309.
- Incremental pruning: a simple, fast, exact method for partially observable Markov decision processes. In Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence, pp. 54–61.
- Cassandra, A. R. (1998). A survey of POMDP applications. In Working notes of AAAI 1998 fall symposium on planning with partially observable Markov decision processes, Vol. 1724.
- Influence of State-Variable Constraints on Partially Observable Monte Carlo Planning. In IJCAI 2019, Macao, China, August 10-16, 2019, pp. 5540–5546. ijcai.org.
- Partially Observable Monte Carlo Planning with state variable constraints for mobile robot navigation. Engineering Applications of Artificial Intelligence, 104, 104382.
- Heuristic-guided reinforcement learning. Advances in Neural Information Processing Systems, 34, 13550–13563.
- Learning programs by learning from failures. Machine Learning, 110, 801–856.
- LEADER: Learning Attention over Driving Behaviors for Planning under Uncertainty. In Conference on Robot Learning, pp. 199–211. PMLR.
- Towards an inductive logic programming approach for explaining black-box preference learning systems. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning, Vol. 17, pp. 855–859.
- Imitation learning over heterogeneous agents with restraining bolts. In Proceedings of the international conference on automated planning and scheduling, Vol. 30, pp. 517–521.
- Foundations for restraining bolts: Reinforcement learning with LTLf/LDLf restraining specifications. In Proceedings of the international conference on automated planning and scheduling, Vol. 29, pp. 128–136.
- Scaling up heuristic planning with relational decision trees. Journal of Artificial Intelligence Research, 40, 767–813.
- Applications of ASP in robotics. KI-Künstliche Intelligenz, 32(2-3), 143–149.
- Induction and exploitation of subgoal automata for reinforcement learning. Journal of Artificial Intelligence Research, 70, 1031–1116.
- Gil, Y. (1994). Learning by Experimentation: Incremental Refinement of Incomplete Planning Domains. In International Conference on Machine Learning, pp. 87–95, New Brunswick, USA.
- Autonomous task planning and situation awareness in robotic surgery. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3144–3150. IEEE.
- POMP++: Pomcp-based Active Visual Search in unknown indoor environments. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1523–1530. IEEE.
- Intention-Aware Navigation in Crowds with Extended-Space POMDP Planning. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, pp. 562–570.
- Complete bottom-up predicate invention in meta-interpretive learning. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, pp. 2312–2318.
- Planning and Acting in Partially Observable Stochastic Domains. Artificial Intelligence, 101(1–2), 99–134.
- Symbols as a lingua franca for bridging human-ai chasm for explainable and advisable ai systems. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, pp. 12262–12267.
- Learning in POMDPs with Monte Carlo tree search. In International Conference on Machine Learning, pp. 1819–1827. PMLR.
- Bandit Based Monte-Carlo Planning. In Proc. ECML'06, pp. 282–293, Berlin, Heidelberg. Springer-Verlag.
- Law, M. (2023). Conflict-driven inductive logic programming. Theory and Practice of Logic Programming, 23(2), 387–414.
- Search Space Expansion for Efficient Incremental Inductive Logic Programming from Streamed Data. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, pp. 2697–2704. International Joint Conferences on Artificial Intelligence Organization.
- Iterative learning of answer set programs from context dependent examples. Theory and Practice of Logic Programming, 16(5-6), 834–848.
- The complexity and generality of learning answer set programs. Artificial Intelligence, 259, 110–146.
- MAGIC: Learning Macro-Actions for Online POMDP Planning. In Proceedings of Robotics: Science and Systems, Virtual.
- Automatic generation and learning of finite-state controllers. In International Conference on Artificial Intelligence: Methodology, Systems, and Applications, pp. 135–144. Springer.
- A synthesis of automated planning and reinforcement learning for efficient, robust decision-making. Artificial Intelligence, 241, 103–130.
- Lifschitz, V. (1999). Answer set planning. In International Conference on Logic Programming and Nonmonotonic Reasoning, pp. 373–374. Springer.
- Computing contingent plan graphs using online planning. ACM Transactions on Autonomous and Adaptive Systems (TAAS), 16(1), 1–30.
- Conflict-driven clause learning SAT solvers. In Handbook of Satisfiability, pp. 133–182. IOS Press.
- Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: A Rule-Based Approach. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pp. 889–897. International Foundation for Autonomous Agents and Multiagent Systems.
- Rule-based Shielding for Partially Observable Monte-Carlo Planning. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), Vol. 31, pp. 243–251.
- Risk-aware shielding of Partially Observable Monte Carlo Planning policies. Artificial Intelligence, 324, 103987.
- Learning Logic Specifications for Soft Policy Guidance in POMCP. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp. 373–381.
- Towards inductive learning of surgical task knowledge: A preliminary case study of the peg transfer task. Procedia Computer Science, 176, 440–449.
- Logic programming for deliberative robotic task planning. Artificial Intelligence Review, 56(6), 9011–9049.
- Inductive learning of answer set programs for autonomous surgical task planning. Machine Learning, 110(7), 1739–1763.
- Autonomous tissue retraction with a biomechanically informed logic based framework. In 2021 International Symposium on Medical Robotics (ISMR), pp. 1–7. IEEE.
- Axiom Learning and Belief Tracing for Transparent Decision Making in Robotics. In AAAI Fall Symposium on Artificial Intelligence for Human-Robot Interaction: Trust and Explainability in Artificial Intelligence for Human-Robot Interaction.
- Muggleton, S. (1991). Inductive logic programming. New generation computing, 8(4), 295–318.
- POMDP planning under object composition uncertainty: Application to robotic manipulation. IEEE Transactions on Robotics, 39(1), 41–56.
- The complexity of Markov decision processes. Mathematics of operations research, 12(3), 441–450.
- An online POMDP algorithm for complex multiagent environments. In Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems, pp. 970–977.
- Explaining black-box classifiers with ILP–empowering LIME with Aleph to approximate non-linear decisions with relational rules. In International Conference on Inductive Logic Programming, pp. 105–117. Springer.
- Learning First-Order Representations for Planning from Black Box States: New Results. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning, Vol. 18, pp. 539–548.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- Shani, G. (2013). Task-based decomposition of factored POMDPs. IEEE transactions on cybernetics, 44(2), 208–216.
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140–1144.
- Monte-Carlo planning in large POMDPs. Advances in Neural Information Processing Systems, 23.
- REBA: A refinement-based architecture for knowledge representation and reasoning in robotics. Journal of Artificial Intelligence Research, 65, 87–180.
- Knowledge Representation and Interactive Learning of Domain Knowledge for Human-Robot Collaboration. Advances in Cognitive Systems, 7, 77–96.
- Waterline and Obstacle Detection in Images from Low-Cost Autonomous Boats for Environmental Monitoring. Robotics and Autonomous Systems, 124(C).
- Approximate information state for approximate planning and reinforcement learning in partially observed systems. The Journal of Machine Learning Research, 23(1), 483–565.
- Online algorithms for POMDPs with continuous state, action, and observation spaces. In Proceedings of the International Conference on Automated Planning and Scheduling, Vol. 28, pp. 259–263.
- Deliberation in autonomous robotic surgery: a framework for handling anatomical uncertainty. In 2022 IEEE International Conference on Robotics and Automation (ICRA 2022).
- Review of mission planning for autonomous marine vehicle fleets. Journal of Field Robotics, 36(2), 333–354.
- Asnets: Deep learning for generalised planning. Journal of Artificial Intelligence Research, 68, 1–68.
- Answer Set Planning: A Survey. Theory and Practice of Logic Programming, 23(1), 226–298.
- Inductive Logic Programming For Transparent Alignment With Multiple Moral Values. In 2nd International Workshop on Emerging Ethical Aspects of AI (BEWARE-23).
- Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782), 350–354.
- POMP: Pomcp-based Online Motion Planning for active visual search in indoor environments. In Proc. of British Machine Vision Conference (BMVC). BMVA Press.
- Online partial conditional plan synthesis for POMDPs with safe-reachability objectives: Methods and experiments. IEEE Transactions on Automation Science and Engineering, 18(3), 932–945.
- Adaptive online packing-guided search for POMDPs. Advances in Neural Information Processing Systems, 34, 28419–28430.
- DESPOT: Online POMDP planning with regularization. Journal of Artificial Intelligence Research, 58, 231–266.
- Probabilistic planning via determinization in hindsight. In Proceedings of the 23rd national conference on Artificial intelligence-Volume 2, pp. 1010–1016.
- Learning State-Variable Relationships for Improving POMCP Performance. In Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, SAC, pp. 739–747. Association for Computing Machinery.
- Learning State-Variable Relationships in POMCP: A Framework for Mobile Robots. Frontiers in Robotics and AI, 9.
- Daniele Meli
- Alberto Castellini
- Alessandro Farinelli