Hybrid Search for Efficient Planning with Completeness Guarantees (2310.12819v2)
Abstract: Solving complex planning problems has been a long-standing challenge in computer science. Learning-based subgoal search methods have shown promise in tackling these problems, but they often lack completeness guarantees, meaning that they may fail to find a solution even if one exists. In this paper, we propose an efficient approach to augment a subgoal search method to achieve completeness in discrete action spaces. Specifically, we augment the high-level search with low-level actions to execute a multi-level (hybrid) search, which we call complete subgoal search. This solution achieves the best of both worlds: the practical efficiency of high-level search and the completeness of low-level search. We apply the proposed search method to a recently proposed subgoal search algorithm and evaluate the resulting algorithm, trained on offline data, on complex planning problems. We demonstrate that our complete subgoal search not only guarantees completeness but can even reduce the number of search expansions on instances that the high-level search could solve without low-level augmentations. Our approach makes it possible to apply subgoal-level planning to systems where completeness is a critical requirement.
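The idea of mixing subgoal proposals with low-level actions in a single search can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a plain greedy best-first search, and the toy domain (integers 0 to 20), the `propose_subgoals` generator, and the `heuristic` are all hypothetical stand-ins for learned components. The generator only proposes jumps of +5, so on its own it can never reach the goal state 17. Adding the low-level successors to the same frontier restores completeness on this finite state space.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Optional

State = Hashable


def hybrid_search(
    start: State,
    is_goal: Callable[[State], bool],
    low_level_successors: Callable[[State], Iterable[State]],
    propose_subgoals: Callable[[State], Iterable[State]],
    heuristic: Callable[[State], float],
    max_expansions: int = 10_000,
) -> Optional[List[State]]:
    """Greedy best-first search over subgoal jumps and low-level steps."""
    counter = 0  # tie-breaker so states themselves are never compared
    frontier = [(heuristic(start), counter, start)]
    parent = {start: None}  # doubles as the visited set
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state = heapq.heappop(frontier)
        expansions += 1
        if is_goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        # Hybrid expansion: high-level proposals plus every low-level action.
        candidates = list(propose_subgoals(state)) + list(low_level_successors(state))
        for nxt in candidates:
            if nxt not in parent:
                parent[nxt] = state
                counter += 1
                heapq.heappush(frontier, (heuristic(nxt), counter, nxt))
    return None


# Hypothetical toy domain: states 0..20, goal 17.
GOAL, LIMIT = 17, 20


def steps(s: int) -> List[int]:
    # Low-level actions: move one state left or right, staying in bounds.
    return [n for n in (s - 1, s + 1) if 0 <= n <= LIMIT]


def jumps(s: int) -> List[int]:
    # Incomplete subgoal generator: only proposes a jump of +5.
    # Assumes each proposed subgoal is actually reachable.
    return [s + 5] if s + 5 <= LIMIT else []


def h(s: int) -> int:
    return abs(GOAL - s)


def at_goal(s: int) -> bool:
    return s == GOAL


subgoals_only = hybrid_search(0, at_goal, lambda s: [], jumps, h)
low_level_only = hybrid_search(0, at_goal, steps, lambda s: [], h)
hybrid = hybrid_search(0, at_goal, steps, jumps, h)
```

In this toy setting the subgoal-only search exhausts its frontier without finding the goal, while the hybrid search reaches it with a path that combines three subgoal jumps and two low-level steps. That path is shorter in states than the 18-state path found by the low-level-only search.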