Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search (2206.00702v10)
Abstract: Complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. Taking advantage of this property, we propose Adaptive Subgoal Search (AdaSubS), a search method that adaptively adjusts the planning horizon. To this end, AdaSubS generates diverse sets of subgoals at different distances. A verification mechanism swiftly filters out unreachable subgoals, allowing the search to focus on feasible, more distant subgoals. In this way, AdaSubS benefits from the efficiency of planning with longer subgoals and the fine control of shorter ones, and thus scales well to difficult planning problems. We show that AdaSubS significantly surpasses hierarchical planning algorithms on three complex reasoning tasks: Sokoban, the Rubik's Cube, and the inequality-proving benchmark INT.
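The idea in the abstract (propose subgoals at several distances, discard those a verifier rejects, and prefer the longest feasible horizon) can be illustrated with a toy sketch. This is not the authors' implementation: the learned subgoal generators and verifier are replaced here by hand-written stand-ins on a one-dimensional corridor, and every name (`HARD_CELLS`, `propose_subgoals`, `verify`, `adaptive_subgoal_search`) is hypothetical. The assumption is that long jumps are rejected whenever they touch a "hard" cell, so the search falls back to shorter horizons only there.

```python
import heapq
from typing import Dict, List, Optional, Set

LOW, HIGH = 0, 10               # hypothetical corridor bounds
HARD_CELLS: Set[int] = {4, 5}   # hypothetical region where only single steps verify


def propose_subgoals(state: int, k: int) -> List[int]:
    """Stand-in for a learned generator: in-bounds candidates exactly k steps away."""
    return [s for s in (state + k, state - k) if LOW <= s <= HIGH]


def verify(state: int, subgoal: int) -> bool:
    """Stand-in for a learned verifier: accept single steps, and accept longer
    jumps only if no cell from state to subgoal (inclusive) is hard."""
    if abs(subgoal - state) == 1:
        return True
    lo, hi = sorted((state, subgoal))
    return not any(cell in HARD_CELLS for cell in range(lo, hi + 1))


def adaptive_subgoal_search(start: int, goal: int, horizons=(4, 2, 1),
                            max_pops: int = 1000) -> Optional[List[int]]:
    """Best-first search over (horizon, state) pairs, longest horizon first."""
    if start == goal:
        return [start]
    parent: Dict[int, Optional[int]] = {start: None}
    # Heap entries are (-k, distance to goal, state): larger k is popped first,
    # ties are broken by closeness to the goal.
    frontier = [(-k, abs(goal - start), start) for k in horizons]
    heapq.heapify(frontier)
    pops = 0
    while frontier and pops < max_pops:
        neg_k, _, state = heapq.heappop(frontier)
        pops += 1
        for sub in propose_subgoals(state, -neg_k):
            if sub in parent or not verify(state, sub):
                continue  # already reached, or rejected by the verifier
            parent[sub] = state
            if sub == goal:
                path = [sub]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            for k in horizons:
                heapq.heappush(frontier, (-k, abs(goal - sub), sub))
    return None


print(adaptive_subgoal_search(0, 10))
```

With these stand-ins, the call prints `[0, 2, 3, 4, 5, 6, 10]`: the search takes a length-2 jump while the way is clear, drops to single steps through the hard cells, and finishes with a length-4 jump once past them.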