Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (2402.14083v2)
Abstract: While Transformers have enabled tremendous progress in various application settings, such architectures still trail behind traditional symbolic planners for solving complex decision-making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the search dynamics of the $A^*$ search algorithm. We fine-tune this model to obtain Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than the $A^*$ implementation that was used for training initially. In our training method, $A^*$'s search dynamics are expressed as a token sequence outlining when task states are added to and removed from the search tree during symbolic planning. Searchformer significantly outperforms baselines that predict the optimal plan directly, with a 5-10$\times$ smaller model size and a 10$\times$ smaller training dataset. Lastly, we demonstrate how Searchformer scales to larger and more complex decision-making tasks, with an improved fraction of solved tasks and shortened search dynamics.
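The core idea of serializing $A^*$'s search dynamics can be sketched as follows: run $A^*$ and emit a token whenever a state enters the frontier ("create") or is expanded ("close"). This is a minimal illustration on a 4-connected grid, not the paper's implementation; the token names, trace layout, and `astar_trace` helper are assumptions for exposition.

```python
import heapq

def astar_trace(grid, start, goal):
    """Run A* on a set of free grid cells and return (plan, trace).

    The trace is a flat token list recording the search dynamics:
    "create" x y g h when a node enters the frontier, and
    "close" x y g h when it is popped for expansion -- a toy
    serialization of when states are added to and removed from
    the search tree. Token vocabulary here is illustrative only.
    """
    def h(p):  # Manhattan-distance heuristic (admissible on grids)
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start)]      # entries are (f, g, node)
    came_from, cost = {start: None}, {start: 0}
    trace = ["create", *map(str, (*start, 0, h(start)))]
    while frontier:
        _, g, node = heapq.heappop(frontier)
        trace += ["close", *map(str, (*node, g, h(node)))]
        if node == goal:
            break
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in grid and (nxt not in cost or g + 1 < cost[nxt]):
                cost[nxt] = g + 1
                came_from[nxt] = node
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
                trace += ["create", *map(str, (*nxt, g + 1, h(nxt)))]
    # Reconstruct the optimal plan by walking parent pointers back.
    plan, node = [], goal
    while node is not None:
        plan.append(node)
        node = came_from[node]
    return plan[::-1], trace

# Toy 3x3 open grid: the trace plus the plan would form one training sequence.
cells = {(x, y) for x in range(3) for y in range(3)}
plan, trace = astar_trace(cells, (0, 0), (2, 2))
```

In the paper's setup, sequences like `trace` (the search dynamics) followed by the optimal plan are what the encoder-decoder model is trained to predict; bootstrapping then fine-tunes the model toward shorter traces that still end in optimal plans.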