Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (2402.14083v2)

Published 21 Feb 2024 in cs.AI

Abstract: While Transformers have enabled tremendous progress in various application settings, such architectures still trail behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the search dynamics of the $A*$ search algorithm. We fine tune this model to obtain a Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than the $A*$ implementation that was used for training initially. In our training method, $A*$'s search dynamics are expressed as a token sequence outlining when task states are added and removed into the search tree during symbolic planning. Searchformer significantly outperforms baselines that predict the optimal plan directly with a 5-10$\times$ smaller model size and a 10$\times$ smaller training dataset. Lastly, we demonstrate how Searchformer scales to larger and more complex decision making tasks with improved percentage of solved tasks and shortened search dynamics.


Summary

  • The paper introduces Searchformer, a novel Transformer model that bootstraps A* search dynamics to achieve superior planning performance.
  • It demonstrates a 93.7% rate of optimally solving previously unseen Sokoban puzzles while using up to 26.8% fewer search steps than the A* implementation used to generate its training data.
  • The study bridges deep learning and symbolic planning, paving the way for more efficient decision-making in robotics, logistics, and complex applications.

Enhancing Transformer Efficiency in Planning Tasks through Search Dynamics Bootstrapping

Introduction to Searchformer

Recent advances in deep learning, and in Transformer architectures in particular, have had a marked impact on fields ranging from natural language processing to computer vision. These models, however, still trail traditional symbolic planners on complex decision-making tasks. Our focus is a novel approach that leverages the strengths of Transformers for elaborate planning tasks traditionally out of reach for such models. We introduce Searchformer, a Transformer-based model that not only matches the A* search algorithm on complex Sokoban puzzles but solves them more efficiently.

Methodology

Searchformer is built on an encoder-decoder Transformer architecture, initially trained to imitate the search dynamics of the A* search algorithm. It is subsequently fine-tuned through expert iteration, improving its search efficiency beyond that of its symbolic counterpart. Training data are generated synthetically by executing A* search and recording its search dynamics as token sequences that describe when task states are added to and removed from the search tree. Crucially, this method first grounds Searchformer in traditional search behavior and then lets it move beyond it by reducing the number of search steps required to reach an optimal plan.
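
To make the "search dynamics as tokens" idea concrete, the sketch below runs A* on a small grid maze, logs when nodes are added to ("create") and expanded from ("close") the search frontier, and then flattens the log together with the final plan into a single token sequence. The grid encoding, token names, and cost tokens are simplified assumptions for illustration; they do not reproduce the paper's exact vocabulary or its Sokoban encoding.

```python
# Minimal sketch: log A*'s search dynamics on a grid maze and serialize them
# into a token sequence (simplified stand-in for the paper's trace format).
import heapq

def astar_trace(grid, start, goal):
    """Run A* on a 2D grid (0 = free, 1 = wall); return (plan, trace).

    `trace` lists ("create"|"close", state, cost-so-far, heuristic) events
    in the order A* emitted them.
    """
    def h(p):  # Manhattan distance: admissible heuristic on a 4-connected grid
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start, None)]       # (f, g, node, parent)
    came_from, best_g, trace = {}, {start: 0}, [("create", start, 0, h(start))]

    while frontier:
        _, g, node, parent = heapq.heappop(frontier)
        if node in came_from:                     # stale entry, already expanded
            continue
        came_from[node] = parent
        trace.append(("close", node, g, h(node)))
        if node == goal:
            break
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] == 1 or nxt in came_from:
                continue
            new_g = g + 1
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                trace.append(("create", nxt, new_g, h(nxt)))
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, node))

    if goal not in came_from:                     # unreachable: trace only
        return [], trace
    plan, node = [], goal
    while node is not None:                       # walk parents back to start
        plan.append(node)
        node = came_from[node]
    plan.reverse()
    return plan, trace

def to_tokens(plan, trace):
    """Flatten the search trace and the final plan into one token sequence."""
    toks = []
    for kind, (x, y), g, heur in trace:
        toks += [kind, str(x), str(y), f"c{g}", f"c{heur}"]
    toks.append("plan")
    for x, y in plan:
        toks += [str(x), str(y)]
    return toks

# Example: a 3x3 maze with one wall; prints a sequence like
# ['create', '0', '0', 'c0', 'c4', 'close', '0', '0', ..., 'plan', '0', '0', ...]
maze = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(to_tokens(*astar_trace(maze, (0, 0), (2, 2))))
```

A search-augmented training sequence then pairs a tokenized task description (the prompt) with such a trace-plus-plan sequence (the response), whereas a solution-only sequence keeps just the plan.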

Experimentation and Results

Our experiments were conducted across two main domains: maze navigation tasks and Sokoban puzzles. The results show that Searchformer can exceed the efficiency of A* search: on previously unseen Sokoban puzzles it produces optimal solutions 93.7% of the time while using up to 26.8% fewer search steps. These improvements were achieved through a methodical bootstrapping process in which the Transformer model is iteratively fine-tuned to shorten its search traces while maintaining or improving solution accuracy. Ablation studies further confirm the importance of including search dynamics in the training data: models trained on search-augmented sequences substantially outperform solution-only counterparts, particularly in low-data regimes and on more complex tasks.
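
The bootstrapping step can be read as an expert-iteration loop: sample several token sequences per training task from the current model, keep only those whose final plan is verified optimal (for example, against the cost of A*'s solution), retain the shortest surviving sequence, and fine-tune on the filtered set. The sketch below captures only that filtering logic under these assumptions; `sample_fn`, `is_optimal_fn`, and `fine_tune_fn` are hypothetical placeholders for model sampling, plan verification, and the fine-tuning routine rather than functions from the paper's code.

```python
# Sketch of the bootstrapping (expert-iteration) loop described above.
# sample_fn / is_optimal_fn / fine_tune_fn are hypothetical callables, not
# APIs from the paper's implementation.
from typing import Callable, List, Sequence, Tuple

Rollout = List[str]  # one generated token sequence: search trace + plan

def bootstrap_dataset(
    tasks: Sequence[object],
    sample_fn: Callable[[object, int], List[Rollout]],  # task, n -> n sampled sequences
    is_optimal_fn: Callable[[object, Rollout], bool],   # does the sequence end in an optimal plan?
    n_samples: int = 32,
) -> List[Tuple[object, Rollout]]:
    """Keep, per task, the shortest sampled sequence that still yields an optimal plan.

    Shorter sequences correspond to fewer emitted search steps, so fine-tuning
    on this filtered set nudges the model toward more efficient search dynamics.
    """
    new_data = []
    for task in tasks:
        candidates = [r for r in sample_fn(task, n_samples) if is_optimal_fn(task, r)]
        if candidates:
            new_data.append((task, min(candidates, key=len)))
    return new_data

def expert_iteration(tasks, sample_fn, is_optimal_fn, fine_tune_fn, rounds: int = 3):
    """Alternate between building a filtered dataset and fine-tuning on it."""
    for _ in range(rounds):
        data = bootstrap_dataset(tasks, sample_fn, is_optimal_fn)
        fine_tune_fn(data)  # hypothetical: one fine-tuning pass on the new data
```

In this reading, the verification of plan optimality plays the role of the symbolic "expert", which is what allows the fine-tuned model to shorten its traces without sacrificing solution quality.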

Theoretical and Practical Implications

This work underscores the potential of Transformer models for complex decision-making tasks, a domain traditionally dominated by symbolic planning algorithms. From a theoretical standpoint, it bridges the gap between deep learning and symbolic planning methodologies, showing that with adequate training, Transformers can internalize and then improve upon established search algorithms. Practically, this research paves the way for more efficient automated planning systems, potentially benefiting a wide range of applications, from robotics to logistics and beyond. Furthermore, the efficiency gains in search dynamics could translate into significant computational cost reductions, making sophisticated planning solutions more accessible.

Future Directions and Broader Impact

While the current implementation of Searchformer marks a significant advance, there is ample room for further exploration. Future work could investigate curriculum learning strategies or integrate hierarchical planning methods to further improve model efficiency and capability. Pursuing these directions could widen the applicability of AI in scenarios requiring complex decision-making under constraints.

The broader impact of this work is multifaceted, potentially influencing both academic research directions and real-world applications in industries reliant on planning and scheduling. Nevertheless, it's critical to continue assessing the ethical implications and ensuring these advancements contribute positively to society.

Conclusion

Searchformer represents a significant step toward harnessing the power of Transformer models for complex planning tasks, showcasing the ability to learn and improve upon traditional planning algorithms. This work not only challenges the current limitations of deep learning models in strategic decision-making but also opens new avenues for AI research and application in areas where efficient planning is crucial.
