Stream of Search (SoS): Learning to Search in Language (2404.03683v1)

Published 1 Apr 2024 in cs.LG, cs.AI, and cs.CL

Abstract: LLMs are rarely shown fruitful mistakes while training. They then struggle to look beyond the next token, suffering from a snowballing of errors and struggling to predict the consequence of their actions several steps ahead. In this paper, we show how LLMs can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (SoS). We propose a unified language for search that captures an array of different symbolic search strategies. We demonstrate our approach using the simple yet difficult game of Countdown, where the goal is to combine input numbers with arithmetic operations to reach a target number. We pretrain a transformer-based LLM from scratch on a dataset of streams of search generated by heuristic solvers. We find that SoS pretraining increases search accuracy by 25% over models trained to predict only the optimal search trajectory. We further finetune this model with two policy improvement methods: Advantage-Induced Policy Alignment (APA) and Self-Taught Reasoner (STaR). The finetuned SoS models solve 36% of previously unsolved problems, including problems that cannot be solved by any of the heuristic solvers. Our results indicate that LLMs can learn to solve problems via search, self-improve to flexibly use different search strategies, and potentially discover new ones.

Summary

The paper introduces Stream of Search (SoS), a framework that represents search operations such as exploration and backtracking in a unified language, so that LLMs can be trained on the search process itself.
On the Countdown game, pretraining on diverse search trajectories raises search accuracy by 25% over models trained only on optimal solution paths.
Finetuning with the policy improvement methods APA and STaR lets the SoS models solve 36% of previously unsolved problems, including some that none of the heuristic solvers can solve.

Exploring the Stream of Search: A Framework for Learning to Search within LLMs

Introduction to Stream of Search

LLM training usually leaves out something central to human learning and creativity: the ability to explore, make mistakes, and recover from them. Most models are trained on clean, mistake-free data, which limits their ability to recognize errors early or to consider alternative solutions. The paper proposes Stream of Search (SoS), a framework that teaches LLMs to search and backtrack by writing the search process out in a unified language. The method is demonstrated on the game of Countdown, where SoS models solve more problems than models trained only on optimal paths.

Unified Language for Search

At the core of the SoS framework is a unified language that represents different symbolic search strategies. It covers the key search operations, including exploration, backtracking, and pruning, and serializes a whole search as a flattened string. Because the operations are expressed in language, a model can be trained to navigate a problem space on its own, to switch between strategies, and potentially to discover new ones.
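As a rough illustration of what writing a search down as language could look like, the Python sketch below runs a depth-first search on a tiny Countdown instance and records every exploration and backtrack as a line of text, then flattens the lines into one string. The function names (`dfs_stream`, `to_stream`) and the wording of the trace are assumptions made for this example, not the paper's actual format, and pruning is left out for brevity.

```python
from itertools import combinations


def dfs_stream(numbers, target, log=None):
    """Depth-first search over Countdown states, logging each step as text.

    Hypothetical trace format for illustration only.
    Returns (solved, log), where log is a list of text lines.
    """
    if log is None:
        log = [f"start: numbers={sorted(numbers)} target={target}"]
    if len(numbers) == 1:
        if numbers[0] == target:
            log.append(f"goal reached: {target}")
            return True, log
        log.append(f"dead end: {numbers[0]} != {target}, backtrack")
        return False, log
    # Pick every pair of remaining numbers and try each arithmetic operation.
    for i, j in combinations(range(len(numbers)), 2):
        a, b = max(numbers[i], numbers[j]), min(numbers[i], numbers[j])
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        candidates = [("+", a + b), ("*", a * b), ("-", a - b)]
        if b != 0 and a % b == 0:  # only allow exact integer division
            candidates.append(("/", a // b))
        for op, value in candidates:
            log.append(f"explore: {a}{op}{b}={value} -> {sorted(rest + [value])}")
            solved, _ = dfs_stream(rest + [value], target, log)
            if solved:
                return True, log
    return False, log


def to_stream(log):
    """Flatten the logged steps into a single string."""
    return " | ".join(log)


if __name__ == "__main__":
    solved, log = dfs_stream([2, 3], 6)
    print(to_stream(log))
```

With the inputs `[2, 3]` and target `6`, the trace first tries `3+2=5`, records the dead end, and only then finds `3*2=6`. The failed attempt stays in the string, which is the kind of mistake a solution-only training example would hide.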

Training and Evaluation

Countdown, a game that is simple to state but hard to solve, serves as the test bed for SoS: the goal is to combine a set of input numbers with arithmetic operations to reach a target number. A transformer-based model was pretrained from scratch on a dataset of search trajectories generated by heuristic solvers that use a range of strategies. Compared with models trained to predict only the optimal search trajectory, SoS pretraining increased search accuracy by 25%. The result suggests that exposing a model to the full search process, including its mistakes and backtracking, is more effective than showing it only the finished solution.
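To make the accuracy figure concrete, the sketch below shows one way a proposed Countdown solution could be checked and an accuracy computed over a set of problems. It assumes that a solution is a list of `(a, op, b)` steps, that every input number must be used exactly once, and that accuracy is the fraction of problems with a valid solution; these conventions and the names `is_valid_solution` and `accuracy` are illustrative assumptions, not details taken from the paper.

```python
def is_valid_solution(numbers, target, steps):
    """Check a proposed Countdown solution (hypothetical format).

    Each step is (a, op, b); both operands must be available at that
    point, and the final remaining number must equal the target.
    """
    pool = list(numbers)
    for a, op, b in steps:
        # Both operands must currently be in the pool (respecting duplicates).
        for x in (a, b):
            if x not in pool:
                return False
            pool.remove(x)
        if op == "+":
            result = a + b
        elif op == "-":
            result = a - b
        elif op == "*":
            result = a * b
        elif op == "/":
            if b == 0 or a % b != 0:  # only exact integer division
                return False
            result = a // b
        else:
            return False
        pool.append(result)
    return pool == [target]


def accuracy(problems, solutions):
    """Fraction of problems whose proposed solution is valid."""
    solved = sum(
        is_valid_solution(numbers, target, steps)
        for (numbers, target), steps in zip(problems, solutions)
    )
    return solved / len(problems)
```

For example, `is_valid_solution([2, 3, 4], 10, [(2, "*", 3), (6, "+", 4)])` accepts the solution, while a step that uses a number not in the pool is rejected.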

Policy Improvement Techniques

Building on the pretrained SoS model, the paper studies self-improvement with two policy improvement methods: Advantage-Induced Policy Alignment (APA) and Self-Taught Reasoner (STaR). The finetuned models solve 36% of the problems that were previously unsolved, including problems that none of the heuristic solvers used to generate the training data can solve. This indicates that policy improvement can take a model beyond the search strategies it saw during pretraining.
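The general idea behind STaR-style improvement can be sketched as a sample, filter, and finetune loop: the model generates trajectories, only those that reach a correct answer are kept, and the model is then finetuned on the kept set. The sketch below shows a single round under that reading. The callables `sample`, `is_correct`, and `finetune` are hypothetical placeholders supplied by the caller, not functions from the paper's code, and APA is not sketched here.

```python
from typing import Callable, List, Sequence, Tuple

Problem = Tuple[Tuple[int, ...], int]  # (input numbers, target)


def star_round(
    problems: Sequence[Problem],
    sample: Callable[[Problem], str],            # hypothetical: model generates a trajectory
    is_correct: Callable[[Problem, str], bool],  # hypothetical: checks the trajectory
    finetune: Callable[[List[Tuple[Problem, str]]], None],  # hypothetical: updates the model
    samples_per_problem: int = 4,
) -> List[Tuple[Problem, str]]:
    """One sample-filter-finetune round in the spirit of STaR."""
    kept: List[Tuple[Problem, str]] = []
    for problem in problems:
        for _ in range(samples_per_problem):
            trajectory = sample(problem)
            # Keep only trajectories that actually solve the problem.
            if is_correct(problem, trajectory):
                kept.append((problem, trajectory))
    # Train on the model's own successful trajectories.
    finetune(kept)
    return kept
```

Repeating such rounds lets the training data shift toward whatever strategies the model itself finds successful, which is one way to read the paper's observation that finetuned models solve problems the original solvers could not.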

Implications and Future Directions

The Stream of Search framework suggests a different view of what LLMs can do in planning, problem-solving, and learning. Training models on the messy, exploratory process of search, rather than only on finished solutions, can yield more flexible problem-solving. The work is a concrete step towards LLMs that search and discover internally, and it lays groundwork for systems that learn, adapt, and find new strategies.

The possible applications range from better problem-solving in specific domains to more general systems that handle a broader set of tasks. The results reported here are limited to Countdown, so how far the approach carries over to other domains remains an open question.

Concluding Thoughts

The Stream of Search framework shows that the process of search itself can be expressed in language and learned by an LLM. Models trained this way improve with further finetuning and solve problems that the heuristic solvers behind their training data could not. Whether such models can discover entirely new search strategies, or handle harder problems beyond Countdown, is the natural next question.

GitHub

GitHub - kanishkg/stream-of-search (149 stars)
