ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent (2312.10003v1)
Abstract: Answering complex natural language questions often requires multi-step reasoning and integration of external information. Several systems have combined knowledge retrieval with an LLM to answer such questions. These systems, however, suffer from various failure cases, and we cannot directly train them end-to-end to fix such failures, as interaction with external knowledge is non-differentiable. To address these deficiencies, we define a ReAct-style LLM agent with the ability to reason and act upon external knowledge. We further refine the agent through a ReST-like method that iteratively trains on previous trajectories, employing growing-batch reinforcement learning with AI feedback for continuous self-improvement and self-distillation. Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model that achieves comparable performance on challenging compositional question-answering benchmarks with two orders of magnitude fewer parameters.
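The loop the abstract describes — run a ReAct-style agent to collect trajectories, filter them with AI feedback, and reuse the survivors as training data — can be sketched in a few lines. Everything below is a toy stand-in, not the paper's implementation: `toy_search`, `ai_feedback`, `react_agent`, and `rest_iteration` are hypothetical names, and the knowledge base, scorer, and fixed three-step policy are placeholder assumptions for illustration only.

```python
def toy_search(query):
    # Stand-in for retrieval from external knowledge (the non-differentiable step).
    kb = {"capital of France": "Paris", "capital of Japan": "Tokyo"}
    return kb.get(query, "unknown")

def react_agent(question, policy):
    """Minimal ReAct-style loop: alternate thoughts and actions, observing
    external results after each action, then emit an answer."""
    trajectory = []
    for step in range(3):  # cap the number of reasoning steps
        thought = f"step {step}: consider '{question}'"
        action = policy(question, step)       # e.g. a search query
        observation = toy_search(action)      # act on external knowledge
        trajectory.append((thought, action, observation))
    answer = trajectory[-1][2]                # toy: answer with last observation
    return answer, trajectory

def ai_feedback(question, answer):
    # Stand-in for an LLM-based ranker scoring trajectory quality in [0, 1].
    return 1.0 if answer != "unknown" else 0.0

def rest_iteration(questions, policy, threshold=0.5):
    """One ReST-style iteration: (1) grow a batch of trajectories with the
    current policy, (2) keep those ranked highly by AI feedback, (3) return
    them as fine-tuning data for the next (possibly smaller) model."""
    kept = []
    for q in questions:
        answer, traj = react_agent(q, policy)
        if ai_feedback(q, answer) >= threshold:
            kept.append((q, traj, answer))
    return kept
```

In this sketch, repeating `rest_iteration` with a model fine-tuned on the kept trajectories corresponds to one round of the paper's growing-batch self-improvement; fine-tuning a smaller model on the same data corresponds to the self-distillation step.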
- PaLM 2 technical report. arXiv preprint arXiv:2305.10403, 2023.
- Vladimir Blagojevic. Long-form QA beyond ELI5: an updated dataset and approach, 2022. URL towardsdatascience.com/long-form-qa-beyond-eli5-an-updated-dataset-and-approach-319cb841aabb.
- Harrison Chase. LangChain. https://github.com/hwchase17/langchain, 2022.
- FireAct: Toward language agent fine-tuning, 2023.
- Language model cascades, 2022.
- RAFT: Reward ranked finetuning for generative foundation model alignment, 2023.
- ELI5: Long form question answering. CoRR, abs/1907.09190, 2019. URL http://arxiv.org/abs/1907.09190.
- PAL: Program-aided language models, 2023.
- Reinforced self-training (ReST) for language modeling. arXiv preprint arXiv:2308.08998, 2023.
- Large language models cannot self-correct reasoning yet, 2023.
- Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP, 2023a.
- DSPy: Compiling declarative language model calls into self-improving pipelines, 2023b.
- Hurdles to progress in long-form question answering, 2021.
- Let's verify step by step, 2023.
- Jerry Liu. LlamaIndex. https://github.com/jerryjliu/llama_index, 2022.
- Self-Refine: Iterative refinement with self-feedback, 2023.
- WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332, 2021.
- Measuring and narrowing the compositionality gap in language models, 2023.
- Iterated decomposition: Improving science Q&A by supervising reasoning processes, 2023.
- Reflexion: An autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366, 2023.
- Beyond human data: Scaling self-training for problem-solving with language models, 2023.
- Solving math word problems with process- and outcome-based feedback, 2022.
- The rise and potential of large language model based agents: A survey, 2023.
- HotpotQA: A dataset for diverse, explainable multi-hop question answering. CoRR, abs/1809.09600, 2018. URL http://arxiv.org/abs/1809.09600.
- ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022.
- STaR: Bootstrapping reasoning with reasoning, 2022.
- Renat Aksitov
- Sobhan Miryoosefi
- Zonglin Li
- Daliang Li
- Sheila Babayan
- Kavya Kopparapu
- Zachary Fisher
- Ruiqi Guo
- Sushant Prakash
- Pranesh Srinivasan
- Manzil Zaheer
- Felix Yu
- Sanjiv Kumar