THOUGHTSCULPT: Reasoning with Intermediate Revision and Search (2404.05966v2)
Abstract: We present THOUGHTSCULPT, a general reasoning and search method for tasks with outputs that can be decomposed into components. THOUGHTSCULPT explores a search tree of potential solutions using Monte Carlo Tree Search (MCTS), building solutions one action at a time and evaluating according to any domain-specific heuristic, which in practice is often simply an LLM evaluator. Critically, our action space includes revision actions: THOUGHTSCULPT may choose to revise part of its previous output rather than continuing to build the rest of its output. Empirically, THOUGHTSCULPT outperforms state-of-the-art reasoning methods across three challenging tasks: Story Outline Improvement (up to +30% interestingness), Mini-Crosswords Solving (up to +16% word success rate), and Constrained Generation (up to +10% concept coverage).
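The abstract describes MCTS over a tree of partial solutions, where each action either continues building the output or revises a previously built component, and a domain heuristic (in practice often an LLM evaluator) scores candidates. The sketch below is an illustrative toy, not the paper's implementation: the integer "components", the `TARGET`-matching `evaluate` heuristic, and all function names are invented stand-ins (the real system would generate and score text components with an LLM).

```python
import math
import random

random.seed(0)

# Toy domain: a "solution" is a list of components; the evaluator is a
# hypothetical stand-in for the paper's LLM evaluator.
TARGET = [3, 1, 4, 1, 5]   # hidden "ideal" solution the toy heuristic rewards
VOCAB = list(range(6))     # possible component values
MAX_LEN = len(TARGET)

def evaluate(solution):
    """Domain heuristic: fraction of components matching the toy target."""
    return sum(a == b for a, b in zip(solution, TARGET)) / MAX_LEN

def actions(solution):
    """Key idea: actions either CONTINUE (append) or REVISE (replace) a component."""
    acts = []
    if len(solution) < MAX_LEN:
        acts += [("continue", len(solution), v) for v in VOCAB]
    acts += [("revise", i, v) for i in range(len(solution)) for v in VOCAB]
    return acts

def apply_action(solution, act):
    kind, i, v = act
    new = list(solution)
    if kind == "continue":
        new.append(v)
    else:
        new[i] = v
    return new

class Node:
    def __init__(self, solution, parent=None):
        self.solution, self.parent = solution, parent
        self.children, self.untried = [], actions(solution)
        self.visits, self.value = 0, 0.0

    def uct(self, c=1.4):
        """Standard UCT score balancing exploitation and exploration."""
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def mcts(iterations=400):
    root = Node([])
    nodes = [root]
    for _ in range(iterations):
        # Selection: descend via UCT while the node is fully expanded.
        node = root
        while not node.untried and node.children:
            node = max(node.children, key=Node.uct)
        # Expansion: try one unexplored action (continue or revise).
        if node.untried:
            act = node.untried.pop(random.randrange(len(node.untried)))
            node = Node(apply_action(node.solution, act), parent=node)
            node.parent.children.append(node)
            nodes.append(node)
        # Evaluation: score the (possibly partial) solution with the heuristic.
        reward = evaluate(node.solution)
        # Backpropagation.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the best-scoring solution found anywhere in the tree.
    return max(nodes, key=lambda n: evaluate(n.solution)).solution

best = mcts()
print("best solution:", best, "score:", evaluate(best))
```

Because revision actions keep every component editable, the search can back out of a bad early choice without discarding the rest of a partial solution, which is the behavior the abstract contrasts with purely forward-building methods.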