PathFinder: Guided Search over Multi-Step Reasoning Paths (2312.05180v2)
Abstract: With recent advances in LLMs, methods like chain-of-thought prompting that elicit reasoning chains have been shown to improve results on reasoning tasks. However, tasks that require multiple steps of reasoning still pose significant challenges to state-of-the-art models. Drawing inspiration from the beam search algorithm, we propose PathFinder, a tree-search-based reasoning path generation approach. It enhances diverse branching and multi-hop reasoning through dynamic decoding, enabled by varying sampling methods and parameters. Using constrained reasoning, PathFinder integrates novel quality constraints, pruning, and exploration methods to improve the efficiency and quality of generation. It also includes scoring and ranking features to improve candidate selection. Our approach outperforms competitive baselines on three complex arithmetic and commonsense reasoning tasks by 6% on average, and it generalizes well to longer, unseen reasoning chains, with complexity comparable to beam search with large branching factors.
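The abstract compresses the whole pipeline into a few clauses, so a concrete sketch may help. Below is a minimal, hypothetical Python rendering of the candidate-generation loop the abstract describes: branch each partial chain via varied sampling, prune to a fixed beam by a quality score, and rank complete chains. The callables `sample_step`, `score_chain`, and `is_final`, and all parameter names, are illustrative assumptions, not the paper's actual API.

```python
# Minimal sketch of a PathFinder-style tree search over reasoning chains.
# All names (sample_step, score_chain, beam_width, branching_factor) are
# illustrative assumptions, not the paper's implementation.
import heapq
from typing import Callable, List, Tuple

def tree_search(
    question: str,
    sample_step: Callable[[str, float], str],   # LLM call: (prompt, temperature) -> next step
    score_chain: Callable[[List[str]], float],  # quality score for a (partial) chain
    is_final: Callable[[str], bool],            # does this step terminate the chain?
    beam_width: int = 4,        # candidates kept after pruning
    branching_factor: int = 3,  # sampled continuations per candidate
    max_steps: int = 8,
    temperatures: Tuple[float, ...] = (0.4, 0.7, 1.0),  # vary sampling params per branch
) -> List[str]:
    beams: List[List[str]] = [[]]   # each beam is a list of reasoning steps
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_steps):
        candidates: List[Tuple[float, List[str]]] = []
        for chain in beams:
            prompt = question + "\n" + "\n".join(chain)
            for i in range(branching_factor):
                # Diverse branching: rotate through sampling temperatures.
                step = sample_step(prompt, temperatures[i % len(temperatures)])
                new_chain = chain + [step]
                candidates.append((score_chain(new_chain), new_chain))
        # Prune: keep only the top `beam_width` candidates by score.
        top = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        beams = []
        for score, chain in top:
            if is_final(chain[-1]):
                finished.append((score, chain))
            else:
                beams.append(chain)
        if not beams:
            break
    # Rank: return the highest-scoring complete chain (fall back to best partial).
    pool = finished or [(score_chain(c), c) for c in beams]
    return max(pool, key=lambda c: c[0])[1]
```

Rotating sampling temperatures per branch is one simple way to realize the "dynamic decoding" the abstract mentions; the paper's actual quality constraints and scorers are richer than a single `score_chain` callable.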