RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
Abstract: We explore how iteratively revising a chain of thoughts with the help of information retrieval significantly improves LLMs' reasoning and generation ability on long-horizon generation tasks, while greatly mitigating hallucination. In particular, the proposed method, retrieval-augmented thoughts (RAT), revises each thought step one by one with information retrieved for the task query and the current and past thought steps, after the initial zero-shot CoT is generated. Applying RAT to GPT-3.5, GPT-4, and CodeLLaMA-7b substantially improves their performance on various long-horizon generation tasks, with average relative rating-score gains of 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning. The demo page can be found at https://craftjarvis.github.io/RAT
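The revision loop the abstract describes can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: `llm` and `retrieve` are hypothetical stubs standing in for a real language-model call and a real retriever, and the prompt format is an assumption.

```python
def llm(prompt: str) -> str:
    """Stub LLM call (hypothetical); replace with a real model API."""
    return f"revised({prompt})"

def retrieve(query: str) -> str:
    """Stub retriever (hypothetical); replace with a real search backend."""
    return f"doc-for[{query}]"

def rat(task: str, initial_thoughts: list[str]) -> list[str]:
    """Revise a zero-shot chain of thoughts step by step with retrieval.

    Each revision is conditioned on the task query, the already-revised
    steps, and the current draft step, as the abstract describes.
    """
    revised: list[str] = []
    for step in initial_thoughts:
        # Build a retrieval query from the task query, the past
        # (already revised) steps, and the current draft step.
        query = " ".join([task, *revised, step])
        evidence = retrieve(query)
        # Ask the LLM to rewrite the current step given the evidence.
        prompt = f"task: {task}\nevidence: {evidence}\nstep: {step}"
        revised.append(llm(prompt))
    return revised

steps = rat("solve X", ["draft step 1", "draft step 2"])
```

The key design point is that revision is sequential: step *i* is revised only after steps 1..*i-1* have been revised, so each retrieval query reflects the corrected reasoning so far rather than the original zero-shot draft.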