RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation (2403.05313v1)
Abstract: We explore how iteratively revising a chain of thoughts with the help of information retrieval significantly improves LLMs' reasoning and generation ability on long-horizon generation tasks, while greatly mitigating hallucination. In particular, the proposed method -- retrieval-augmented thoughts (RAT) -- revises each thought step one by one with information retrieved for the task query and for the current and past thought steps, after the initial zero-shot CoT is generated. Applying RAT to GPT-3.5, GPT-4, and CodeLLaMA-7b substantially improves their performance on various long-horizon generation tasks, with relative rating-score increases averaging 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning. The demo page can be found at https://craftjarvis.github.io/RAT
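The revision loop the abstract describes can be sketched as follows. This is a minimal illustration only: the helpers `generate_cot`, `retrieve`, and `revise_step` are hypothetical stand-ins for an LLM and a retriever (stubbed here with deterministic strings), not the paper's actual implementation.

```python
def generate_cot(task_query):
    """Produce an initial zero-shot chain of thoughts (stubbed)."""
    return [f"step {i} for {task_query}" for i in range(1, 4)]

def retrieve(query):
    """Fetch task-relevant documents (stubbed as an echo)."""
    return [f"doc about: {query}"]

def revise_step(step, docs):
    """Revise one thought step using retrieved information (stubbed)."""
    return f"{step} [revised with {docs[0]}]"

def rat(task_query):
    thoughts = generate_cot(task_query)          # 1. initial zero-shot CoT
    revised = []
    for step in thoughts:                        # 2. revise steps one by one
        # The retrieval query combines the task query, the already-revised
        # past steps, and the current step, as the abstract describes.
        query = " ".join([task_query] + revised + [step])
        docs = retrieve(query)
        revised.append(revise_step(step, docs))  # 3. grounded revision
    return revised
```

The key design point is that retrieval is interleaved with revision: each step's query incorporates the revised steps before it, so later thoughts are grounded in progressively corrected context rather than in the original zero-shot chain.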