TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision (2403.06221v1)
Abstract: Numerous LLM agents have been built for tasks like web navigation and online shopping, owing to LLMs' broad knowledge and text-understanding ability. Many of these works rely on in-context examples to generalize without fine-tuning, yet few consider how to select and effectively utilize those examples. Recently, methods that retrieve whole trajectories by task meta-data and use them as in-context examples have been proposed to improve agents' overall performance on sequential decision-making tasks. However, these methods can be problematic: examples retrieved without regard to task-specific state-transition dynamics may be merely plausible, and the long inputs contain plenty of irrelevant context. In this paper, we propose a novel framework, TRAD, to address these issues. TRAD first conducts Thought Retrieval, selecting demonstrations at the step level via thought matching, which yields more helpful demonstrations and less irrelevant input noise. TRAD then introduces Aligned Decision, which complements each retrieved demonstration step with its preceding or subsequent steps, providing tolerance for imperfect thoughts and a tunable balance between more context and less noise. Extensive experiments on the ALFWorld and Mind2Web benchmarks show that TRAD not only outperforms state-of-the-art methods but also effectively reduces noise and promotes generalization. Furthermore, TRAD has been deployed in real-world scenarios at a global business insurance company, where it improves the success rate of robotic process automation.
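The abstract compresses TRAD's two stages into a few clauses; the minimal Python sketch below is our illustration of their shape, not the authors' implementation. The `Step` fields, the bag-of-words `embed` stub (standing in for a sentence encoder such as Sentence-BERT), and the default window sizes are all assumptions made to keep the example self-contained and runnable.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Step:
    thought: str   # the agent's reasoning text at this step
    action: str    # the action the expert took at this step
    traj_id: int   # which expert trajectory the step belongs to
    t: int         # position of the step within its trajectory


def embed(text: str, vocab: dict) -> np.ndarray:
    """Toy bag-of-words embedding; a real system would use a sentence
    encoder (e.g. Sentence-BERT). This stub only keeps the sketch runnable."""
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def thought_retrieval(query_thought: str, memory: list, vocab: dict, k: int = 2) -> list:
    """Stage 1: rank stored expert steps by thought similarity,
    rather than retrieving whole trajectories by task meta-data."""
    q = embed(query_thought, vocab)
    return sorted(memory, key=lambda s: -float(q @ embed(s.thought, vocab)))[:k]


def aligned_decision(retrieved: list, memory: list, backward: int = 1, forward: int = 1) -> list:
    """Stage 2: pad each retrieved step with temporal neighbours from its
    own trajectory, trading extra context against extra noise."""
    index = {(s.traj_id, s.t): s for s in memory}
    demos = []
    for s in retrieved:
        window = [index.get((s.traj_id, s.t + d)) for d in range(-backward, forward + 1)]
        demos.append([w for w in window if w is not None])
    return demos


# Tiny demo with two expert steps from one hypothetical trajectory.
vocab = {w: i for i, w in enumerate("find the mug then heat it go to desk".split())}
memory = [
    Step("find the mug", "go to desk 1", traj_id=0, t=0),
    Step("heat it", "use microwave 1", traj_id=0, t=1),
]
hits = thought_retrieval("find the mug", memory, vocab, k=1)
print(aligned_decision(hits, memory))  # retrieved step plus its successor
```

With `backward = forward = 0` the scheme reduces to pure step-wise thought retrieval; enlarging the window recovers trajectory-like context at the cost of more irrelevant tokens, which is the balance between more context and less noise the abstract refers to.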
- PDDL: The Planning Domain Definition Language. Technical Report (1998).
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models. arXiv preprint arXiv:2308.09687 (2023).
- Language Models are Few-Shot Learners. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS).
- Mind2Web: Towards a Generalist Agent for the Web. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS).
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
- Everything of thoughts: Defying the law of Penrose triangle for thought generation. arXiv preprint arXiv:2311.04254 (2023).
- ExaRanker: Synthetic Explanations Improve Neural Rankers. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 2409–2414.
- A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis. In Proceedings of the 12th International Conference on Learning Representations (ICLR).
- Understanding HTML with Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2023. 2803–2821.
- Reasoning with Language Model is Planning with World Model. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). 8154–8173.
- The Curious Case of Neural Text Degeneration. In Proceedings of the 8th International Conference on Learning Representations (ICLR).
- Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 6769–6781.
- Language Models can Solve Computer Tasks. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS).
- Code as Policies: Language Model Programs for Embodied Control. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA). 9493–9500.
- LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477 (2023).
- What Makes Good In-Context Examples for GPT-3? arXiv preprint arXiv:2101.06804 (2021).
- Generative Relevance Feedback with Large Language Models. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 2026–2031.
- WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332 (2021).
- GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023).
- Training language models to follow instructions with human feedback. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS). 27730–27744.
- Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST). 1–22.
- Language models are unsupervised multitask learners. OpenAI Blog (2019).
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 3980–3990.
- Code Llama: Open foundation models for code. arXiv preprint arXiv:2308.12950 (2023).
- Learning To Retrieve Prompts for In-Context Learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). 2655–2671.
- Toolformer: Language models can teach themselves to use tools. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS).
- World of Bits: An Open-Domain Platform for Web-Based Agents. In Proceedings of the 34th International Conference on Machine Learning (ICML), Vol. 70. 3135–3144.
- Reflexion: Language agents with verbal reinforcement learning. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS).
- ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10737–10746.
- ALFWorld: Aligning Text and Embodied Environments for Interactive Learning. In Proceedings of the 9th International Conference on Learning Representations (ICLR).
- How Long Can Open-Source LLMs Truly Promise on Context Length? LMSYS Org Blog (2023). https://lmsys.org/blog/2023-06-29-longchat/
- LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971 (2023).
- Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL). 10014–10037.
- Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291 (2023).
- A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432 (2023).
- Self-Consistency Improves Chain of Thought Reasoning in Language Models. In Proceedings of the 11th International Conference on Learning Representations (ICLR).
- Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS).
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS).
- Self-Adaptive In-Context Learning: An Information Compression Perspective for In-Context Example Selection and Ordering. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL). 1423–1436.
- WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS).
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS).
- ReAct: Synergizing Reasoning and Acting in Language Models. In Proceedings of the 11th International Conference on Learning Representations (ICLR).
- Large Language Models are Versatile Decomposers: Decomposing Evidence and Questions for Table-based Reasoning. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 174–184.
- Active Example Selection for In-Context Learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). 9134–9148.
- Step-Back Prompting Enables Reasoning via Abstraction in Large Language Models. In Proceedings of the 12th International Conference on Learning Representations (ICLR).
- Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control. In Proceedings of the 12th International Conference on Learning Representations (ICLR).
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. In Proceedings of the 11th International Conference on Learning Representations (ICLR).
- Large language models for information retrieval: A survey. arXiv preprint arXiv:2308.07107 (2023).