
ExpeL: LLM Agents Are Experiential Learners (2308.10144v3)

Published 20 Aug 2023 in cs.LG, cs.AI, and cs.CL

Abstract: The recent surge in research interest in applying LLMs to decision-making tasks has flourished by leveraging the extensive world knowledge embedded in LLMs. While there is a growing demand to tailor LLMs for custom decision-making tasks, finetuning them for specific tasks is resource-intensive and may diminish the model's generalization capabilities. Moreover, state-of-the-art LLMs like GPT-4 and Claude are primarily accessible through API calls, with their parametric weights remaining proprietary and unavailable to the public. This scenario emphasizes the growing need for new methodologies that allow learning from agent experiences without requiring parametric updates. To address these problems, we introduce the Experiential Learning (ExpeL) agent. Our agent autonomously gathers experiences and extracts knowledge using natural language from a collection of training tasks. At inference, the agent recalls its extracted insights and past experiences to make informed decisions. Our empirical results highlight the robust learning efficacy of the ExpeL agent, indicating a consistent enhancement in its performance as it accumulates experiences. We further explore the emerging capabilities and transfer learning potential of the ExpeL agent through qualitative observations and additional experiments.

ExpeL: LLM Agents Are Experiential Learners

The paper "ExpeL: LLM Agents Are Experiential Learners" presents a compelling approach to enhancing the capabilities of LLM agents through experiential learning. The framework, ExpeL, autonomously gathers experiences and extracts insights in natural language from a diverse set of training tasks. At inference time, the agent applies these insights and recalls past successful examples, improving decision-making without any parameter updates. This is particularly valuable for cutting-edge LLMs such as GPT-4 and Claude, whose parametric weights are proprietary and inaccessible for finetuning.

Core Concept and Methodology

The ExpeL framework centers on two complementary modes of learning: extracting insights from accumulated experience and recalling similar successful experiences as demonstrations. During the training phase, experiences are gathered over multiple trials using Reflexion, a framework in which the agent reflects on past failures and retries; this phase collects both successful and failed trajectories across tasks. In the inference phase, the agent employs task similarity-based retrieval over these experiences, supplying the most relevant successful examples as in-context demonstrations for decision-making on new tasks.
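The insight-extraction step described above maintains a growing pool of natural-language insights: the paper has an LLM compare successful and failed trajectories and emit edit operations over that pool. The sketch below illustrates the bookkeeping side of this loop; the operation names and importance-count scheme follow the paper's description, but the exact data layout and the `apply_operation` helper are illustrative simplifications, not the paper's verbatim format.

```python
# Minimal sketch of ExpeL-style insight-pool maintenance. An LLM (not shown)
# reads pairs of trajectories and emits one of these operations; this code
# only applies the resulting edits to the pool.

def apply_operation(insights, op, idx=None, text=None):
    """Apply one LLM-emitted operation to the insight pool.

    insights: list of [insight_text, importance] pairs.
    """
    if op == "ADD":                       # new insight starts with importance 2
        insights.append([text, 2])
    elif op == "UPVOTE":                  # agreement strengthens an insight
        insights[idx][1] += 1
    elif op == "DOWNVOTE":                # disagreement weakens it...
        insights[idx][1] -= 1
        if insights[idx][1] <= 0:         # ...and removes it once depleted
            insights.pop(idx)
    elif op == "EDIT":                    # refine the wording, keep importance
        insights[idx][0] = text
    return insights

pool = []
apply_operation(pool, "ADD", text="Check all rooms before concluding an object is absent.")
apply_operation(pool, "ADD", text="Re-read the question after retrieving evidence.")
apply_operation(pool, "UPVOTE", idx=0)
apply_operation(pool, "DOWNVOTE", idx=1)
apply_operation(pool, "DOWNVOTE", idx=1)  # importance hits 0 -> removed
print(pool)  # one surviving insight with importance 3
```

The importance counter gives the pool a soft consensus mechanism: insights that repeatedly prove useful persist, while those contradicted by later experience decay and disappear.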

Key Results

The empirical evaluation across various domains—HotpotQA, ALFWorld, and WebShop—demonstrated ExpeL's efficacy in consistently outperforming baseline models (ReAct and Act). Specifically, the ExpeL agent achieved a 39% success rate in HotpotQA tasks, a notable improvement over ReAct's 28%, which highlights the significant impact of insight extraction on reasoning tasks. In ALFWorld, where task completion relies on specific actions, the retrieval of successful trajectories from similar tasks showed marked improvements. These results underscore the synergistic effect of insight extraction and successful trajectory retrieval in enhancing performance across diverse environments.
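The trajectory retrieval that drives the ALFWorld gains can be sketched as a nearest-neighbor lookup over past successful tasks. A production system would use dense sentence embeddings (the references include MPNet and FAISS-style similarity search); the toy version below substitutes bag-of-words cosine similarity so the example is self-contained, and the `memory` entries are hypothetical.

```python
# Toy sketch of task-similarity retrieval at inference time: rank stored
# successful trajectories by how similar their task description is to the
# new task, and return the top-k as in-context demonstrations.
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two task descriptions."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(task, memory, k=1):
    """Return the k stored successes whose tasks best match `task`."""
    ranked = sorted(memory, key=lambda m: cosine(task, m["task"]), reverse=True)
    return ranked[:k]

memory = [
    {"task": "put a clean mug on the desk", "trajectory": "..."},
    {"task": "answer who directed the film", "trajectory": "..."},
]
best = retrieve("put a mug in the drawer", memory)
print(best[0]["task"])  # -> put a clean mug on the desk
```

Because retrieval keys on the task description rather than the trajectory itself, a new household task pulls in demonstrations of the same action patterns, which is exactly where the paper reports retrieval helping most.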

Transfer Learning Potential

The paper also explores the transfer learning potential of ExpeL by applying insights gained from HotpotQA to the FEVER dataset. The agent successfully transferred knowledge, achieving a 70% success rate in FEVER tasks, surpassing other baselines. This indicates that ExpeL's experiential learning approach can be beneficial in scenarios where task distributions share common knowledge elements, even when direct retrieval of experiences from one domain might not be feasible.

Implications and Future Directions

The practical implications of ExpeL are significant in scenarios requiring adaptable and efficient decision-making processes. By facilitating cross-task learning and enabling agents to autonomously leverage their experiences, this approach enhances LLM agents without extensive data labeling or computational resources. The findings suggest potential applications in areas such as autonomous systems and interactive agents, where adaptability and incremental learning from diverse inputs are crucial.

Theoretical implications include offering a framework for integrating human-like experiential learning processes within LLM agents, potentially paving the way for more cognitively inspired AI systems. As foundation models and retrieval mechanisms continue to advance, ExpeL stands to benefit from these improvements, suggesting a naturally evolving enhancement pathway for LLM agents.

Future developments could examine the integration of vision and language modalities, refine how insights are dynamically retrieved and applied, or pursue theoretical underpinnings that better characterize optimal agent behavior. Additionally, exploring open-weight models alongside approaches for effectively utilizing proprietary LLMs may broaden the applicability of ExpeL across diverse domains.

References (74)
  1. Anthropic. 2023. Introducing Claude.
  2. Emergent Autonomous Scientific Research Capabilities of Large Language Models. arXiv preprint.
  3. ChemCrow: Augmenting Large-Language Models with Chemistry Tools. arXiv preprint.
  4. Language Models are Few-Shot Learners. NeurIPS.
  5. Chase, H. 2023. Langchain.
  6. PaLM: Scaling Language Modeling with Pathways. JMLR.
  7. Scaling Instruction-Finetuned Language Models. arXiv preprint.
  8. Shortcut Learning of Large Language Models in Natural Language Understanding: A Survey. arXiv preprint.
  9. MindAgent: Emergent Gaming Interaction. arXiv preprint.
  10. A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis. arXiv preprint.
  11. Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition. In CoRL. PMLR.
  12. Reasoning with Language Model is Planning with World Model. arXiv preprint.
  13. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. In ICML. PMLR.
  14. Large-scale Retrieval for Reinforcement Learning. NeurIPS.
  15. Billion-scale Similarity Search with GPUs. IEEE Transactions on Big Data.
  16. Kahneman, D. 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux.
  17. Large Language Models are Zero-Shot Reasoners. NeurIPS.
  18. A Survey on Retrieval-Augmented Text Generation. arXiv preprint.
  19. SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks. NeurIPS.
  20. Text2Motion: From Natural Language Instructions to Feasible Plans. Autonomous Robots.
  21. Lin, L.-J. 1992. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching. Machine learning.
  22. What Makes Good In-Context Examples for GPT-3? In DeeLIO. Association for Computational Linguistics.
  23. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Computing Surveys.
  24. AgentBench: Evaluating LLMs as Agents. arXiv preprint.
  25. REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction. In CoRL. PMLR.
  26. Maas, Carey, Wheeler, Saatchi, Billington, and Shamash. 2023. To Infinity and Beyond: SHOW-1 and Showrunner Agents in Multi-Agent Simulations. arXiv preprint.
  27. Large Language Models as General Pattern Machines. In CoRL. PMLR.
  28. EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought. NeurIPS.
  29. Nakajima, Y. 2023. BabyAGI. https://github.com/yoheinakajima/babyagi.
  30. WebGPT: Browser-Assisted Question-Answering with Human Feedback. arXiv preprint.
  31. OpenAI. 2023. GPT-4 Technical Report.
  32. Training Language Models to Follow Instructions with Human Feedback. In NeurIPS.
  33. Generative Agents: Interactive Simulacra of Human Behavior. In ACM Symposium on User Interface Software and Technology.
  34. Language Models as Knowledge Bases? In EMNLP-IJCNLP. Association for Computational Linguistics.
  35. Communicative Agents for Software Development. arXiv:2307.07924.
  36. Learning To Retrieve Prompts for In-Context Learning. In NAACL. Association for Computational Linguistics.
  37. Prioritized Experience Replay. In ICLR.
  38. From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces. NeurIPS.
  39. Reflexion: Language Agents with Verbal Reinforcement Learning. In NeurIPS.
  40. ALFWorld: Aligning Text and Embodied Environments for Interactive Learning. In ICLR.
  41. Significant-Gravitas. 2023. AutoGPT. https://github.com/Significant-Gravitas/Auto-GPT.
  42. MPNet: Masked and Permuted Pre-training for Language Understanding. NeurIPS.
  43. Cognitive Architectures for Language Agents. arXiv preprint.
  44. AdaPlanner: Adaptive Planning from Feedback with Language Models. NeurIPS.
  45. Reinforcement Learning: An Introduction. MIT press.
  46. Stanford Alpaca: An Instruction-Following LLaMA Model. https://github.com/tatsu-lab/stanford_alpaca.
  47. LaMDA: Language Models for Dialog Applications. arXiv preprint.
  48. FEVER: a Large-scale Dataset for Fact Extraction and VERification. In NAACL.
  49. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint.
  50. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv preprint.
  51. Focused Transformer: Contrastive Training for Context Scaling. In NeurIPS.
  52. Voyager: An Open-ended Embodied Agent with Large Language Models. arXiv preprint.
  53. A Survey on Large Language Model Based Autonomous Agents. arXiv preprint.
  54. Learning to Retrieve In-Context Examples for Large Language Models. arXiv preprint.
  55. Avalon’s Game of Thoughts: Battle Against Deception through Recursive Contemplation. arXiv preprint.
  56. Q-learning. Machine learning.
  57. Finetuned Language Models are Zero-Shot Learners. In ICLR.
  58. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS.
  59. TidyBot: Personalized Robot Assistance with Large Language Models. Autonomous Robots.
  60. The Rise and Potential of Large Language Model Based Agents: A Survey. arXiv preprint.
  61. Foundation Models for Decision Making: Problems, Methods, and Opportunities. arXiv preprint.
  62. MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action. arXiv preprint.
  63. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. In EMNLP. Association for Computational Linguistics.
  64. WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents. In NeurIPS.
  65. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS.
  66. ReAct: Synergizing Reasoning and Acting in Language Models. In ICLR.
  67. Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization.
  68. Offline Prioritized Experience Replay. arXiv preprint.
  69. AgentTuning: Enabling Generalized Agent Abilities for LLMs. arXiv preprint.
  70. Automatic Chain of Thought Prompting in Large Language Models. In ICLR.
  71. Augmenting Unsupervised Reinforcement Learning with Self-Reference. arXiv preprint.
  72. A Survey of Large Language Models. arXiv preprint.
  73. A Comprehensive Survey on Transfer Learning. Proceedings of the IEEE.
  74. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. In CoRL. PMLR.
Authors (6)
  1. Andrew Zhao (28 papers)
  2. Daniel Huang (11 papers)
  3. Quentin Xu (3 papers)
  4. Matthieu Lin (15 papers)
  5. Yong-Jin Liu (66 papers)
  6. Gao Huang (178 papers)
Citations (129)