
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning (2402.15506v4)

Published 23 Feb 2024 in cs.AI, cs.CL, and cs.LG

Abstract: Autonomous agents powered by LLMs have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories. In this paper, we introduce AgentOhana as a comprehensive solution to address these challenges. AgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios. It meticulously standardizes and unifies these trajectories into a consistent format, streamlining the creation of a generic data loader optimized for agent training. Leveraging the data unification, our training pipeline maintains equilibrium across different data sources and preserves independent randomness across devices during dataset partitioning and model training. Additionally, we present xLAM-v0.1, a large action model tailored for AI agents, which demonstrates exceptional performance across various benchmarks. Begin the exploration at https://github.com/SalesforceAIResearch/xLAM.
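
To ground the pipeline description, here is a minimal Python sketch of what a unified trajectory record and a source-balanced, per-device-seeded data loader could look like. The schema fields and the sampling scheme are illustrative assumptions, not the repository's actual API.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Trajectory:
    """One standardized multi-turn agent trajectory (fields are assumed)."""
    source: str                               # originating environment, e.g. "webshop"
    turns: List[Dict[str, str]] = field(default_factory=list)
    # each turn: {"role": "user" | "assistant", "content": "..."}

def balanced_batches(datasets: Dict[str, List[Trajectory]],
                     batch_size: int, seed: int, device_rank: int):
    """Yield an endless stream of batches drawn evenly across sources;
    seeding with the device rank keeps shuffling independent per device,
    mirroring the properties the abstract claims for the pipeline."""
    rng = random.Random(seed + device_rank)
    sources = list(datasets)
    while True:
        yield [rng.choice(datasets[rng.choice(sources)])
               for _ in range(batch_size)]
```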

Authors (18)

Jianguo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Juntao Tan, Thai Hoang, Liangwei Yang, Yihao Feng, Zuxin Liu, Tulika Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong, Ming Zhu, Shirley Kokane

Summary

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

The exploration of LLM-augmented Autonomous Agents (LAAs) presents an intriguing frontier in AI research, where the inherent language processing capabilities of LLMs are harnessed to enhance autonomous agents' performance in complex task environments. This paper offers a systematic evaluation of different agent architectures and the performance implications of integrating various LLM backbones, aiming to provide comprehensive insights into optimizing LAAs.

Overview of LLM-Augmented Autonomous Agents

LAAs represent a nascent domain in which autonomous agents make decisions and interact with environments by leveraging LLMs' capacity to process and generate language. These agents condition on past interactions, combining prior observations and actions to handle multi-step decision-making tasks. The authors provide a thorough comparative analysis of different LAA architectures, focusing in particular on how different LLM backbones affect agent interaction efficacy. They also propose a novel multi-agent orchestration model, BOLAA, which distributes task responsibilities among specialized agents to improve performance.
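
As a rough illustration of that interaction loop, the sketch below rolls out one episode: the accumulated action-observation history is serialized into a prompt, the LLM emits the next action, and the environment returns a new observation. The `llm` and `env` interfaces here are stand-ins, not the paper's code.

```python
from typing import Callable, List, Tuple

def run_episode(llm: Callable[[str], str], env, max_steps: int = 10) -> float:
    """Run one agent episode against an environment assumed to expose
    reset() -> str and step(action) -> (observation, reward, done)."""
    task = env.reset()
    history: List[Tuple[str, str]] = []     # (action, observation) pairs
    for _ in range(max_steps):
        prompt = task + "".join(
            f"\nAction: {a}\nObservation: {o}" for a, o in history)
        action = llm(prompt)                # the LLM generates the next action
        obs, reward, done = env.step(action)
        history.append((action, obs))
        if done:
            return reward
    return 0.0                              # no reward if the episode stalls
```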

Agent Architectures

Several LAA architectures are systematically analyzed, each tailored to different task requirements:

  1. Zero Shot (ZS) and Zero Shot Think (ZST) LAA: ZS-LAA initiates action generation with zero-shot prompting, while ZST-LAA enhances this with intermediate reasoning steps.
  2. ReAct LAA: Employs few-shot prompting to contextualize action generation, improving interaction efficacy.
  3. PlanAct and PlanReAct LAA: These incorporate planning steps before action execution, with PlanReAct integrating reasoning before action generation.
  4. BOLAA: Distinguishes itself by orchestrating multiple LAAs, each focusing on a specific action type, coordinated by a central controller that manages task allocation and inter-agent communication (see the sketch after this list).
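
The following sketch shows one way a BOLAA-style controller could route each step to a specialist agent by action type. The routing rule and the agent names are assumptions for illustration; the paper does not prescribe this exact interface.

```python
from typing import Callable, Dict

class Controller:
    """Toy BOLAA-style orchestrator: one specialist agent per action type."""

    def __init__(self, agents: Dict[str, Callable[[str], str]]):
        self.agents = agents                # e.g. {"search": ..., "click": ...}

    def route(self, context: str) -> str:
        """Select a specialist for the current context, then let it act."""
        return self.agents[self._select_type(context)](context)

    def _select_type(self, context: str) -> str:
        # Crude stand-in rule: a real controller might itself query an LLM
        # to decide which specialist should handle the current step.
        return "search" if "[search]" in context else "click"
```

In practice the controller itself is often LLM-backed; the hard-coded rule above only marks where that routing decision would sit.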

Experimental Results

The paper reports experiments conducted in two complex environments: WebShop for decision-making and HotPotQA for knowledge reasoning. Performance is assessed via reward metrics and recall rates, providing quantitative insights into the suitability of certain LAA architectures when paired with various LLMs.
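
For reference, the two reported quantities could be computed along the lines below; the paper's exact metric definitions may differ, so treat this as an assumed formulation.

```python
from typing import List, Set

def average_reward(final_rewards: List[float]) -> float:
    """Mean final reward over evaluation episodes (WebShop-style)."""
    return sum(final_rewards) / len(final_rewards)

def mean_recall(predicted: List[Set[str]], gold: List[Set[str]]) -> float:
    """Average fraction of gold answers recovered per example (HotPotQA-style)."""
    return sum(len(p & g) / len(g) for p, g in zip(predicted, gold)) / len(gold)
```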

Decision-Making Environment

In WebShop, BOLAA consistently outperformed the other architectures, posting the largest gains in reward. Distributing task responsibilities among specialist agents appears instrumental to this advantage, and pairing the architecture with a well-chosen LLM amplifies it, particularly in complex task scenarios. For instance, OpenAI's GPT models generated actions effectively even under the simpler ZS architecture, while planning-based flows significantly benefited models such as the 13B LongChat variant.

Knowledge Reasoning Environment

In the HotPotQA setting, ReAct LAA performed best, indicating that few-shot examples are important when augmenting LLMs for complex reasoning tasks. Planning flows, typically advantageous in decision-making environments, can hurt reasoning tasks because pre-established plans adapt poorly to information that emerges mid-task.

Implications and Future Directions

The findings of this research provide valuable guidance on designing and deploying LAAs effectively. The results stress the importance of aligning agent architectures with suitable LLMs, identifying context length and model size as influential factors. The introduction of specialized agents, as demonstrated with BOLAA, offers a viable path forward in managing complex tasks efficiently.

Looking ahead, the field can anticipate further advancements in LAA capabilities through fine-tuning specialized agents and developing comprehensive benchmarks across differing task settings. As AI strategies become more sophisticated, orchestrating multiple agents with autonomous controllers, potentially imbued with reinforcement learning capabilities, represents a fertile area for exploration and enhancement.

By systematically evaluating complex AI systems across various architectures and environments, this paper provides a foundational approach to structuring and optimizing LLM-driven autonomous agents, fostering advances in both theoretical understanding and practical application.