- The paper introduces xLAM, a series of models ranging from 1B to 8x22B parameters designed to empower autonomous AI agents.
- The paper details a comprehensive data processing pipeline that unifies, augments, and verifies data to produce high-quality training datasets.
- The paper demonstrates xLAM's superior performance in benchmarks, outperforming state-of-the-art models in complex tool usage and web interactions.
Overview of xLAM: A Family of Large Action Models to Empower AI Agent Systems
The paper, authored by Zhang et al. and affiliated with Salesforce AI Research, presents xLAM, a series of large action models aimed at empowering autonomous AI agents. The primary motivation behind this work is to address the gaps and challenges faced by the open-source community in developing specialized AI agent models, in particular the scarcity of high-quality agent datasets and the absence of standard protocols. The xLAM series, encompassing models ranging from 1B to 8x22B parameters, addresses these barriers with both dense and mixture-of-experts architectures.
Contributions
- Model Architectures and Sizes:
- The xLAM series includes five models spanning 1B to 8x22B parameters. This range of sizes addresses varied computational and deployment needs: the smaller models are suitable for on-device deployment, while the larger ones can handle more complex tasks in resource-rich environments.
- Data Processing Pipeline:
- The paper describes a comprehensive data processing pipeline covering data unification, augmentation, quality verification, and synthesis. This pipeline produces high-quality training data by converting diverse source datasets into a unified format, enhancing diversity through augmentation, and verifying quality through rigorous checks (a sketch of such a unified record follows this list).
- Data Synthesis:
- The paper discusses the APIGen framework, which generates verifiable datasets from a large collection of executable APIs. A multi-stage verification process over the synthesized data ensures quality and accuracy, addressing common pitfalls such as hallucinated or incorrect outputs in LLM-generated content (a simplified illustration of these verification stages also follows this list).
- Training Methodology:
- The xLAM models are fine-tuned with supervised learning on the unified, augmented, and synthesized datasets, using a training pipeline designed to scale across the different model sizes and architectures.
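
As a concrete, simplified illustration of the data-unification step, the sketch below shows what a single agent trajectory might look like once converted into a shared format. The field names (task_instruction, available_tools, query, steps) are illustrative assumptions for exposition, not the paper's exact schema.

```python
# Illustrative sketch of a unified agent-trajectory record.
# Field names are assumptions for exposition, not the paper's exact schema.
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ToolCall:
    name: str                    # function/tool identifier
    arguments: dict              # JSON-serializable argument map


@dataclass
class Step:
    thought: str                 # model reasoning for this turn
    tool_calls: list[ToolCall]   # zero or more tool invocations
    observation: str             # environment/tool response


@dataclass
class UnifiedRecord:
    task_instruction: str        # system-level task description
    available_tools: list[dict]  # tool specs (name, description, parameters)
    query: str                   # user request
    steps: list[Step] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Converting one source-specific sample into the unified format
record = UnifiedRecord(
    task_instruction="You are a helpful assistant with access to tools.",
    available_tools=[{"name": "get_weather",
                      "description": "Look up current weather",
                      "parameters": {"city": {"type": "string"}}}],
    query="What's the weather in Palo Alto?",
    steps=[Step(thought="The user wants current weather; call get_weather.",
                tool_calls=[ToolCall("get_weather", {"city": "Palo Alto"})],
                observation="72F, sunny")],
)
print(record.to_json())
```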
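
The sketch below conveys the spirit of multi-stage verification for synthesized function-calling data: a format check, an execution check, and a lightweight semantic check. The helper names and the keyword-overlap heuristic are assumptions for exposition; the actual framework's checks are more sophisticated.

```python
# Simplified illustration of multi-stage verification for synthesized
# function-calling data. Helper names and checks are assumptions.
def format_check(sample: dict) -> bool:
    """Stage 1: the generated call must be well-formed with required keys."""
    try:
        call = sample["generated_call"]
        return isinstance(call["name"], str) and isinstance(call["arguments"], dict)
    except (KeyError, TypeError):
        return False


def execution_check(sample: dict, api_registry: dict) -> bool:
    """Stage 2: the call must target a known API and execute without error."""
    call = sample["generated_call"]
    fn = api_registry.get(call["name"])
    if fn is None:
        return False
    try:
        sample["execution_result"] = fn(**call["arguments"])
        return True
    except Exception:
        return False


def semantic_check(sample: dict) -> bool:
    """Stage 3: the execution result should plausibly answer the query.
    A trivial keyword heuristic stands in for a stronger (e.g. LLM-based) judge."""
    result = str(sample.get("execution_result", "")).lower()
    return any(tok in result for tok in sample["query"].lower().split())


def verify(samples: list[dict], api_registry: dict) -> list[dict]:
    """Keep only samples that pass all three stages."""
    return [s for s in samples
            if format_check(s)
            and execution_check(s, api_registry)
            and semantic_check(s)]


# Toy usage: one sample against a one-function registry
api_registry = {"get_weather": lambda city: f"Weather in {city}: 72F, sunny"}
samples = [{"query": "weather in Palo Alto",
            "generated_call": {"name": "get_weather",
                               "arguments": {"city": "Palo Alto"}}}]
print(len(verify(samples, api_registry)))  # 1 if all checks pass
```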
Experimental Benchmarks
The paper evaluates the xLAM models across multiple benchmarks, demonstrating their exceptional performance:
- Webshop and ToolQuery:
- xLAM models achieve high success rates in the Webshop and ToolQuery environments. Notably, xLAM-7b-r surpasses strong proprietary models such as GPT-4 and Claude 2, showcasing superior navigation and execution abilities in web interactions.
- ToolBench:
- In multi-turn reasoning and complex tool-usage scenarios on ToolBench, xLAM models outperform several state-of-the-art baselines, including ToolLLaMA-V2 and GPT-3.5-Turbo, highlighting their robustness in both in-domain and out-of-domain tasks.
- Berkeley Function-Calling Benchmark:
- xLAM-8x22b-r secures the top position on the BFCL v2 leaderboard at the time of the paper's evaluation, with the highest overall accuracy. Other xLAM models also rank highly, indicating strong function-calling capabilities and generalization to real-world use cases (a simplified scoring sketch follows this list).
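
To make the function-calling evaluation concrete, the sketch below scores predicted calls against references by exact match on function name and canonicalized arguments. BFCL's official scoring is more tolerant (e.g., AST-based argument matching), so this is only a simplified stand-in, not the benchmark's actual evaluator.

```python
# Simplified function-call scorer: exact match on name + canonical arguments.
# The real BFCL evaluation is more tolerant; this is only a sketch.
import json


def normalize_call(call: dict) -> tuple:
    """Reduce a call to a comparable (name, canonical-arguments) pair."""
    return call["name"], json.dumps(call["arguments"], sort_keys=True)


def accuracy(predictions: list[dict], references: list[dict]) -> float:
    """Fraction of predictions whose name and arguments match the reference."""
    if not references:
        return 0.0
    matches = sum(normalize_call(p) == normalize_call(r)
                  for p, r in zip(predictions, references))
    return matches / len(references)


# Argument order does not matter thanks to sort_keys canonicalization.
preds = [{"name": "get_weather", "arguments": {"city": "Palo Alto", "unit": "F"}}]
refs = [{"name": "get_weather", "arguments": {"unit": "F", "city": "Palo Alto"}}]
print(f"accuracy = {accuracy(preds, refs):.2f}")  # accuracy = 1.00
```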
Implications and Future Directions
The introduction of the xLAM series presents significant implications for both practical and theoretical advancements in AI:
- Practical Implications:
- The diverse sizes and architectures of the xLAM models make them versatile for various applications, from resource-constrained on-device settings to demanding computational tasks. The open-source release democratizes access to high-performance action models, potentially accelerating innovation in the development of autonomous agents (a minimal loading sketch follows this list).
- Theoretical Implications:
- The robust data processing and synthesis methodologies highlighted in this paper provide insights into improving the quality and generalizability of LLMs. The emphasis on rigorous data verification and augmentation techniques sets a standard for future research in this area.
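
Since the checkpoints are released openly, a minimal loading sketch with Hugging Face transformers is shown below. The repository id is an assumption; consult the Salesforce organization on Hugging Face for the exact model names and prompt/chat-template conventions.

```python
# Minimal sketch of loading a released xLAM checkpoint with Hugging Face
# transformers. The repository id below is an assumption; check the
# Salesforce Hugging Face organization for the exact model names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/xLAM-7b-r"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are an assistant that can call tools."},
    {"role": "user", "content": "Book a table for two at 7pm tonight."},
]
# Assumes the released tokenizer ships a chat template, as is common
# for instruction-tuned checkpoints.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```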
Conclusion
The paper by Zhang et al. systematically addresses the challenges faced by the open-source community in developing AI agent models. The xLAM series, through its diverse architectures, scalable training pipeline, and robust performance across benchmarks, represents a significant contribution to the field. This work not only advances the performance of autonomous AI agents but also provides valuable insights and methodologies that can be leveraged in future AI research and applications. By open-sourcing the xLAM models, the authors are paving the way for more equitable and accelerated progress in AI.