Gorilla: Large Language Model Connected with Massive APIs (2305.15334v1)

Published 24 May 2023 in cs.CL and cs.AI

Abstract: LLMs have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis. However, their potential to effectively use tools via API calls remains unfulfilled. This is a challenging task even for today's state-of-the-art LLMs such as GPT-4, largely due to their inability to generate accurate input arguments and their tendency to hallucinate the wrong usage of an API call. We release Gorilla, a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls. When combined with a document retriever, Gorilla demonstrates a strong capability to adapt to test-time document changes, enabling flexible user updates or version changes. It also substantially mitigates the issue of hallucination, commonly encountered when prompting LLMs directly. To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of HuggingFace, TorchHub, and TensorHub APIs. The successful integration of the retrieval system with Gorilla demonstrates the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and consequently increase the reliability and applicability of their outputs. Gorilla's code, model, data, and demo are available at https://gorilla.cs.berkeley.edu

An Overview of Gorilla: LLM Connected with Massive APIs

The paper "Gorilla: LLM Connected with Massive APIs" introduces a novel approach to enhance the utility of LLMs by connecting them with massive application programming interfaces (APIs). Despite the notable advancements in LLMs, their capability to effectively utilize external tools via API calls has remained significantly limited. This paper addresses the challenge of accurate tool usage via APIs, which even state-of-the-art models like GPT-4 struggle with, particularly due to issues of providing accurate input arguments and avoiding hallucinations.

Key Contributions and Results

The authors propose Gorilla, a fine-tuned model based on LLaMA. Gorilla outperforms GPT-4 at generating API calls, both in the functional accuracy of the selected API and in the rate of hallucination errors. The model is accompanied by APIBench, a comprehensive dataset of APIs drawn from HuggingFace, TorchHub, and TensorHub, which provides the groundwork for evaluating an LLM's ability to generate correct API calls. Gorilla's performance is evaluated against leading LLMs on this benchmark.
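
To make the benchmark format concrete, an APIBench-style training example can be pictured as a natural-language instruction paired with a reference API call and a snippet of the relevant documentation. The field names below are illustrative assumptions, not the released schema:

```python
# Hypothetical APIBench-style record: an instruction paired with the
# reference API call and a snippet of the API's documentation.
# Field names are illustrative; see the released dataset for the real schema.
example_record = {
    "instruction": "Classify the sentiment of a customer review.",
    "domain": "HuggingFace",
    "api_call": (
        "pipeline('sentiment-analysis', "
        "model='distilbert-base-uncased-finetuned-sst-2-english')"
    ),
    "api_doc": (
        "transformers.pipeline(task, model=None, ...): builds an inference "
        "pipeline for the given task using the specified model."
    ),
}

# A fine-tuning example is then formed by asking the model to produce
# `api_call` given `instruction` (optionally with retrieved documentation).
prompt = f"### Instruction: {example_record['instruction']}\n### API call:"
target = example_record["api_call"]
```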

The use of self-instruct fine-tuning and retrieval enables Gorilla to adapt to changes in API documentation at test time. This adaptability is a noteworthy advance because API documentation is frequently updated, and static models cannot stay current without retraining. Integrating document retrieval helps Gorilla mitigate hallucination and leverage updated documentation dynamically.
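
The retrieval-aware setting can be sketched as fetching the most relevant API documentation for a user instruction and prepending it to the prompt, so the model grounds its call in current documentation. The lexical-overlap retriever and prompt wording below are a minimal stand-in for the paper's actual pipeline, not its implementation:

```python
# Minimal sketch of retrieval-aware prompting: fetch the most relevant API
# doc for the instruction and place it in the prompt. The word-overlap
# scoring is a stand-in for a real retriever (e.g. BM25 or dense embeddings).
def retrieve_doc(instruction: str, api_docs: list[str]) -> str:
    def overlap(doc: str) -> int:
        return len(set(instruction.lower().split()) & set(doc.lower().split()))
    return max(api_docs, key=overlap)

def build_prompt(instruction: str, api_docs: list[str]) -> str:
    doc = retrieve_doc(instruction, api_docs)
    return (
        "Use this API documentation for reference:\n"
        f"{doc}\n\n"
        f"### Instruction: {instruction}\n"
        "### API call:"
    )

docs = [
    "transformers.pipeline(task, model=None): build an inference pipeline.",
    "torch.hub.load(repo, model, pretrained=True): load a TorchHub model.",
]
print(build_prompt("Load a pretrained ResNet from TorchHub", docs))
```

Because the documentation is injected at inference time, updating an API only requires updating the retrieval index, not retraining the model.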

Implications for Future AI Developments

This research highlights several implications for the field of artificial intelligence, particularly in enhancing the interaction between LLMs and dynamically evolving information sources such as APIs. The ability of models like Gorilla to remain updated with changes in documentation without requiring exhaustive retraining is a promising step towards more autonomous and reliable AI systems. This characteristic is particularly useful in domains where continuous updates and adaptations are necessary, providing a more seamless and accurate user experience.

Additionally, the development of systematic benchmarks such as APIBench, which test models on large and dynamic sets of APIs, sets a foundation for future research into tools and methodologies for evaluating and improving LLMs' performance in practical applications. This emphasis on robustness and adaptability suggests a direction where LLMs could become integral interfaces for various computational infrastructures and services, extending their utility far beyond mere language processing.
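
As a rough illustration of what checking a generated API call can involve, one can parse the generated and reference calls and compare the called function and its arguments. This simplified check is only a stand-in for APIBench's actual matching procedure:

```python
import ast

# Simplified correctness check: parse the generated and reference calls and
# compare the called function, positional arguments, and keyword arguments.
# Shown only to illustrate the idea of benchmarking API-call generation.
def call_signature(code: str):
    call = ast.parse(code, mode="eval").body
    assert isinstance(call, ast.Call)
    return (
        ast.dump(call.func),
        [ast.dump(a) for a in call.args],
        {kw.arg: ast.dump(kw.value) for kw in call.keywords},
    )

def calls_match(generated: str, reference: str) -> bool:
    return call_signature(generated) == call_signature(reference)

ref = "pipeline(task='sentiment-analysis', model='distilbert-base-uncased')"
gen = "pipeline(model='distilbert-base-uncased', task='sentiment-analysis')"
print(calls_match(gen, ref))  # True: same function and keyword arguments
```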

Challenges and Considerations

While Gorilla represents a significant advancement in API call generation, the paper acknowledges the complexities introduced by multi-faceted constraints inherent in real-world applications, such as parameter limitations and accuracy specifications. Handling these constraints requires sophisticated reasoning capabilities from the models, ensuring that the selected API calls adhere to specified requirements.
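
For example, a user might ask for an image classifier that meets a minimum accuracy while staying under a parameter budget, so the model must choose among candidate APIs accordingly. The selection logic below is a hypothetical illustration of this kind of constraint handling, not code from the paper:

```python
from dataclasses import dataclass

# Hypothetical constraint-aware API selection: given candidate model calls
# and user constraints (minimum accuracy, maximum parameter count), return
# the most accurate call that satisfies every constraint.
@dataclass
class Candidate:
    api_call: str
    accuracy: float      # reported top-1 accuracy (%)
    parameters_m: float  # parameter count in millions

candidates = [
    Candidate("torch.hub.load('pytorch/vision', 'resnet152', pretrained=True)", 78.3, 60.2),
    Candidate("torch.hub.load('pytorch/vision', 'mobilenet_v2', pretrained=True)", 71.9, 3.5),
    Candidate("torch.hub.load('pytorch/vision', 'resnet50', pretrained=True)", 76.1, 25.6),
]

def select(cands, min_accuracy, max_parameters_m):
    feasible = [c for c in cands
                if c.accuracy >= min_accuracy and c.parameters_m <= max_parameters_m]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None

choice = select(candidates, min_accuracy=75.0, max_parameters_m=30.0)
print(choice.api_call if choice else "no API satisfies the constraints")
```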

The paper also examines the importance of fine-tuning LLMs with contextual information from retrieval systems. However, it cautions that retrieved context can sometimes mislead the model, underscoring the need for high-quality retrievers if retrieval is to yield performance gains. This opens avenues for exploring how retrieval methods can be optimized alongside fine-tuning.

Conclusion

The introduction of Gorilla marks an important step in enhancing the practical applicability of LLMs through better integration with external systems via APIs. By significantly reducing hallucinations and improving accuracy in API call generation, Gorilla sets the stage for more reliable and adaptable AI applications capable of maintaining relevance amidst shifting landscapes. As AI continues to evolve, models like Gorilla illustrate the ways in which LLMs can overcome current limitations and extend their utility across various interactive domains, thus paving the way for future innovations in AI-human interaction and autonomous systems.

Authors (4)
  1. Shishir G. Patil (8 papers)
  2. Tianjun Zhang (38 papers)
  3. Xin Wang (1307 papers)
  4. Joseph E. Gonzalez (167 papers)
Citations (391)