Gorilla: Large Language Model Connected with Massive APIs (2305.15334v1)

Published 24 May 2023 in cs.CL and cs.AI

Abstract: LLMs have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis. However, their potential to effectively use tools via API calls remains unfulfilled. This is a challenging task even for today's state-of-the-art LLMs such as GPT-4, largely due to their inability to generate accurate input arguments and their tendency to hallucinate the wrong usage of an API call. We release Gorilla, a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls. When combined with a document retriever, Gorilla demonstrates a strong capability to adapt to test-time document changes, enabling flexible user updates or version changes. It also substantially mitigates the issue of hallucination, commonly encountered when prompting LLMs directly. To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of HuggingFace, TorchHub, and TensorHub APIs. The successful integration of the retrieval system with Gorilla demonstrates the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and consequently increase the reliability and applicability of their outputs. Gorilla's code, model, data, and demo are available at https://gorilla.cs.berkeley.edu

An Overview of Gorilla: LLM Connected with Massive APIs

The paper "Gorilla: LLM Connected with Massive APIs" introduces a novel approach to enhance the utility of LLMs by connecting them with massive application programming interfaces (APIs). Despite the notable advancements in LLMs, their capability to effectively utilize external tools via API calls has remained significantly limited. This paper addresses the challenge of accurate tool usage via APIs, which even state-of-the-art models like GPT-4 struggle with, particularly due to issues of providing accurate input arguments and avoiding hallucinations.

Key Contributions and Results

The authors propose Gorilla, a fine-tuned model based on LLaMA. Gorilla outperforms GPT-4 at generating API calls, both in the functional accuracy of the selected API and in the rate of hallucination errors. The model is accompanied by APIBench, a comprehensive dataset of APIs drawn from HuggingFace, TorchHub, and TensorHub, which provides the groundwork for evaluating an LLM's ability to generate correct API calls. Gorilla's performance is evaluated against leading LLMs on this benchmark.
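
To make the benchmark format concrete, an APIBench-style training example can be pictured as a natural-language instruction paired with a reference API call and a snippet of the relevant documentation. The field names below are illustrative assumptions, not the released schema:

```python
# Hypothetical APIBench-style record: an instruction paired with the
# reference API call and a snippet of the API's documentation.
# Field names are illustrative; see the released dataset for the real schema.
example_record = {
    "instruction": "Classify the sentiment of a customer review.",
    "domain": "HuggingFace",
    "api_call": (
        "pipeline('sentiment-analysis', "
        "model='distilbert-base-uncased-finetuned-sst-2-english')"
    ),
    "api_doc": (
        "transformers.pipeline(task, model=None, ...): builds an inference "
        "pipeline for the given task using the specified model."
    ),
}

# A fine-tuning example is then formed by asking the model to produce
# `api_call` given `instruction` (optionally with retrieved documentation).
prompt = f"### Instruction: {example_record['instruction']}\n### API call:"
target = example_record["api_call"]
```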

The use of self-instruct fine-tuning and retrieval enables Gorilla to adapt to changes in API documentation at test time. This adaptability is a noteworthy advance because API documentation is frequently updated, and static models cannot stay current without retraining. Integrating document retrieval helps Gorilla mitigate hallucination and leverage updated documentation dynamically.
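
The retrieval-aware setting can be sketched as fetching the most relevant API documentation for a user instruction and prepending it to the prompt, so the model grounds its call in current documentation. The lexical-overlap retriever and prompt wording below are a minimal stand-in for the paper's actual pipeline, not its implementation:

```python
# Minimal sketch of retrieval-aware prompting: fetch the most relevant API
# doc for the instruction and place it in the prompt. The word-overlap
# scoring is a stand-in for a real retriever (e.g. BM25 or dense embeddings).
def retrieve_doc(instruction: str, api_docs: list[str]) -> str:
    def overlap(doc: str) -> int:
        return len(set(instruction.lower().split()) & set(doc.lower().split()))
    return max(api_docs, key=overlap)

def build_prompt(instruction: str, api_docs: list[str]) -> str:
    doc = retrieve_doc(instruction, api_docs)
    return (
        "Use this API documentation for reference:\n"
        f"{doc}\n\n"
        f"### Instruction: {instruction}\n"
        "### API call:"
    )

docs = [
    "transformers.pipeline(task, model=None): build an inference pipeline.",
    "torch.hub.load(repo, model, pretrained=True): load a TorchHub model.",
]
print(build_prompt("Load a pretrained ResNet from TorchHub", docs))
```

Because the documentation is injected at inference time, updating an API only requires updating the retrieval index, not retraining the model.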

Implications for Future AI Developments

This research highlights several implications for the field of artificial intelligence, particularly in enhancing the interaction between LLMs and dynamically evolving information sources such as APIs. The ability of models like Gorilla to remain updated with changes in documentation without requiring exhaustive retraining is a promising step towards more autonomous and reliable AI systems. This characteristic is particularly useful in domains where continuous updates and adaptations are necessary, providing a more seamless and accurate user experience.

Additionally, the development of systematic benchmarks such as APIBench, which test models on large and dynamic sets of APIs, sets a foundation for future research into tools and methodologies for evaluating and improving LLMs' performance in practical applications. This emphasis on robustness and adaptability suggests a direction where LLMs could become integral interfaces for various computational infrastructures and services, extending their utility far beyond mere language processing.
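
As a rough illustration of what checking a generated API call can involve, one can parse the generated and reference calls and compare the called function and its arguments. This simplified check is only a stand-in for APIBench's actual matching procedure:

```python
import ast

# Simplified correctness check: parse the generated and reference calls and
# compare the called function, positional arguments, and keyword arguments.
# Shown only to illustrate the idea of benchmarking API-call generation.
def call_signature(code: str):
    call = ast.parse(code, mode="eval").body
    assert isinstance(call, ast.Call)
    return (
        ast.dump(call.func),
        [ast.dump(a) for a in call.args],
        {kw.arg: ast.dump(kw.value) for kw in call.keywords},
    )

def calls_match(generated: str, reference: str) -> bool:
    return call_signature(generated) == call_signature(reference)

ref = "pipeline(task='sentiment-analysis', model='distilbert-base-uncased')"
gen = "pipeline(model='distilbert-base-uncased', task='sentiment-analysis')"
print(calls_match(gen, ref))  # True: same function and keyword arguments
```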

Challenges and Considerations

While Gorilla represents a significant advancement in API call generation, the paper acknowledges the complexities introduced by multi-faceted constraints inherent in real-world applications, such as parameter limitations and accuracy specifications. Handling these constraints requires sophisticated reasoning capabilities from the models, ensuring that the selected API calls adhere to specified requirements.
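
For example, a user might ask for an image classifier that meets a minimum accuracy while staying under a parameter budget, so the model must choose among candidate APIs accordingly. The selection logic below is a hypothetical illustration of this kind of constraint handling, not code from the paper:

```python
from dataclasses import dataclass

# Hypothetical constraint-aware API selection: given candidate model calls
# and user constraints (minimum accuracy, maximum parameter count), return
# the most accurate call that satisfies every constraint.
@dataclass
class Candidate:
    api_call: str
    accuracy: float      # reported top-1 accuracy (%)
    parameters_m: float  # parameter count in millions

candidates = [
    Candidate("torch.hub.load('pytorch/vision', 'resnet152', pretrained=True)", 78.3, 60.2),
    Candidate("torch.hub.load('pytorch/vision', 'mobilenet_v2', pretrained=True)", 71.9, 3.5),
    Candidate("torch.hub.load('pytorch/vision', 'resnet50', pretrained=True)", 76.1, 25.6),
]

def select(cands, min_accuracy, max_parameters_m):
    feasible = [c for c in cands
                if c.accuracy >= min_accuracy and c.parameters_m <= max_parameters_m]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None

choice = select(candidates, min_accuracy=75.0, max_parameters_m=30.0)
print(choice.api_call if choice else "no API satisfies the constraints")
```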

The paper also examines the importance of fine-tuning LLMs with contextual information from retrieval systems. However, it cautions that retrieved context can sometimes mislead the model, underscoring the need for high-quality retrievers if retrieval is to yield performance gains. This opens avenues for exploring how retrieval methods can be optimized alongside fine-tuning.

Conclusion

The introduction of Gorilla marks an important step in enhancing the practical applicability of LLMs through better integration with external systems via APIs. By significantly reducing hallucinations and improving accuracy in API call generation, Gorilla sets the stage for more reliable and adaptable AI applications capable of maintaining relevance amidst shifting landscapes. As AI continues to evolve, models like Gorilla illustrate the ways in which LLMs can overcome current limitations and extend their utility across various interactive domains, thus paving the way for future innovations in AI-human interaction and autonomous systems.

Authors (4)
  1. Shishir G. Patil (8 papers)
  2. Tianjun Zhang (38 papers)
  3. Xin Wang (1307 papers)
  4. Joseph E. Gonzalez (167 papers)
Citations (391)