
ToolFactory: Automating Tool Generation by Leveraging LLM to Understand REST API Documentations

Published 28 Jan 2025 in cs.LG, cs.AI, cs.CL, and cs.SE | (2501.16945v1)

Abstract: LLM-based tool agents offer natural language interfaces, enabling users to seamlessly interact with computing services. While REST APIs are valuable resources for building such agents, they must first be transformed into AI-compatible tools. Automatically generating AI-compatible tools from REST API documents can greatly streamline tool agent development and minimize user learning curves. However, API documentation often suffers from a lack of standardization, inconsistent schemas, and incomplete information. To address these issues, we developed ToolFactory, an open-source pipeline for automating tool generation from unstructured API documents. To enhance the reliability of the developed tools, we implemented an evaluation method to diagnose errors. Furthermore, we built a knowledge base of verified tools, which we leveraged to infer missing information from poorly documented APIs. We developed the API Extraction Benchmark, comprising 167 API documents and 744 endpoints in various formats, and designed a JSON schema to annotate them. This annotated dataset was utilized to train and validate ToolFactory. The experimental results highlight the effectiveness of ToolFactory. We also demonstrated ToolFactory by creating a domain-specific AI agent for glycomaterials research. ToolFactory exhibits significant potential for facilitating the seamless integration of scientific REST APIs into AI workflows.

Summary

  • The paper presents ToolFactory, a novel pipeline that transforms unstructured API documentation into structured, AI-usable tools.
  • The methodology integrates APILlama with prompt tuning and a parameter database, achieving a 97% valid JSON generation rate and accurate parameter inference.
  • The case study in glycomaterials research demonstrates the pipeline’s practical impact by enabling seamless integration of scientific APIs into AI workflows.

Introduction

The paper presents ToolFactory, a pipeline that automates the generation of AI-usable tools from REST API documentation by using LLMs to read and interpret API details written in natural language. This matters because API documentation often lacks a standardized schema, which makes tool agent development difficult, particularly in scientific research domains. ToolFactory addresses this by transforming unstructured API documentation into AI-compatible tools, reducing the effort needed to integrate APIs into AI workflows.

API Extraction Benchmark

The API Extraction Benchmark developed in the study serves as the foundation for ToolFactory. It comprises 167 API documents and 744 endpoints with a wide range of document structures. This diversity calls for a general tool generation pipeline that can process many document formats (Figure 1). The benchmark is used to train and validate the pipeline, and it deliberately emphasizes APIs with less structured documentation so that the method applies broadly.

Figure 1: The API Extraction Benchmark includes API documents with varying levels of structure, emphasizing less structured cases to prioritize API variety and the need for a robust tool generation pipeline.

ToolFactory Pipeline

APILlama

At the core of the pipeline is APILlama, a model fine-tuned on the benchmark dataset with prompt tuning to extract structured information from API documents. Prompt tuning keeps the number of trainable parameters small (here, 20 trainable virtual tokens) while teaching the model to translate unstructured API documentation into a predefined JSON schema, which is the input to tool generation. In the reported experiments, 97% of APILlama's outputs were correctly structured JSON, an improvement over the baseline models in retrieving and interpreting API endpoints.
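To make the valid-ratio metric concrete, here is a minimal sketch of how such a figure could be computed over a batch of model outputs. The set of required keys is a hypothetical stand-in, since the paper's actual JSON schema is not reproduced in this summary, and the paper's exact validity criterion may differ.

```python
import json

# Hypothetical required keys; an assumption, not the paper's schema.
REQUIRED_KEYS = {"name", "method", "url", "parameters"}


def is_valid_endpoint(raw: str) -> bool:
    """Return True if raw parses as a JSON object containing the assumed keys."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and REQUIRED_KEYS.issubset(obj)


def valid_ratio(outputs: list[str]) -> float:
    """Fraction of model outputs that pass is_valid_endpoint."""
    if not outputs:
        return 0.0
    return sum(is_valid_endpoint(o) for o in outputs) / len(outputs)
```

Under this definition, an output counts as valid only if it both parses and has the expected top-level structure, so truncated generations and well-formed JSON of the wrong shape are both rejected.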

Tool-Generation and Validation

Once the API information has been extracted into structured form, ToolFactory converts it into Python functions compatible with popular frameworks such as LangChain. A critical phase is tool validation: each tool is called with example parameter values, and only tools that pass the validation criteria are made available to AI agents. The evaluation results highlight the importance of accurate parameter value inference, as many tools failed validation because of incorrect parameter values.
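The generate-then-validate step can be sketched roughly as follows. Everything here is illustrative: `EndpointSpec`, `make_tool`, `validate_tool`, the example URL, and the stub `fake_fetch` are hypothetical names introduced for this sketch, and the acceptance rule (HTTP 200 with a non-empty body) is an assumption rather than the paper's stated criterion. The HTTP client is injected so the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urlencode


@dataclass
class EndpointSpec:
    """Hypothetical shape of one extracted endpoint (not the paper's schema)."""
    name: str
    url: str
    description: str
    # Parameter name -> example value used during validation.
    example_params: dict[str, str] = field(default_factory=dict)


def make_tool(spec: EndpointSpec, fetch: Callable[[str], tuple[int, str]]):
    """Build a plain Python callable that issues a GET request for the endpoint."""
    def tool(**params: str) -> str:
        query = urlencode(params)
        full_url = f"{spec.url}?{query}" if query else spec.url
        status, body = fetch(full_url)
        if status != 200:
            raise RuntimeError(f"{spec.name} returned HTTP {status}")
        return body

    tool.__name__ = spec.name
    tool.__doc__ = spec.description
    return tool


def validate_tool(tool, spec: EndpointSpec) -> bool:
    """Accept the tool only if a call with the example values returns a non-empty body."""
    try:
        return bool(tool(**spec.example_params))
    except Exception:
        return False


def fake_fetch(url: str) -> tuple[int, str]:
    """Stub standing in for a real HTTP client; knows a single made-up record."""
    return (200, '{"ok": true}') if "id=G123" in url else (404, "")
```

With the stub above, a spec whose example value is `{"id": "G123"}` passes validation, while the same endpoint with a wrong example value fails. This mirrors the observation that validation failures are often caused by bad parameter values rather than by a bad endpoint definition. A callable that passes could then be wrapped for whichever agent framework is in use.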

Parameter Value Inference

To improve parameter value quality, the authors introduced a parameter database built from validated tools. When an API document omits example values, the database is queried for semantically similar parameters, so that domain-specific knowledge from working tools fills the gap (Figure 2). This addresses the frequent problem of insufficient API documentation by reusing real-world values instead of relying solely on LLM-generated pseudo-values.

Figure 2: A parameter database constructed using validated tools, inferring parameter values based on semantic similarity of parameter keys and descriptions.
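The lookup described above can be illustrated with a small sketch. It uses Jaccard overlap of word tokens as a simple stand-in for semantic similarity; the paper's actual similarity measure is not specified in this summary and is likely embedding-based. The class name `ParameterDB`, the threshold, and the sample entries are all hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; also splits snake_case keys."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap in [0, 1]; a crude proxy for semantic similarity."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class ParameterDB:
    """Stores values from validated tools, indexed by parameter key and description."""

    def __init__(self) -> None:
        self._entries: list[tuple[set[str], str]] = []

    def add(self, key: str, description: str, value: str) -> None:
        self._entries.append((tokens(f"{key} {description}"), value))

    def infer(self, key: str, description: str, threshold: float = 0.3):
        """Return the value of the most similar stored parameter, or None if too dissimilar."""
        query = tokens(f"{key} {description}")
        best_value, best_score = None, 0.0
        for entry_tokens, value in self._entries:
            score = jaccard(query, entry_tokens)
            if score > best_score:
                best_value, best_score = value, score
        return best_value if best_score >= threshold else None
```

For instance, after storing a validated `glycan_id` parameter described as "accession number of the glycan", a query for an undocumented `accession` parameter described as "glycan accession number" retrieves the stored value, whereas an unrelated parameter such as a line color falls below the threshold and yields `None`. The threshold guards against reusing a value from a parameter that only superficially matches.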

Case Study: Glycomaterials Research

The ToolFactory pipeline was applied to generate tools for glycomaterials research, resulting in an AI agent capable of handling glycan-related tasks such as searching, drawing, and format conversion. This case study illustrates the pipeline's applicability to a scientific domain and suggests that it can integrate scientific APIs without extensive programming effort (Figure 3).

Figure 3: AI Agent for Glycomaterial Research with Automated Tool Generation showcases ToolFactory's capability in simplifying database access and supporting glycan-related tasks.

Conclusion

ToolFactory marks a significant step forward in automating the generation of AI-compatible tools from REST API documentation. By translating unstructured information into structured, usable formats, ToolFactory reduces the development and learning costs associated with using APIs in AI systems. The case study in glycomaterials research underscores the pipeline’s practical benefits, enhancing scientists' ability to perform complex data integration efficiently. The work lays the groundwork for broader applications across domains, with future improvements focusing on refining parameter inference and expanding the pipeline’s adaptability to more diverse and complex API ecosystems.
