Learning API Functionality from Demonstrations for Tool-based Agents (2505.24197v1)

Published 30 May 2025 in cs.AI

Abstract: Digital tool-based agents that invoke external Application Programming Interfaces (APIs) often rely on documentation to understand API functionality. However, such documentation is frequently missing, outdated, privatized, or inconsistent, hindering the development of reliable, general-purpose agents. In this work, we propose learning API functionality directly from demonstrations as a new paradigm applicable in scenarios without documentation. Using existing API benchmarks, we collect demonstrations from both expert API-based agents and from self-exploration. To understand what information demonstrations must convey for successful task completion, we extensively study how the number of demonstrations and the use of LLM-generated summaries and evaluations affect the task success rate of the API-based agent. Our experiments across 3 datasets and 5 models show that learning functionality from demonstrations remains a non-trivial challenge, even for state-of-the-art LLMs. We find that providing explicit function calls and natural language critiques significantly improves the agent's task success rate due to more accurate parameter filling. We analyze failure modes, identify sources of error, and highlight key open challenges for future work in documentation-free, self-improving, API-based agents.

Summary

  • The paper proposes a novel approach for API-based agents to learn functionality directly from expert demonstrations without relying on traditional documentation.
  • Experiments show that while learning from demonstrations is feasible, accurate parameter filling remains a significant challenge, with task success rates dropping by 39% when parameter schemas are described incorrectly.
  • The study highlights that LLMs show potential for generating documentation from demonstrations but have limitations in synthesizing accurate functionality, pointing to the need for improved interpretative accuracy.

Learning API Functionality from Demonstrations for Tool-based Agents: An Analysis

The paper delineates a novel approach to understanding API functionality without relying on conventional documentation, positing that, in the absence of this typical resource, demonstrations from expert agents can suffice. This premise is particularly relevant because API documentation is often unavailable, inconsistent, or outdated. The authors propose a framework in which API-based agents learn directly from demonstrations, shifting from a traditional reliance on documentation to a focus on observed usage. They systematically study how the number of demonstrations and the representation of those demonstrations affect performance, alongside using LLMs to generate documentation from demonstrations and to provide natural language evaluations.
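
To make the setup concrete, below is a minimal Python sketch of what a demonstration record might contain and how a handful of records could be rendered as in-context examples for an agent. The class names, fields, and rendering format are illustrative assumptions, not the paper's actual data schema.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class APICall:
    function: str                 # e.g. the API function the expert invoked
    arguments: dict[str, Any]     # parameters the expert filled in
    result: str                   # raw response observed from the API

@dataclass
class Demonstration:
    task: str                     # natural-language task the expert solved
    calls: list[APICall] = field(default_factory=list)

def demos_to_context(demos: list[Demonstration]) -> str:
    """Render a set of demonstrations as in-context examples for the agent prompt."""
    blocks = []
    for i, demo in enumerate(demos, 1):
        lines = [f"Demonstration {i}: {demo.task}"]
        for call in demo.calls:
            lines.append(f"  call: {call.function}({call.arguments})")
            lines.append(f"  result: {call.result}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```

A stack of such records, rendered this way, would stand in for official documentation in the agent's prompt.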

Methodology and Experimental Framework

The core of the paper’s experimental framework assesses tool-based agents across three distinct datasets (WorkBench, τ-Bench, and CRMArena) using five models. The paper adopts an approach in which demonstrations by expert API-based agents form the basis of learning; the studied formats include direct demonstrations and documentation generated from demonstrations, the latter with and without explicit example calls appended. The authors detail methodologies for processing these expert demonstrations and self-exploration experiences, emphasizing that functionality is learned from scratch.

Key steps in the methodology include:

  • Demonstration Collection: Extract demonstrations from expert trajectories, considering sequences of API tool calls.
  • Processing Methods: Explore three methods of processing expert demonstrations and four methods for updating the agent's understanding based on experience; a sketch of the demonstration representations follows this list.
  • Experimental Setup: Utilize the benchmarks and measure task success rates across varying model configurations and numbers of demonstrations (N).
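
Extending the sketch above, and under the same assumptions, the snippet below illustrates how expert demonstrations might be turned into the representations the paper compares: raw demonstrations, LLM-generated documentation, and generated documentation with explicit example calls appended. The `llm` callable and the prompt wording are placeholders rather than the authors' implementation.

```python
def summarize_to_docs(llm, demos: list[Demonstration]) -> str:
    """Ask an LLM to infer per-function documentation from observed usage."""
    prompt = (
        "Infer documentation (purpose and parameter schema) for each API "
        "function used in these demonstrations:\n\n" + demos_to_context(demos)
    )
    return llm(prompt)

def build_representation(llm, demos: list[Demonstration], mode: str) -> str:
    if mode == "raw":                       # direct demonstrations
        return demos_to_context(demos)
    docs = summarize_to_docs(llm, demos)    # LLM-generated documentation
    if mode == "docs":
        return docs
    if mode == "docs+calls":                # documentation plus explicit example calls
        examples = "\n".join(
            f"{c.function}({c.arguments})" for d in demos for c in d.calls
        )
        return docs + "\n\nExample calls:\n" + examples
    raise ValueError(f"unknown mode: {mode}")
```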

Numerical Findings and Implications

The experiments reveal that learning API functionality from demonstrations is challenging. A recurrent issue is accurate parameter filling, which is crucial for task success and surfaces in the failure modes observed across all examined datasets. Notably, task success rates drop sharply when parameter information derived from demonstrations contradicts the actual function requirements.
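
As a purely hypothetical illustration of this failure mode, the toy check below compares a call whose argument names were inferred from demonstrations against the true signature of the target function; the function, parameters, and values are invented for this example.

```python
import inspect

# Toy target function with the true schema (names invented for illustration).
def create_event(title: str, date: str, participants: list[str]) -> str:
    return f"Created '{title}' on {date} with {len(participants)} participant(s)."

# Arguments an agent might infer from demonstrations; "attendees" does not
# match the real parameter name "participants".
inferred_arguments = {"title": "Sync", "date": "2025-06-02", "attendees": ["ana@example.com"]}

signature = inspect.signature(create_event)
expected = set(signature.parameters)
unexpected = set(inferred_arguments) - expected
missing = {
    name for name, param in signature.parameters.items()
    if param.default is inspect.Parameter.empty
} - set(inferred_arguments)

if unexpected or missing:
    # In the benchmarks, such calls either error out or silently do the wrong thing.
    print(f"Schema mismatch: unexpected={unexpected}, missing={missing}")
```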

Quantitative results show that direct demonstrations without documentation generation sometimes yield better success rates, owing to higher fidelity in action contexts, despite potential data leakage risks. Specific numerical insights include:

  • Task Success Rate: Success declines by 39% when parameter schemas are incorrectly described, underscoring the importance of correct parameter understanding.
  • Self-Exploration: Including guidelines improved success rates, with explicit instructions on parameter formatting proving crucial (a rough sketch of this loop follows the list). However, challenges remain: when errors are handled improperly, evaluators propagate incorrect guidance.
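
The snippet below is a rough sketch of how such natural-language critiques might be accumulated into guidelines during self-exploration, assuming a generic agent and evaluator interface; `agent.run_task` and `evaluator` are placeholders, and the paper's exact prompting and update rule may differ.

```python
def self_explore(agent, evaluator, tasks, guidelines=None):
    """Accumulate natural-language guidelines from critiques of each attempt."""
    guidelines = list(guidelines or [])
    for task in tasks:
        trajectory, success = agent.run_task(task, guidelines="\n".join(guidelines))
        critique = evaluator(
            f"Task: {task}\nTrajectory: {trajectory}\nSucceeded: {success}\n"
            "State one short guideline about API usage or parameter formatting."
        )
        # If the evaluator misreads an API error, an incorrect guideline is
        # appended here and propagated to later attempts, as the paper observes.
        guidelines.append(critique)
    return guidelines
```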

Additionally, the results consistently suggest that while LLMs exhibit potential, their current state presents non-trivial limitations in synthesizing accurate and effective functionality from demonstration data alone.

Future Directions and Research Opportunities

The paper opens pathways to future research in several realms:

  • Improvement in LLM Interpretative Accuracy: Enhancing how LLMs process demonstrations for higher reliability, especially concerning parameter schema.
  • Exploration of Online Learning Frameworks: Shifting from offline batch processing to iterative, dynamic updates based on ongoing task attempts (a speculative sketch follows this list).
  • Cross-Functional Parameter Knowledge Sharing: Investigating unified approaches that share parameter information efficiently across related API functions.
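
As a speculative sketch of the online variant mentioned above, the loop below revises the agent's working documentation after every task attempt instead of building it once from a fixed batch; `llm` and `agent.run_task` are placeholders, and this is not an implementation from the paper.

```python
def refine_docs_online(llm, agent, tasks, docs: str) -> str:
    """Revise working API documentation after every task attempt."""
    for task in tasks:
        trajectory, success = agent.run_task(task, docs=docs)
        docs = llm(
            "Revise this API documentation given the new trajectory. Correct any "
            "parameter names or types the trajectory contradicts.\n\n"
            f"Current docs:\n{docs}\n\nTrajectory (success={success}):\n{trajectory}"
        )
    return docs
```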

These avenues could profoundly augment the development of documentation-free, self-improving API-based agents, enabling real-world applicability in domains lacking stable documentation ecosystems.

Conclusion

In summary, the research provides a systematic examination of API-based agent learning from demonstrations, foregrounding notable barriers and offering insights into the capacity of LLMs to overcome documentation reliance. While promising, the findings substantiate the complexity in realizing efficient learning systems, underscoring the need for continued advancements in how agents adapt to unstructured environments with minimal guidance.
