
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs (2304.08244v2)

Published 14 Apr 2023 in cs.CL and cs.AI

Abstract: Recent research has demonstrated that LLMs can enhance their capabilities by utilizing external tools. However, three pivotal questions remain unanswered: (1) How effective are current LLMs in utilizing tools? (2) How can we enhance LLMs' ability to utilize tools? (3) What obstacles need to be overcome to leverage tools? To address these questions, we introduce API-Bank, a groundbreaking benchmark, specifically designed for tool-augmented LLMs. For the first question, we develop a runnable evaluation system consisting of 73 API tools. We annotate 314 tool-use dialogues with 753 API calls to assess the existing LLMs' capabilities in planning, retrieving, and calling APIs. For the second question, we construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains. Using this dataset, we train Lynx, a tool-augmented LLM initialized from Alpaca. Experimental results demonstrate that GPT-3.5 exhibits improved tool utilization compared to GPT-3, while GPT-4 excels in planning. However, there is still significant potential for further improvement. Moreover, Lynx surpasses Alpaca's tool utilization performance by more than 26 points and approaches the effectiveness of GPT-3.5. Through error analysis, we highlight the key challenges for future research in this field to answer the third question.

Authors (9)
  1. Minghao Li (44 papers)
  2. Yingxiu Zhao (13 papers)
  3. Bowen Yu (89 papers)
  4. Feifan Song (14 papers)
  5. Hangyu Li (23 papers)
  6. Haiyang Yu (109 papers)
  7. Zhoujun Li (122 papers)
  8. Fei Huang (408 papers)
  9. Yongbin Li (128 papers)
Citations (101)