ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval (2403.06551v1)

Published 11 Mar 2024 in cs.IR

Abstract: Tool learning aims to extend the capabilities of LLMs with external tools. A major challenge in tool learning is how to support a large number of tools, including unseen tools. To address this challenge, previous studies have proposed retrieving suitable tools for the LLM based on the user query. However, previously proposed methods do not consider the differences between seen and unseen tools, nor do they take the hierarchy of the tool library into account, which may lead to suboptimal performance for tool retrieval. Therefore, to address the aforementioned issues, we propose ToolRerank, an adaptive and hierarchy-aware reranking method for tool retrieval to further refine the retrieval results. Specifically, our proposed ToolRerank includes Adaptive Truncation, which truncates the retrieval results related to seen and unseen tools at different positions, and Hierarchy-Aware Reranking, which makes retrieval results more concentrated for single-tool queries and more diverse for multi-tool queries. Experimental results show that ToolRerank can improve the quality of the retrieval results, leading to better execution results generated by the LLM.
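
The abstract names the two reranking components but not their exact mechanics, so the following Python sketch is only one plausible reading of them, not the paper's implementation. The Tool record and the parameters k_seen, k_unseen, and family_weight are hypothetical names introduced purely for illustration.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Tool:
        name: str        # tool identifier
        api_family: str  # parent API node in the tool-library hierarchy
        seen: bool       # True if the tool appeared in retriever training data
        score: float     # first-stage retrieval score (higher is better)

    def adaptive_truncation(results: List[Tool],
                            k_seen: int = 5, k_unseen: int = 20) -> List[Tool]:
        # Hypothetical rule: truncate early when the head of the ranking is all
        # seen tools (the retriever is presumably reliable there), and keep a
        # longer candidate list when unseen tools are involved.
        head_is_seen = all(t.seen for t in results[:k_seen])
        return results[: k_seen if head_is_seen else k_unseen]

    def hierarchy_aware_rerank(results: List[Tool], multi_tool_query: bool,
                               family_weight: float = 0.3) -> List[Tool]:
        # Greedy reranking over the truncated candidates. For single-tool
        # queries, tools sharing an already-selected API family get a bonus
        # (concentration); for multi-tool queries they get a penalty (diversity).
        sign = -1.0 if multi_tool_query else 1.0
        pool = sorted(results, key=lambda t: t.score, reverse=True)
        reranked, used_families = [], set()
        while pool:
            best = max(pool, key=lambda t: t.score
                       + sign * family_weight * (t.api_family in used_families))
            pool.remove(best)
            reranked.append(best)
            used_families.add(best.api_family)
        return reranked

Under this reading, a pipeline would first truncate the first-stage ranking and then rerank, e.g. hierarchy_aware_rerank(adaptive_truncation(candidates), multi_tool_query=False). A single-tool query then pulls siblings of the top hit toward the front of the list, while a multi-tool query spreads the top ranks across different API families.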

Authors (6)
  1. Yuanhang Zheng (8 papers)
  2. Peng Li (390 papers)
  3. Wei Liu (1135 papers)
  4. Yang Liu (2253 papers)
  5. Jian Luan (50 papers)
  6. Bin Wang (750 papers)
Citations (6)