
Planning and Editing What You Retrieve for Enhanced Tool Learning

(arXiv:2404.00450)
Published Mar 30, 2024 in cs.CL, cs.AI, cs.IR, and cs.LG

Abstract

Recent advancements in integrating external tools with LLMs have opened new frontiers, with applications in mathematical reasoning, code generation, and smart assistants. However, existing methods, which rely on simple one-time retrieval strategies, fall short of effectively and accurately shortlisting relevant tools. This paper introduces PLUTo (Planning, Learning, and Understanding for Tools), a novel approach encompassing the Plan-and-Retrieve (P&R) and Edit-and-Ground (E&G) paradigms. The P&R paradigm consists of a neural retrieval module for shortlisting relevant tools and an LLM-based query planner that decomposes complex queries into actionable tasks, enhancing the effectiveness of tool utilization. The E&G paradigm utilizes LLMs to enrich tool descriptions based on user scenarios, bridging the gap between user queries and tool functionalities. Experimental results demonstrate that these paradigms significantly improve recall and NDCG in tool retrieval tasks, surpassing current state-of-the-art models.

Figure: Comparison of PLUTo and conventional paradigms, showing PLUTo's superior query parsing and tool retrieval.

Overview

  • PLUTo integrates planning and editing paradigms to enhance tool retrieval and description in LLMs, addressing complex user queries effectively.

  • The Plan-and-Retrieve (P&R) paradigm decomposes queries, retrieves relevant tools, and evaluates tool effectiveness, enhancing retrieval accuracy.

  • The Edit-and-Ground (E&G) paradigm optimizes tool descriptions to align with user needs, leveraging LLM world knowledge for description enrichment.

  • PLUTo outperforms state-of-the-art models in tool retrieval tasks, indicating significant improvements in recall and NDCG metrics.

Enhanced Tool Learning in LLMs Through Planning and Editing

Introduction to PLUTo

The integration of external tools with LLMs extends the functionality of AI applications into new domains like mathematical reasoning and smart assistants. Traditional methods rely on one-time retrieval strategies that often fail to account for the dynamism of real-world queries, leaving a gap between the user's needs and the tools retrieved. To bridge this gap, the study introduces a novel framework, PLUTo (Planning, Learning, and Understanding for Tools), incorporating two paradigms: Plan-and-Retrieve (P&R) and Edit-and-Ground (E&G). These paradigms collectively aim to enhance the retrieval and utility of tools in responding to complex user queries.

Plan-and-Retrieve (P&R) Paradigm

The P&R paradigm pairs an LLM-based query planner with a neural retrieval module. It operates in three stages, sketched in code after the list:

  1. Decomposition: The query planner decomposes complex user queries into more manageable sub-queries.
  2. Retrieval: For each sub-query, the retriever module shortlists relevant tools from a pool of candidates.
  3. Evaluation: The effectiveness of selected tools is continuously evaluated, adjusting the planning strategy to enhance retrieval accuracy.
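To make the flow concrete, here is a minimal sketch of the P&R loop in Python. The `llm` and `embed` helpers, the prompt wording, and the cosine-similarity retriever are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def plan_and_retrieve(query, tool_descriptions, llm, embed, k=3):
    """Illustrative P&R loop: decompose a query, then retrieve tools per sub-query.

    llm(prompt) -> str and embed(texts) -> np.ndarray are assumed helpers;
    the real system uses an LLM-based planner and a trained neural retriever.
    """
    # 1. Decomposition: ask the planner LLM to split the query into sub-queries.
    plan = llm(
        "Decompose this request into short, self-contained sub-tasks, "
        f"one per line:\n{query}"
    )
    sub_queries = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Retrieval: shortlist the top-k tools for each sub-query by
    # cosine similarity between query and tool-description embeddings.
    tool_vecs = embed(tool_descriptions)                       # (num_tools, dim)
    tool_vecs /= np.linalg.norm(tool_vecs, axis=1, keepdims=True)
    shortlists = {}
    for sq in sub_queries:
        q_vec = embed([sq])[0]
        q_vec /= np.linalg.norm(q_vec)
        scores = tool_vecs @ q_vec
        top = np.argsort(-scores)[:k]
        shortlists[sq] = [(int(i), float(scores[i])) for i in top]

    # 3. Evaluation: downstream feedback (e.g. task success) would adjust the
    # planning strategy or retriever here; omitted in this sketch.
    return shortlists
```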

Edit-and-Ground (E&G) Paradigm

The E&G paradigm improves tool descriptions so that their stated functionalities better match user scenarios. It draws on user query context and the LLM's world knowledge to optimize tool descriptions, making them more informative and better aligned with real-world applications. This process involves two steps, illustrated in code after the list:

  1. Evaluation of Existing Descriptions: Identifying under-informative tool descriptions based on retrieval performance.
  2. Optimization: Leveraging LLM capabilities to generate enriched tool descriptions that detail functionalities in relation to user scenarios.
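The sketch below illustrates one possible E&G pass in Python. The prompt wording and the `failed_queries_by_tool` bookkeeping are assumptions for illustration; the paper's actual grounding procedure may differ:

```python
def edit_and_ground(tools, failed_queries_by_tool, llm):
    """Illustrative E&G pass: enrich descriptions of under-performing tools.

    tools: dict mapping tool name -> current description.
    failed_queries_by_tool: dict mapping tool name -> user queries for which
    the tool was relevant but not retrieved (an assumed evaluation signal).
    llm(prompt) -> str is an assumed helper.
    """
    revised = {}
    for name, description in tools.items():
        missed = failed_queries_by_tool.get(name, [])
        if not missed:
            continue  # description already retrieves well; leave it unchanged
        # Ground the rewrite in the concrete user scenarios the tool missed.
        prompt = (
            f"Tool: {name}\n"
            f"Current description: {description}\n"
            "The tool failed to be retrieved for these queries:\n- "
            + "\n- ".join(missed)
            + "\nRewrite the description so it clearly covers these scenarios, "
            "while staying faithful to the tool's actual functionality."
        )
        revised[name] = llm(prompt)
    return revised
```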

Key Results

The PLUTo approach yielded significant improvements in tool retrieval tasks, outperforming current state-of-the-art models. Experiments demonstrated gains in both recall and normalized discounted cumulative gain (NDCG), indicating a more effective and accurate retrieval process. Downstream evaluation further suggested improvements in response accuracy and relevance, highlighting PLUTo's ability to address complex queries successfully.
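For reference, the two reported metrics can be computed as follows. These are the standard binary-relevance definitions, not code from the paper:

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant tools that appear in the top-k retrieved list."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(retrieved, relevant, k):
    """NDCG@k with binary relevance: DCG of the ranking / DCG of an ideal ranking."""
    gains = [1.0 if item in relevant else 0.0 for item in retrieved[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0
```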

Practical and Theoretical Implications

PLUTo offers several advancements in the field of LLMs and tool integration, including:

  • Demonstrating the efficacy of planning and editing paradigms in enhancing tool retrieval.
  • Showcasing the flexibility of PLUTo in adapting to different retrieval engines (see the interface sketch after this list).
  • Highlighting the potential of LLMs in automating and enriching tool descriptions based on real-world user scenarios.
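The claimed retriever-agnosticism can be pictured as a narrow interface that the rest of the pipeline programs against. The `Retriever` protocol below is our illustration, not the paper's code:

```python
from typing import Protocol

class Retriever(Protocol):
    """Minimal interface a retrieval engine must satisfy to plug into P&R."""

    def retrieve(self, sub_query: str, k: int) -> list[str]:
        """Return the names of the top-k candidate tools for a sub-query."""
        ...

# Any engine meeting this interface can back the pipeline, e.g. a sparse
# BM25 index or a dense bi-encoder; the planner and the E&G loop stay unchanged.
```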

Future Perspectives

While PLUTo marks a significant step forward, future research may focus on several areas:

  • Extending the PLUTo framework to multilingual settings to broaden its applicability.
  • Exploring further optimization techniques within the E&G paradigm to continually enhance tool descriptions.
  • Investigating the integration of PLUTo in more specialized domains such as healthcare or legal services, potentially unlocking new uses for LLM-enhanced tool learning.

Conclusion

The research introduces and validates PLUTo, a novel framework that significantly enhances tool learning in LLMs. By integrating the P&R and E&G paradigms, PLUTo not only improves the retrieval of relevant tools but also ensures that the tools' descriptions are optimized for practical applications. As a result, this framework stands as a promising advancement in the integration of LLMs with external tools, offering improved effectiveness and adaptability across various applications.

