Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 78 tok/s
Gemini 2.5 Pro 60 tok/s Pro
GPT-5 Medium 28 tok/s Pro
GPT-5 High 33 tok/s Pro
GPT-4o 101 tok/s Pro
Kimi K2 168 tok/s Pro
GPT OSS 120B 452 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

Advancing SLM Tool-Use Capability using Reinforcement Learning (2509.04518v1)

Published 3 Sep 2025 in cs.CL

Abstract: LLMs have progressed beyond simple text creation, and tool use has become increasingly important for complex, real-world tasks. Tool use in LLMs refers to their ability to utilize external resources such as APIs, databases, or software functions to extend their functionality beyond generating text.Tools are used for tasks such as performing calculations, making API calls to retrieve the current time and date, and more. This capability enables models to fetch real-time data, execute commands, or solve problems requiring dynamic interaction, making it indispensable for applications like AI agents in virtual assistants, robotic control, or automated workflows. However, while LLMs are usually adept tool use, their vast resource requirements and computation complexity restrict their use in every use case.As a result, there is an increasing need for more compact and efficient Small LLMs (SLMs). Small LLMs (SLMs) struggle in tool use compared to LLMs. As soon in Table 1. SLMs are typically trained on smaller, more specific datasets, resulting in a narrower knowledge base and limited contextual understanding compared to LLMs. This research addresses these challenges by using Reinforcement Learning (RL), specifically Group Relative Policy Optimization (GRPO), to enhance tool-use proficiency in SLMs. Unlike conventional fine-tuning approaches that require heavy computation and often lack adaptability, our method provides an efficient, effective solution that significantly boosts SLM tool-use accuracy, increasing their practical utility.

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.