ToolOmni: Enabling Open-World Tool Use via Agentic learning with Proactive Retrieval and Grounded Execution

Published 15 Apr 2026 in cs.CL | (2604.13787v1)

Abstract: LLMs enhance their problem-solving capability by utilizing external tools. However, in open-world scenarios with massive and evolving tool repositories, existing methods relying on static embedding retrieval or parameter memorization of tools struggle to align user intent with tool semantics or generalize to unseen tools, respectively, leading to suboptimal accuracy of open-world tool retrieval and execution. To address these, we present ToolOmni, a unified agentic framework that enables LLMs for open-world tool use by proactive retrieval and grounded execution within a reasoning loop. First, we construct a cold-start multi-turn interaction dataset to instill foundational agentic capabilities via Supervised Fine-Tuning (SFT). Then, we introduce open-world tool learning based on a Decoupled Multi-Objective GRPO algorithm, which simultaneously optimizes LLMs for both tool retrieval accuracy and execution efficacy in online environments. Extensive experiments demonstrate that ToolOmni achieves state-of-the-art performance both in retrieval and execution, surpassing strong baselines by a significant margin of +10.8% in end-to-end execution success rate, while exhibiting exceptional robustness and generalization capabilities.

Abstract PDF Upgrade to Chat

Authors (4)

Summary

The paper presents ToolOmni, a unified agentic framework that decouples proactive retrieval from grounded execution to address open-world tool use challenges.
It implements a two-stage training process combining supervised fine-tuning with GRPO-based reinforcement learning, achieving 78.29% NDCG and 54.13% SoPR.
The approach demonstrates robust generalization to unseen tools and adversarial noise through iterative, decoupled optimization, outperforming existing retrieval and execution methods.

ToolOmni: Agentic Learning for Open-World Tool Use via Proactive Retrieval and Grounded Execution

Motivation and Problem Landscape

The proliferation of large-scale, evolving tool repositories has rendered traditional approaches for tool-use reasoning by LLMs—including static embedding-based retrieval and parameter memorization—insufficient for open-world scenarios. Embedding retrieval models exhibit low accuracy due to superficial semantic matching when faced with massive tool sets, while parametric memory techniques are inflexible, cannot generalize to unseen tools, and require costly retraining upon repository updates. The paper introduces ToolOmni, a unified agentic framework that decouples proactive retrieval and grounded execution within an iterative reasoning loop to robustly enable open-world tool use.

Figure 1: Motivation for ToolOmni—embedding retrieval and parametric memory fail for massive and evolving toolsets, whereas ToolOmni couples proactive retrieval and grounded execution for effective open-world tool use.

Framework Architecture

ToolOmni operates in two distinct phases. During Proactive Retrieval, the agent iterates with a retrieval server, dynamically curating relevant tool candidates via multi-turn interactions. This allows contextual query adaptation, efficient search refinement, and task-specific tool selection. In the Grounded Execution phase, the agent leverages the finalized tool set for reasoning and sequential tool invocation, generating answers grounded in the retrieved documentation. The decoupling facilitates independent reward optimization and rigorous process supervision, mitigating the interference between retrieval and execution reward signals.

Figure 2: ToolOmni pipeline—Proactive Retrieval iteratively curates tool candidates; Grounded Execution reasons with retrieved tools to solve the task.

Learning Objectives and Optimization Strategy

The framework implements a two-stage training process:

Cold-Start Learning: Supervised fine-tuning (SFT) using a hybrid dataset of retrieval and execution trajectories instills foundational agentic capabilities in the model.
Open-World Tool Learning: An enhanced Group Relative Policy Optimization (GRPO) algorithm enables synchronized optimization for retrieval and execution via decoupled, task-specific reward functions. Retrieval rewards enforce format correctness, recall, and conversion; execution rewards enforce structural and answer correctness. Group-relative advantage estimation and token-level masking ensure stable optimization dynamics, with trajectory filtering gating execution rollouts solely on valid retrievals.

Empirical Results and Strong Numerical Claims

ToolOmni is evaluated on the ToolBench benchmark, comprising over 16,000 APIs across varying complexity and generalization splits. Key empirical findings include:

State-of-the-art retrieval performance: ToolOmni achieves an average NDCG of 78.29% in multi-domain settings, significantly surpassing baselines such as ToolRetriever and ToolGen in top-1 and top-3 accuracy, demonstrating superior capacity for pinpointing golden tools in massive repositories.
End-to-end execution superiority: ToolOmni's average Solvable Pass Rate (SoPR) reaches 54.13% in realistic scenarios, outperforming GPT-3.5 pipeline baselines by +11.78% and unified generative models by +18%. Solvable Win Rate (SoWR) exhibits substantial improvements as well, indicating enhanced precision and coherence in answer generation.
Robust generalization: ToolOmni maintains SoPR of 52.20% on unseen tools and 55.95% on category generalization, exceeding GPT-3.5 by +13.85%, evidencing strong transfer and meta-reasoning capabilities.

Robustness, Ablations, and Sensitivity Analysis

Robustness tests under adversarial retrieval noise and generalization splits reveal the adaptive reasoning capacities of ToolOmni, which leverages alternative functionally similar tools when ground-truth candidates are occluded. Ablation studies underscore the necessity of both SFT and RL: RL drives retrieval quality, SFT is crucial for execution robustness. The iterative retrieval mechanism yields an absolute gain of +4.5% NDCG@5 over static baselines. Decoupled multi-objective GRPO with trajectory filtering and separated gradient updates is pivotal for execution performance, avoiding destructive signal conflicts and hallucination shortcuts.

Figure 3: Ablation of training stages—integrating SFT and RL achieves optimal performance.

Figure 4: Robustness against adversarial retrieval noise on complex open-world tasks.

Figure 5: RL components ablation—decoupled optimization and trajectory filtering are essential for execution success.

Figure 6: Sensitivity analysis—optimal format reward weighting enhances execution without stifling complex reasoning.

Practical and Theoretical Implications

Practically, ToolOmni enables scalable, robust tool reasoning in real-world, dynamic API landscapes, optimizing computational efficiency by trading inexpensive retrievals for fewer, higher-quality API invocations. Theoretically, the decoupling of retrieval and execution task signals via separate reward functions and optimization offers a powerful schema for RL fine-tuning in highly interdependent LLM scenarios. The agentic iterative design transcends passive pipeline architectures, learning transferable meta-skills for tool use, parameter inference, and error recovery that adapt seamlessly to novel domains.

Future Directions

Future efforts should extend ToolOmni’s framework to multimodal, heterogeneous toolsets and investigate scaling properties with larger foundation models. The rigidity of the cascaded paradigm could be relaxed to support dynamic tool chain discovery conditioned on intermediate execution feedback, further enhancing agentic flexibility and long-horizon planning.

Conclusion

ToolOmni represents a unified agentic approach for open-world tool use, integrating proactive retrieval and grounded execution with decoupled RL optimization. It delivers state-of-the-art retrieval and execution performance, robust generalization, and resilience to noise in massive, evolving repositories. The implications are substantial for scalable, adaptive LLM tool use and agentic reasoning in real-world dynamic environments.

Markdown Report Issue