ToolWeaver: Scalable Tool Use Framework

Updated 5 February 2026
  • ToolWeaver is a generative framework for scalable tool use in LLMs that encodes each tool into hierarchical sequences capturing intrinsic semantics and collaborative relationships.
  • It employs logarithmic vocabulary expansion through discrete code sequences, enabling efficient multi-tool reasoning and seamless integration with large language models.
  • Its design combines semantic embedding, residual quantization, and collaborative-aware objectives to outperform traditional retrieval-based and generative tool-use pipelines.

ToolWeaver is a generative framework for scalable tool-use in LLMs that encodes each tool as a hierarchical sequence of discrete codes, enabling both efficient vocabulary expansion and the representation of collaborative semantic relationships among tools. Developed to address the dual semantic limitations of retrieval-based and existing generative tool-use pipelines—specifically, the inability to capture intricate intrinsic semantics and co-usage patterns—the ToolWeaver framework achieves logarithmic growth in vocabulary size with respect to the number of supported tools. This structured code approach facilitates more generalizable, efficient, and semantically-aware multi-tool reasoning for advanced AI agents (Fang et al., 29 Jan 2026).

1. Motivation and Semantic Challenges

Prevalent retrieval-based LLM tool-use architectures leverage external retrievers, such as BM25 or dense encoders, to select relevant tools from large libraries $D = \{d_1, \ldots, d_N\}$. However, these systems are constrained by:

  • Under-representation of Intrinsic Semantics: Encoders typically fail to capture the nuanced, functional meaning of tools.
  • Absence of Extrinsic Tool Knowledge: LLMs, pretrained solely on natural language corpora, have no a priori understanding of external tool APIs, leading to gaps in multi-tool reasoning and composition.

Standard generative methods that map each tool $d_i$ to a unique token $\langle\mathrm{tool}_i\rangle$ face further scalability, generalization, and semantic bottlenecks. As the tool set expands ($N \approx 47{,}000$ in ToolBench), vocabulary size increases linearly ($O(N)$), with each tool treated as semantically isolated, impeding generalization and collaborative usage modeling. This imposes performance and resource constraints on LLMs and hinders learning of tool relationships.

2. Hierarchical Code Sequence Representation

ToolWeaver replaces the atomic tool tokenization scheme with a sequence of $L$ discrete codes for each tool. The code sequence for tool $d$ is defined as

$$s_d = [\ell_{d,1}, \ell_{d,2}, \ldots, \ell_{d,L}]$$

where each $\ell_{d,l} \in \{1, \dots, K\}$ indexes into the $l$-th codebook $C_l$. The total number of new tokens introduced is $L \cdot K$, offering exponential encoding capacity: $K^L \geq N$, where $N$ is the number of tools.

The crucial property of this design is logarithmic vocabulary expansion:

$$L = \left\lceil \frac{\ln N}{\ln K} \right\rceil$$

so the number of additional tokens scales as $O(\ln N)$. This stands in contrast to the $O(N)$ growth of monolithic token schemes.
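The contrast between linear and logarithmic vocabulary growth follows directly from the formula above; a minimal sketch (the choice $K = 256$ below is illustrative, not a value from the paper):

```python
import math

def added_tokens_atomic(n_tools: int) -> int:
    # One new token per tool: O(N) vocabulary growth.
    return n_tools

def added_tokens_toolweaver(n_tools: int, k: int) -> int:
    # L = ceil(ln N / ln K) codebook levels, each contributing K tokens,
    # so the added vocabulary is L * K, which scales as O(ln N).
    levels = math.ceil(math.log(n_tools) / math.log(k))
    return levels * k

n, k = 47_000, 256  # ToolBench-scale library; K is an assumed codebook size
print(added_tokens_atomic(n))         # 47000 new tokens
print(added_tokens_toolweaver(n, k))  # 2 levels * 256 codes = 512 new tokens
```

At ToolBench scale, the structured scheme needs two orders of magnitude fewer new tokens than atomic tokenization.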

By embedding both intrinsic semantics (functional similarity) and extrinsic co-usage patterns (collaborative relationships captured via cosine similarities on tool usage), ToolWeaver encodes tools such that jointly-used tools share early code prefixes, directly supporting multi-tool reasoning and efficient generalization.

3. Collaborative-Aware Structured Tokenization

The code sequence assignment follows an explicit collaborative-aware residual quantization procedure:

  1. Semantic Embedding: For each tool, a text encoder computes $e_d$ from documentation.
  2. Projection: $z_d = W e_d$, projecting the embedding into a lower-dimensional space.
  3. Residual Initialization: $r_{d,1} = z_d$.
  4. Codebook Quantization: For each level $l$ ($1 \le l \le L$):

    • Assign code index:

    $$\ell_{d,l} = \arg\min_{k \in [1,K]} \| r_{d,l} - v_{l,k} \|^2$$

    • Update residual:

    $$r_{d,l+1} = r_{d,l} - v_{l,\ell_{d,l}}$$

  5. Collaborative-Aware Objective: Optimize codebooks $\{v_{l,k}\}$ to minimize:

    $$\mathcal{L} = \mathcal{L}_\mathrm{recon} + \mathcal{L}_\mathrm{quant} + \mathcal{L}_\mathrm{collab}$$

    with

    $$\mathcal{L}_\mathrm{collab} = \lambda \sum_{u<v} A_{u,v} \| s_u - s_v \|^2$$

    where $A_{u,v}$ encodes normalized co-occurrence.

  6. Conflict Mitigation: At the final codebook level, a balanced assignment via the Sinkhorn–Knopp algorithm ensures uniform tool distribution among code indices, averting index collisions.
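The conflict-mitigation step can be illustrated with a few Sinkhorn–Knopp normalization passes over a tool-by-code affinity matrix; this is a simplified sketch under assumed inputs and iteration counts, not the paper's exact procedure:

```python
import numpy as np

def sinkhorn_assign(neg_dists: np.ndarray, n_iter: int = 100) -> np.ndarray:
    """Near-balanced tool->code assignment at the final codebook level.

    neg_dists: (n_tools, K) matrix of negated squared distances to codes.
    Sinkhorn-Knopp alternately rescales rows (each tool carries unit mass)
    and columns (each code receives n_tools/K mass); a row-wise argmax on
    the resulting plan spreads tools near-uniformly over code indices.
    """
    n, k = neg_dists.shape
    plan = np.exp(neg_dists - neg_dists.max())  # positive affinities
    for _ in range(n_iter):
        plan /= plan.sum(axis=1, keepdims=True)                 # row marginals -> 1
        plan *= (n / k) / plan.sum(axis=0, keepdims=True)       # col marginals -> n/k
    return plan.argmax(axis=1)

rng = np.random.default_rng(1)
codes = sinkhorn_assign(rng.normal(size=(12, 4)))  # 12 toy tools, K = 4
print(np.bincount(codes, minlength=4))  # per-code counts, pushed toward balance
```

A plain per-tool argmin could send many tools to the same final code; the doubly-stochastic plan discourages such collisions.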

This process tightly weaves semantic and collaborative signals into tool code assignments.
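Steps 1–4 of the quantization procedure can be sketched in NumPy with toy codebooks; all shapes and values below are illustrative stand-ins for the learned quantities:

```python
import numpy as np

def rq_encode(z: np.ndarray, codebooks: list) -> list:
    """Assign a hierarchical code sequence to one projected tool embedding z.

    codebooks: list of L arrays, each of shape (K, dim) -- toy stand-ins
    for the learned codebooks {v_{l,k}}.
    """
    codes, residual = [], z
    for vl in codebooks:                            # levels l = 1..L
        dists = np.sum((vl - residual) ** 2, axis=1)
        k = int(np.argmin(dists))                   # argmin_k ||r_{d,l} - v_{l,k}||^2
        codes.append(k)
        residual = residual - vl[k]                 # r_{d,l+1} = r_{d,l} - v_{l,code}
    return codes

rng = np.random.default_rng(0)
dim, K, L = 8, 4, 3                                 # illustrative sizes
codebooks = [rng.normal(size=(K, dim)) for _ in range(L)]
z = rng.normal(size=dim)                            # stands in for z_d = W e_d
print(rq_encode(z, codebooks))                      # length-3 code sequence
```

Each successive level quantizes what the previous levels left unexplained, so early codes capture coarse semantics and later codes refine them.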

4. Generative Alignment and Model Integration

Each code $\ell_{d,l}$ is materialized as a new special token $\langle T_{l,\ell} \rangle$ whose embedding is randomly initialized and fine-tuned during alignment. The alignment is conducted in two stages:

  • Tool Retrieval Alignment: The LLM minimizes negative log-likelihood over code sequences $s_d$ conditioned on tool queries $q$:

$$\mathcal{L}_\mathrm{retrieval} = -\mathbb{E}_{(q,d)} \log P(s_d \mid q)$$

  • Tool Usage Trajectory Alignment: Training continues on full multi-step tool usage trajectories via autoregressive cross-entropy loss:

$$\mathcal{L}_\mathrm{usage} = -\sum_{t=1}^{T} \log P(c_t \mid c_{<t}, \mathrm{context})$$

During inference, beam search with trie constraints efficiently restricts decoding to valid code sequences, ensuring only legitimate tool identifiers are generated.
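Trie-constrained decoding can be sketched as follows; `logits_fn` is a hypothetical stand-in for the LLM's per-step code scores, and greedy selection stands in for beam search:

```python
from math import inf

def build_trie(code_sequences):
    """Trie over valid tool code sequences; a path to depth L is one tool."""
    root = {}
    for seq in code_sequences:
        node = root
        for c in seq:
            node = node.setdefault(c, {})
    return root

def constrained_greedy_decode(logits_fn, trie, length):
    """At each step, only children of the current trie node may be emitted,
    so every decoded sequence is a legitimate tool identifier."""
    prefix, node = [], trie
    for _ in range(length):
        scores = logits_fn(prefix)
        best = max(node, key=lambda c: scores.get(c, -inf))  # allowed codes only
        prefix.append(best)
        node = node[best]
    return prefix

# Toy library of three tools, each a length-3 code sequence (illustrative):
tools = [(0, 1, 2), (0, 3, 1), (2, 2, 2)]
trie = build_trie(tools)
uniform = lambda prefix: {c: 0.0 for c in range(4)}  # dummy model scores
print(constrained_greedy_decode(uniform, trie, 3))   # [0, 1, 2]: ties break by insertion order
```

Even with arbitrary model scores, the trie guarantees the output is one of the registered tools, never a dangling code prefix.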

5. Empirical Performance and Analysis

ToolWeaver was evaluated on ToolBench, comprising approximately 47,000 real-world APIs, using standard splits: I1 (single-tool), I2 (multi-tool/one category), I3 (multi-tool/multi-category), and generalization splits ("Tool.", "Cat.").

Retrieval Efficacy

| Method | I1@1 | I3@1 | I3@5 |
|---|---|---|---|
| BM25 | 26.92 | 10.00 | 12.33 |
| EmbSim | 50.50 | 18.00 | 20.94 |
| ToolRetriever | 75.92 | 28.00 | 44.54 |
| ToolGen | 88.50 | 81.00 | 85.83 |
| ToolWeaver | 91.16 | 88.00 | 90.12 |

ToolWeaver substantially outperforms all retrieval and generative baselines in NDCG@k, particularly for complex multi-tool queries (I3).

End-to-End Tool Use

| Method | I3 SoPR | I3 SoWR |
|---|---|---|
| ToolGen | 36.34 | 45.56 |
| ToolWeaver | 52.19 | 59.02 |

On end-to-end metrics, ToolWeaver's solvable pass rate and win rate notably exceed those of prior work, especially on tasks requiring cross-category multi-tool composition.

Ablation and Tokenization Comparisons

Ablations confirm the necessity of both semantic initialization and collaborative-aware code assignment (a collaborative weight of $\lambda = 1$ maximizes NDCG). Static-tree, atomic, numerical, and semantics-only tokenizations all underperform the full collaborative approach.

Language Modeling Preservation

| Model | WikiText-2 PPL | CNN/DM BERTScore |
|---|---|---|
| Llama-3-8B | 6.34 | 0.8535 |
| ToolGen | 104.54 | 0.8293 |
| ToolWeaver | 25.36 | 0.8507 |

Unlike linear vocabulary expansion, which degrades perplexity and summarization quality, ToolWeaver's compact approach notably preserves core language modeling capabilities.

Inference Efficiency

Despite the multi-token output, inference remains fast ($< 200\,\mathrm{ms}$ per call for $L = 4$ on an A100) and memory efficient (15.1 GB vs. 15.8 GB for ToolGen).

6. Limitations and Future Directions

Potential limitations include increased autoregressive error rates for longer code sequences ($L > 4$), due to error propagation during generation. The residual quantization process is currently unsupervised; incorporation of reinforcement learning from real tool-use feedback is proposed as a future enhancement for refining code assignments. A plausible implication is that supervised or RL-based code adaptation could further improve generalization and downstream agent performance.

ToolWeaver establishes a hierarchical, collaborative-aware tokenization paradigm for tool-augmented LLMs, supporting scalable addition of external functionalities while preserving semantic structure and language modeling competence. Code and data are available at https://github.com/Fwibo/ToolWeaver (Fang et al., 29 Jan 2026).
