Depth of Cascaded Tool Adaptation Hierarchies

Determine the maximal effective depth of cascaded agent-supervised tool adaptation pipelines, in which tools (e.g., query reformulators, retrievers, rerankers) are trained to serve a frozen large language model, such that compounding errors do not overwhelm the overall system benefit, and characterize the conditions under which additional stages cease to provide net performance gains.

Background

Within the T2 paradigm, the paper describes a shift from single retrievers to multi-stage pipelines in which specialized tools (such as query reformulators, retrievers, and selectors/rerankers) are trained on signals from a frozen agent and then composed. This cascaded architecture promises separation of concerns, composability, and efficiency.

However, the authors explicitly note a fundamental unresolved question: how many such stages can be stacked before error propagation and misalignment between stages begin to negate the benefits? Although empirical practice suggests that 2–3 stages work well, the theoretical and empirical limits of hierarchy depth remain unspecified, motivating a precise determination of the depth threshold and the factors that influence it.

References

Yet this raises a fundamental question: if tools can learn from tools, which learn from frozen LLMs, how deep can this hierarchy go before compounding errors overwhelm the benefits? This question remains open, though empirical results suggest that 2-3 stages of tool adaptation (e.g., query reformulator -> retriever -> reranker) strike a good balance.

Adaptation of Agentic AI (arXiv:2512.16301, Jiang et al., 18 Dec 2025), Section 4.2.1 (Earlier Methods: From Proxy Signals to Structured Preferences), Synthesis: The Multi-Tool Ecosystem