- The paper demonstrates that integrating domain-specific composite tools with an instruction-tuned LLM raises correctness from ~57% to ~90% while reducing token usage by threefold.
- The methodology uses a digital twin environment and rigorous experiments with four Qwen-family LLMs to compare generic HTTP tools with domain-specific atomic and composite abstractions.
- The findings imply that tool-layer abstraction design is more crucial than LLM scaling for achieving efficient, agentic control in optical network management.
Introduction and Problem Context
The evolution toward intent-based, closed-loop, and fully agentic management in optical networks is driven by standardization frameworks such as ITU IMT-2030 and ETSI ZSM. Current deployments remain at modest autonomic maturity (L2-L3), with much of the field mired in device-level or vendor-specific management approaches. The Transport API (T-API) as a vendor-neutral northbound interface remains largely underexploited for agentic management tasks. Prior art has focused on YANG-driven, NETCONF, or proprietary SDN API integration, rarely attempting true T-API-native, agentic LLM control loops.
This paper establishes a new baseline by architecting and empirically evaluating a T-API-compliant ReAct agentic loop. The central focus is the exploration of tool abstraction: contrasting generic HTTP/RESTCONF tools with atomic and composite, domain-specific T-API tool signatures. The analysis isolates tool-layer design—rather than LLM model size or T-API stack idiosyncrasies—as the primary variable influencing agent correctness, efficiency, and hallucination suppression in real-world intent-based optical network management.
The proposed architecture layers a ReAct agent loop atop a swappable tool abstraction interfacing directly with a T-API RESTCONF northbound. This encapsulation ensures that neither modifications to the base agent nor the underlying T-API domain controller are required as the abstraction layer is changed.
Three tool abstractions are benchmarked:
- Generic HTTP/RESTCONF Primitives: The agent exposes only GET and POST (read/modify) primitives, requiring the LLM to fully compose paths, construct JSON payloads, resolve identifiers, and interpret raw model trees.
- Domain-Specific Single-Call Tools: Twelve atomic tool APIs map directly to T-API endpoints and common in-process filtering steps (e.g., get_node_details, estimate_qot).
- Domain-Specific Multi-Call (Composite) Tools: Four composite APIs encapsulate complex, recurring T-API flows such as topology summarization, path computation, modulation selection cascades, and end-to-end provisioning.
This design allows a controlled experiment: the agent and underlying topology remain fixed, while only the tool abstraction is swapped.
Experimental Methodology
A digital twin (DT) environment provides the northbound T-API, integrating a GNPy-based QoT engine, NetworkX-based topology and inventory, and realistic C-band operation scenarios (CORONET CONUS topology, 75 ROADMs/198 fiber spans). The evaluation spans 10 scenario templates representing querying, analysis, provisioning, and multi-turn logical flows, with diverse path and service conditions randomized per run.
Four open, on-premises Qwen-family LLMs (from 4B to 35B parameters, including both dense and mixture-of-experts/fine-tuned for tool calling) are used as agent backends. Each scenario-tool-model permutation executes 20 statistically significant runs. Critically, an automated domain-specific oracle checks not only execution success but also answer correctness, offering nuanced failure taxonomy (zero-tool, wrong-value, wrong-modulation, missing-grounding, etc.). Mean token usage per run is also tracked as a proxy for operational cost.
Key Results
The empirical findings reveal several decisive trends:
- Correctness and Cost Plateau with Generic Abstraction: For the generic HTTP tool layer, success rates saturate at 57–58% for even the largest (32B-35B) LLMs. Token usage is high and consistent (~30–38k per run), while the smallest model (4B) collapses to 20% correctness, showing the inability of smaller models to compensate for lack of tooling structure.
- Composite Tools Enable Sharp Gains: Integrating domain-specific composite tools yields a jump in correctness (from 57% to ~90% for Qwen2.5-32B-Instruct) and a threefold reduction in token cost (down to ~10.6k per run). This demonstrates that composite tool abstraction, when paired with an instruction-tuned LLM, dominates model scaling for practical agentic control and significantly suppresses hallucinations.
- Instruction Tuning is Necessary for Composite Tool Leverage: Non-fine-tuned mid-range models (Qwen3.5-9B, 35B) do not effectively exploit composite abstractions; their correctness roughly halves compared to instruction-tuned baselines and their token usage actually increases. This reveals that tool-layer abstraction and backend model preparedness for tool calling are both required for operational viability—either alone is insufficient.
- Atomic Tools Only Partially Mitigate Complexity: Exposing atomic single-call tools reduces the LLM’s “raw protocol” burden but does not approach composite tool performance, as coordination of multi-step flows (e.g., SIP resolution, QoT cascades) remains non-trivial for all but the largest, best-finetuned models.
Implications and Theoretical Insights
The results substantiate the claim that in complex, closed-loop agentic systems controlling T-API-compliant optical infrastructures, the tool abstraction layer exerts a stronger influence on both correctness and efficiency than even substantial LLM model scaling, beyond a 14B-parameter threshold. Tool-layer design thus becomes the primary lever for practical, deployable agentic management, provided the backend LLM is adequately instruction-tuned for tool invocation.
The findings also highlight that composite tool abstraction absorbs systematic protocol logic, making agentic orchestration tractable for LLMs, while unstructured, generic primitives force a combinatorial explosion in prompt length, error probability, and token cost. Moreover, hallucination propensity is structurally suppressed by typed, pre-validated tool interfaces.
Practically, the ReAct+composite architecture can be "dropped in" to any T-API-compliant orchestration stack, increasing the portability and vendor-neutral deployment of LLM-based agentic management solutions. Theoretically, these results suggest that future LLM-agentic frameworks for network automation should prioritize co-design of tool-layer abstraction with model fine-tuning, and that standardized tool APIs could form the substrate for performance and safety benchmarking.
Future Directions
The architecture presented invites multiple extensions. One key avenue involves porting the tool layer to operate as a Model Context Protocol (MCP) server, permitting remote, wire-efficient execution while preserving domain-specific abstraction. Measuring the persistence of correctness and hallucination suppression under MCP—as opposed to direct T-API RESTCONF—would more rigorously establish the decoupling of protocol and tool-layer effects. Further, research could apply this benchmarked tool abstraction methodology to other agentic verticals beyond optical—e.g., wireless RICs, cross-domain SDN controllers.
Conclusion
This work establishes a new empirical and architectural reference for T-API-compliant, agentic closed-loop management in optical networks, demonstrating that domain-specific composite tool abstractions, together with instruction-tuned LLMs, are essential for achieving high correctness and operational efficiency. The results have major implications for standardized, vendor-neutral orchestration and suggest a concrete roadmap for generalizing the approach across network automation domains.