Aether: Network Validation Using Agentic AI and Digital Twin

Published 20 Apr 2026 in cs.MA and cs.AI | (2604.18233v1)

Abstract: Network change validation remains a critical yet predominantly manual, time-consuming, and error-prone process in modern network operations. While formal network verification has made substantial progress in proving correctness properties, it is typically applied in offline, pre-deployment settings and faces challenges in accommodating continuous changes and validating live production behavior. Current operational approaches typically involve scattered testing tools, resulting in partial coverage and errors that surface only after deployment. In this paper, we present Aether, a novel approach that integrates Generative Agentic AI with a multi-functional Network Digital Twin to automate and streamline network change validation workflows. It features an agentic architecture with five specialized Network Operations AI agents that collaboratively handle the change validation lifecycle from intent analysis to network verification and testing. Aether agents use a unified Network Digital Twin integrating modeling, simulation, and emulation to maintain a consistent, up-to-date network view for verification and testing. By orchestrating agent collaboration atop this digital twin, Aether enables automated, rapid network change validation while reducing manual effort, minimizing errors, and improving operational agility and cost-effectiveness. We evaluate Aether over synthetic network change scenarios covering main classes of network changes and on past incidents from a major ISP operational network, demonstrating promising results in error detection (100%), diagnostic coverage (92-96%), and speed (6-7 minutes) over traditional methods.


Authors (6)

Summary

The paper presents a neuro-symbolic framework combining agentic AI with a Network Digital Twin to enable intent-driven network change validation.
It demonstrates high performance with 94–100% error detection and reduced validation latency across synthetic benchmarks and real-world ISP scenarios.
The modular architecture integrates LLM-based agents and extensible verification tools, paving the way for scalable, automated network assurance.

Agentic AI and Digital Twin-Driven Network Change Validation: An Expert Analysis of Aether

Motivation and Problem Statement

Contemporary network operations suffer from fragmented, error-prone, and predominantly manual processes for network change validation. Despite progress in formal network verification, practical deployments remain hampered by tool heterogeneity, lack of semantic validation, insufficient test coverage, and limited operational integration, all of which inhibit scalable, robust assurance. Aether proposes a neuro-symbolic architecture that combines Generative Agentic AI with a multi-functional Network Digital Twin (NDT) to streamline and automate network change validation end-to-end.

Figure 1: Network change validation process comprising intent analysis, impact assessment, peer review, testing, and approval.

Aether is motivated by three deficiencies in existing approaches:

Fragmentation: Incompatibility across model-based, simulation-based, and emulation-based tools creates cognitive and operational friction for operators.
Cognitive Barriers: Operators must bridge the gap between natural-language (NL) intent and the abstract specifications that verification tools require, a task for which existing CI/CD pipelines and static test suites are inadequate.
Agent Operational Requirements: Effective LLM-based agents face contextual and reasoning constraints requiring structured, queryable, and semantic representations of network state.

Aether’s architectural principles thus emphasize compositional intent-aware orchestration, unified semantic network representation (the NDT and Knowledge Graph), and seamless integration with NetDevOps/CI workflows.

Aether Architecture

Aether’s architecture centers on orchestrating specialized AI agents over a unified NDT and is designed for extensibility and compositional tool invocation. It uses five NetOps-specialized LLM-based agents mapped to canonical change management stages (impact assessment, test planning, test execution, etc.), coordinated by a conversational Assistant Agent. The agents interface with a structured Knowledge Graph (the Network Digital Map, NDM) abstracted atop the NDT, and compose heterogeneous formal verification, simulation, and differential analysis tools.

Figure 2: The Aether workflow coordinates ingestion, intent analysis, test generation, execution, and operator-in-the-loop approval.

Figure 3: Logical architecture depicts five collaborating agent roles and their interactions with the NDT and orchestration services.

Agents and Reasoning Processes

All agents follow the ReAct (reasoning and acting) design pattern, which supports dynamic tool invocation, self-correction, and iterative refinement. Separation of concerns is central: each agent encapsulates domain-specific skills, memory, and interface logic. This results in high modularity and extensibility; for example, introducing new protocol-specific or troubleshooting skills is tractable at the per-agent level. The NDM Query agent is particularly critical: it maps natural language to graph queries and serves as a foundation for all higher-level orchestration.
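The thought, action, observation cycle described above can be illustrated with a minimal sketch. Everything here is hypothetical: the `list_interfaces` tool, the toy `INVENTORY`, and the `scripted_policy` function (which stands in for an LLM call) are assumptions for illustration, not part of Aether. The sketch only shows the loop shape, including one self-correction after a failed tool call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical toy inventory and tool; not Aether's actual tool set.
INVENTORY = {"r1": ["eth0", "eth1"], "r2": ["eth0"]}


def list_interfaces(device: str) -> str:
    if device not in INVENTORY:
        return f"error: unknown device {device!r}"
    return ",".join(INVENTORY[device])


TOOLS: Dict[str, Callable[[str], str]] = {"list_interfaces": list_interfaces}


@dataclass
class Step:
    thought: str
    action: str    # a tool name, or "finish"
    argument: str  # tool input, or the final answer when action == "finish"


History = List[Tuple[Step, str]]  # (step taken, observation returned)


def scripted_policy(history: History) -> Step:
    """Stand-in for the LLM: chooses the next step from past observations."""
    if not history:
        return Step("I need the interfaces of R1.", "list_interfaces", "R1")
    last_step, observation = history[-1]
    if observation.startswith("error"):
        # Self-correction: retry with a normalized device name.
        return Step("Lookup failed; try lowercase.", "list_interfaces",
                    last_step.argument.lower())
    return Step("I have the answer.", "finish", observation)


def react(policy: Callable[[History], Step],
          tools: Dict[str, Callable[[str], str]],
          max_steps: int = 5) -> Tuple[str, History]:
    """Run the reason/act loop until the policy finishes or the budget ends."""
    history: History = []
    for _ in range(max_steps):
        step = policy(history)
        if step.action == "finish":
            return step.argument, history
        tool = tools.get(step.action)
        observation = (tool(step.argument) if tool
                       else f"error: unknown tool {step.action!r}")
        history.append((step, observation))
    raise RuntimeError("step budget exhausted without a final answer")


answer, trace = react(scripted_policy, TOOLS)
```

In a real agent the policy would be a model call that receives the history as context; the step budget is one simple guard against loops that never terminate.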

Network Digital Twin (NDT) and NDM

Aether's NDT provides a temporal, multi-layered Knowledge Graph, built on OpenConfig with Aether-specific extensions, that represents snapshots of the network’s multifaceted state (physical topology, configuration, policies, performance metrics, and computed or inferred attributes) at per-device, per-layer granularity.

Figure 4: NDM Knowledge Graph structure with explicit layering and relationship semantics between device nodes and network properties.
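To make the idea of a temporal, layered graph concrete, the following is a minimal in-memory sketch. It is not the NDM schema: the layer names (`physical`, `routing`), the `runs` relation, and the class names `Snapshot` and `TemporalGraph` are assumptions chosen for illustration. It shows layered nodes, typed relationships, and an "as of time t" lookup over snapshots.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Snapshot:
    """One point-in-time view of the network graph."""
    timestamp: int
    # node key -> (layer name, attributes); layer names are illustrative
    nodes: Dict[str, Tuple[str, Dict[str, str]]] = field(default_factory=dict)
    # (source key, relation, destination key)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, key: str, layer: str, **attrs: str) -> None:
        self.nodes[key] = (layer, dict(attrs))

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.edges.append((src, relation, dst))

    def in_layer(self, layer: str) -> List[str]:
        return sorted(k for k, (lyr, _) in self.nodes.items() if lyr == layer)

    def related(self, src: str, relation: str) -> List[str]:
        return sorted(d for s, r, d in self.edges
                      if s == src and r == relation)


class TemporalGraph:
    """Keeps snapshots ordered by time and answers 'state as of t' lookups."""

    def __init__(self) -> None:
        self._times: List[int] = []
        self._snapshots: List[Snapshot] = []

    def add(self, snapshot: Snapshot) -> None:
        index = bisect_right(self._times, snapshot.timestamp)
        self._times.insert(index, snapshot.timestamp)
        self._snapshots.insert(index, snapshot)

    def as_of(self, timestamp: int) -> Snapshot:
        """Return the latest snapshot taken at or before `timestamp`."""
        index = bisect_right(self._times, timestamp)
        if index == 0:
            raise LookupError("no snapshot at or before this time")
        return self._snapshots[index - 1]


snap = Snapshot(timestamp=100)
snap.add_node("r1", "physical", vendor="example")
snap.add_node("r1/isis", "routing", level="2")
snap.add_edge("r1", "runs", "r1/isis")

history = TemporalGraph()
history.add(snap)
history.add(Snapshot(timestamp=200))
```

A production system would hold this in a graph database rather than Python dictionaries; the point of the sketch is only the shape of the data: nodes tagged by layer, named relationships, and time-indexed snapshots.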

The NDT exposes a “git-like” workflow for managing candidate change validation, supporting branching, rebasing, and differential validation via snapshotting. Verification tools (Batfish, RouteNet, custom simulators, anomaly detectors) are integrated through the Model Context Protocol (MCP), which decouples LLM agent actions from the execution and interpretation of the underlying checks.

Figure 5: NDT workflow for snapshot-based differential analysis and candidate validation, analogous to Git operations.
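The branch, rebase, and diff operations named above can be sketched with a toy repository of per-device configurations. This is an illustrative model under stated assumptions, not the NDT's actual interface: the class name `TwinRepo`, its method names, and the flattening of network state to one configuration string per device are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Config = Dict[str, str]  # device name -> configuration text (assumed shape)
Delta = Dict[str, Tuple[Optional[str], Optional[str]]]  # device -> (old, new)


def changed(base: Config, candidate: Config) -> Delta:
    """Differential view: only devices whose configuration differs."""
    return {
        device: (base.get(device), candidate.get(device))
        for device in sorted(set(base) | set(candidate))
        if base.get(device) != candidate.get(device)
    }


class TwinRepo:
    """Toy snapshot store with git-like branch, commit, diff, and rebase."""

    def __init__(self, baseline: Config) -> None:
        self.heads: Dict[str, Config] = {"main": dict(baseline)}
        self.fork_points: Dict[str, Config] = {}

    def branch(self, name: str, source: str = "main") -> None:
        self.heads[name] = dict(self.heads[source])
        self.fork_points[name] = dict(self.heads[source])

    def commit(self, branch: str, device: str, config: str) -> None:
        self.heads[branch][device] = config

    def diff(self, base: str, candidate: str) -> Delta:
        return changed(self.heads[base], self.heads[candidate])

    def rebase(self, name: str, onto: str = "main") -> None:
        """Replay the branch's own changes on top of the current `onto` head."""
        own_changes = changed(self.fork_points[name], self.heads[name])
        new_head = dict(self.heads[onto])
        for device, (_, new) in own_changes.items():
            if new is None:
                new_head.pop(device, None)
            else:
                new_head[device] = new
        self.fork_points[name] = dict(self.heads[onto])
        self.heads[name] = new_head


repo = TwinRepo({"r1": "isis level-2", "r2": "isis level-2"})
repo.branch("candidate")
repo.commit("candidate", "r1", "isis level-1-2")       # proposed change
repo.commit("main", "r2", "isis level-2 metric 20")    # network drifted
repo.rebase("candidate")
delta = repo.diff("main", "candidate")
```

After the rebase, the diff contains only the candidate's own change to `r1`, so a differential check would examine the proposed change against current network state rather than against a stale baseline. Conflict handling (both sides changing the same device) is deliberately left out of this sketch; here the candidate's version simply wins.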

Implementation

Aether leverages modern agentic protocols (A2A, SLIM), enabling agent interoperability and secure, low-latency communication. LLM-based agents (GPT-4o) are orchestrated via a common SDK (LlamaIndex), and are exposed to tool APIs and data schemas through explicit prompt scaffolding and tool description tokenization. The NDM leverages ArangoDB for high-performance graph operations, with ingestion pipelines modularized for integration with enterprise orchestrators (e.g., NSO).

Key extensions include Batfish enhancements (e.g., SRv6, complex IS-IS scenarios) and a plug-in framework for integrating GNN-based or simulation-based performance checking tools (such as RouteNet), exposing high-level verification methods for agents.
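One common way to build such a plug-in framework is a registry that maps check names to callables with a uniform result type, so an agent only needs the name and the result shape. The sketch below assumes that design; the check names, the snapshot dictionary layout, the `null0` discard convention, and the 0.8 utilization threshold are all hypothetical and do not describe Aether's, Batfish's, or RouteNet's interfaces.

```python
from typing import Callable, Dict, List, NamedTuple


class CheckResult(NamedTuple):
    check: str
    passed: bool
    detail: str


Check = Callable[[dict], CheckResult]
_REGISTRY: Dict[str, Check] = {}


def register(name: str) -> Callable[[Check], Check]:
    """Decorator that exposes a verification backend under a stable name."""
    def decorator(fn: Check) -> Check:
        _REGISTRY[name] = fn
        return fn
    return decorator


@register("no_blackhole")
def no_blackhole(snapshot: dict) -> CheckResult:
    # Toy model-based check: a next hop of "null0" means traffic is discarded.
    dropped = sorted(p for p, nh in snapshot["routes"].items() if nh == "null0")
    return CheckResult("no_blackhole", not dropped, ",".join(dropped) or "ok")


@register("max_link_utilization")
def max_link_utilization(snapshot: dict) -> CheckResult:
    # Toy performance check: the 0.8 threshold is an arbitrary assumption.
    worst = max(snapshot["utilization"].values())
    return CheckResult("max_link_utilization", worst <= 0.8, f"{worst:.2f}")


def available_checks() -> List[str]:
    """What an agent would be shown as its tool catalogue."""
    return sorted(_REGISTRY)


def run_checks(names: List[str], snapshot: dict) -> List[CheckResult]:
    return [_REGISTRY[name](snapshot) for name in names]


candidate = {
    "routes": {"10.0.0.0/8": "r2", "2001:db8::/32": "null0"},
    "utilization": {"r1-r2": 0.45, "r2-r3": 0.9},
}
results = run_checks(available_checks(), candidate)
```

Adding a new backend then amounts to registering one more function, which matches the extensibility goal described above without changing the calling code.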

Experimental Evaluation

Synthetic Benchmarks

Aether was evaluated on eight synthetic scenarios covering policy misconfigurations, protocol logic errors (e.g., route redistribution and summarization bugs), and latent path defects, selected for operational criticality and diagnostic diversity. Each scenario includes faulty and correct candidate configuration changes and multiple NL intent formulations, and is evaluated both per agent and end-to-end using LLM-as-a-judge metrics (Correctness, Consistency, Robustness).

Figure 6: Single agent correctness across scenarios, quantifying output agreement with expert ground truth.

Empirically:

Single-agent correctness consistently exceeds 0.7 across agents and tasks, with the NDM Query agent achieving the highest performance save for isolated schema-induced errors.
End-to-end (E2E) error detection reaches 94% for broken candidate blocking, and 100% for main error detection in most scenarios. Precision (correct classification) varies across scenarios, reflecting conservative verdicts and the impact of data and model limitations.
Coverage and efficiency: Agent-generated test plans match or exceed ground truth requirements, with 92–96% diagnostic coverage and minimal redundancy.
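As a rough illustration of how figures of this kind can be computed, the sketch below defines a blocking rate, a test plan coverage ratio, and a redundancy ratio. These definitions are assumptions made for illustration; the summary does not give the paper's exact formulas, and the inputs are made-up toy values, not the paper's data.

```python
from typing import Iterable, List, Set, Tuple


def blocking_rate(verdicts: Iterable[Tuple[bool, bool]]) -> float:
    """Share of faulty candidates that were blocked.

    Each verdict is (is_faulty, was_blocked).
    """
    blocked_flags = [blocked for is_faulty, blocked in verdicts if is_faulty]
    if not blocked_flags:
        raise ValueError("no faulty candidates to score")
    return sum(blocked_flags) / len(blocked_flags)


def coverage(generated: Set[str], required: Set[str]) -> float:
    """Share of ground-truth tests that appear in the generated plan."""
    if not required:
        raise ValueError("ground truth must list at least one test")
    return len(generated & required) / len(required)


def redundancy(generated: List[str]) -> float:
    """Share of generated tests that duplicate an earlier one."""
    if not generated:
        return 0.0
    return 1 - len(set(generated)) / len(generated)


# Toy inputs only; test names are hypothetical.
verdicts = [(True, True), (True, True), (True, True), (True, False),
            (False, False)]
plan = ["reachability", "loop_free", "reachability", "isis_adjacency"]
truth = {"reachability", "loop_free", "isis_adjacency", "srv6_locator"}

rate = blocking_rate(verdicts)
cov = coverage(set(plan), truth)
dup = redundancy(plan)
```

Note that a coverage ratio defined this way cannot exceed 1.0, so a plan that "exceeds" ground truth would show up as extra tests outside the required set rather than as coverage above 100%.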

Real-World Case Studies

Aether was deployed on a large-scale ISP network replica (25 routers, >30k production-equivalent configuration lines). Two historical change incidents, an IS-IS redistribution loop and an SRv6 prefix summarization blackhole, were adopted as test cases. Both involve complex, protocol-level validation that static checks or configuration linters cannot capture.

Error detection: On both incidents, Aether blocked all faulty changes (100% detection), identified the main error with 100% precision, and produced test plans with coverage exceeding 91%.
Workflow latency: Average validation time is 6–7 minutes, an order-of-magnitude reduction versus manual validation; more than 50% of that time is attributable to verification tool execution rather than agentic reasoning.
Agent weaknesses: Errors predominantly arise from query generation or schema mapping in the NDM Query Agent under complex multi-layered data; improvements are possible via skill injection and per-agent schema enrichment.

Theoretical and Practical Implications

Aether demonstrates that neuro-symbolic, agentic AI can bridge the operational chasm between NL intent and compositional, rigorous network validation. By leveraging LLM-based agents for orchestration atop a modular Network Digital Twin, Aether achieves:

Differential, intent-aware validation: Agents dynamically generate targeted tests and execute conditional checks based on NL-expressed goals, outperforming static CI/CD approaches in recognizing nuanced, business-critical faults.
Extensible verification composition: The architecture allows rapid integration of new verification backends, protocol- or domain-specific skills, and customization by operators.
Human-in-the-loop compatibility: HITL enforcement and natural-language workflows support operator oversight and ease integration into existing NetDevOps environments.
Automation with guardrails: The approach yields substantial improvements in operational agility, reducing change validation latency and error prevalence, while retaining transparency and traceability.

Limitations and Future Directions

Challenges remain in achieving full autonomy and eliminating residual false positives; both require enhanced schema understanding, richer agent skills, and tighter feedback loops from operator corrections and post-mortem failure analysis. Integration with production CI/CD pipelines and real-time anomaly detection are the next frontiers, as is the operationalization of relational specification languages such as Relational NetKAT for fine-grained, intent-aligned validation.

Advances in agent skill injection, multi-agent persistent memory, and the systematic curation of domain- and protocol-specific knowledge bases will make Aether and similar frameworks more robust and amenable to large-scale deployment.

Conclusion

Aether establishes a new paradigm for automated, intent-driven network change validation by uniting agentic AI orchestration with a robust, extensible Network Digital Twin. The empirical results—94–100% error detection, high coverage, significant reduction in manual effort and validation time—demonstrate practical viability for safe network automation. The approach is generalizable, supporting rapid introduction of new agents, protocol extensions, and operational policies, and lays groundwork for further research in multi-agent network autonomy, operator-AI co-piloting, and closed-loop assurance.
