Reimagining RAN Automation in 6G: An Agentic AI Framework with Hierarchical Online Decision Transformer

Published 5 Apr 2026 in cs.NI | (2604.03908v1)

Abstract: In this paper, we propose an Agentic AI framework for wireless networks. The framework coordinates a pool of AI agents guided by Natural Language (NL) inputs from a human operator. At its core, the super agent is powered by a Hierarchical Online Decision Transformer (H-ODT). It orchestrates three categories of agents: (i) inter-slice, intra-slice resource allocation agents, (ii) network application orchestration agents, and (iii) self-healing agents. The orchestration takes place with the help of an Agentic Retrieval-Augmented Generation (RAG) module that integrates knowledge from heterogeneous sources. In this proposed methodology, the super agent directly interfaces with operators and generates sequential policies to activate relevant agents. The proposed framework is evaluated against three state-of-the-art baselines, showing improved throughput, reduced network delay, and higher energy efficiency at both slice-level and system-wide performance metrics. Also, the proposed Agentic framework introduces a bi-level human operator intent validation methodology, both at the slice-level and Key Performance Indicator (KPI)-level using generative AI-based time series predictors. We could rule out performance-degrading operator intents with an accuracy of 88.5%. Lastly, while being interrupted by any performance-degrading events, the self-healing capability of Agentic AI in our framework automatically recovers 90% of its previous performance, avoiding quality-of-service drifts when there is no human involvement.


Authors (6)

Summary

The paper introduces a super-agent architecture leveraging a Hierarchical Online Decision Transformer to orchestrate multi-agent RAN control with 88.5% intent validation accuracy.
It employs bi-level intent validation and DRL-based scheduling to dynamically manage heterogeneous 6G traffic, achieving up to 32.9% throughput gains and 60.9% delay reduction.
A-RAG and self-healing agents enable zero-touch, resilient RAN management, recovering ~90% of nominal performance after disturbances.

Agentic AI-Driven RAN Automation for 6G: A Hierarchical Online Decision Transformer Framework

Introduction and Motivation

This work presents a comprehensive Agentic AI framework for intent-driven radio access network (RAN) automation, targeting the operational complexities and dynamic requirements of 6G systems (2604.03908). Traditional approaches to network management, which rely on static policies, rule-based heuristics, or single-agent AI controllers, lack the flexibility and autonomy demanded by future networks characterized by heterogeneous services (eMBB, URLLC, BE), stringent SLA compliance, and the necessity for zero-touch operation. Recent advances in LLMs and generative AI underscore the potential of integrating semantic reasoning, natural language input, and modular multi-agent architectures for closed-loop, autonomous network management.

The proposed framework implements a super-agent architecture powered by a Hierarchical Online Decision Transformer (H-ODT), which orchestrates an array of specialized network agents responsible for inter- and intra-slice resource allocation, network function execution, and self-healing. This super-agent relies on an Agentic Retrieval-Augmented Generation (A-RAG) module that fuses multi-source historical and real-time knowledge with contextual reasoning.

System Architecture and Agentic Design

The system model is a mmWave massive MIMO RAN serving multiple slices (eMBB, URLLC, BE), each with its own QoS requirements and traffic characteristics. Network control is disaggregated using an Open RAN-inspired approach, separating strategic intent management (non-real-time) from near-real-time orchestration.

At the core of the Agentic design is a super-agent interfacing directly with human operators via natural language and autonomously generating multi-step policies that activate subordinate agents:

Agentic RAG (A-RAG) retrieves and synthesizes evidence plans from real-time analytics, standards corpora, and intent histories.
Inter- and intra-slice agents perform DRL-based resource arbitration and fine-grained scheduling, incorporating real-time traffic mix, buffer states, and instantaneous SINR measurements.
RAN application orchestration agent coordinates DRL-driven network functions such as traffic steering, cell sleeping, power control, and beamforming.
Self-healing agents include an RL-based controller for slice prioritization under QoS drift and a supervised module mapping multi-KPI anomalies to optimal application triggers.
Figure 1: The agentic architecture delineates super-agent policy orchestration over reasoning, control, and self-healing agents, enforcing separation of concerns and modular activation.

Figure 2: The proposed agentic solution encompasses interface layers for operator intents, retrieval, validation, and hierarchical orchestration, supporting closed-loop automation.
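
The orchestration pattern described above, in which a super-agent emits a multi-step policy that activates subordinate agents in sequence, can be illustrated with a minimal sketch. Everything named here (`SuperAgent`, `inter_slice_agent`, `power_control_agent`, the state keys, and the toy decision rules) is a hypothetical stand-in; in the paper the policy comes from H-ODT and the agents are DRL-based, neither of which is reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An agent maps the current network state to a set of state updates.
AgentFn = Callable[[Dict[str, float]], Dict[str, float]]


@dataclass
class SuperAgent:
    """Hypothetical dispatcher: runs the agents named in a policy, in order."""
    agents: Dict[str, AgentFn] = field(default_factory=dict)

    def register(self, name: str, fn: AgentFn) -> None:
        self.agents[name] = fn

    def execute(self, policy: List[str], state: Dict[str, float]) -> Dict[str, float]:
        for name in policy:
            if name not in self.agents:
                raise KeyError(f"unknown agent: {name}")
            # Merge each agent's output into the shared state.
            state = {**state, **self.agents[name](state)}
        return state


def inter_slice_agent(state: Dict[str, float]) -> Dict[str, float]:
    # Toy rule standing in for a DRL policy: favor URLLC under delay pressure.
    share = 0.5 if state["urllc_delay_ms"] > 1.0 else 0.3
    return {"urllc_prb_share": share}


def power_control_agent(state: Dict[str, float]) -> Dict[str, float]:
    # Toy rule: raise transmit power when URLLC holds a large resource share.
    boosted = state.get("urllc_prb_share", 0.0) >= 0.5
    return {"tx_power_dbm": 30.0 if boosted else 27.0}


super_agent = SuperAgent()
super_agent.register("inter_slice", inter_slice_agent)
super_agent.register("power_control", power_control_agent)

# A two-step policy; in the framework this sequence would be generated by H-ODT.
result = super_agent.execute(["inter_slice", "power_control"], {"urllc_delay_ms": 1.4})
```

Keeping each agent behind a uniform call signature is one way to obtain the separation of concerns shown in Figure 1: agents can be added or swapped without changing the dispatcher.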

Hierarchical Online Decision Transformer (H-ODT) Policy

The H-ODT module extends conventional offline Decision Transformers by supporting online finetuning, stochastic goal-aware exploration, and meta-control. Policy generation is formulated as a sequence-modeling problem, where goals are derived from operator intents mapped into target KPI deltas. The meta-transformer identifies historical action contexts that have previously maximized task fulfillment, passing these as conditioning tokens to the low-level decision transformer. This architecture enables dynamic adaptation to non-stationary traffic, intent drift, and anomalous events—a critical requirement for autonomous RANs where traffic patterns, user density, and fault conditions are highly unpredictable.
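
To make the sequence-modeling formulation concrete, the sketch below builds the kind of token sequence a Decision Transformer conditions on: returns-to-go interleaved with states and actions, prefixed by context tokens of the sort a meta-level model might supply. This follows the standard Decision Transformer input layout, not necessarily the paper's exact tokenization; the token tags, the `build_sequence` helper, and the sample values are assumptions made for illustration.

```python
from typing import Any, List, Sequence, Tuple

Token = Tuple[str, Any]


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Return-to-go at step t is the sum of rewards from t to the end."""
    rtg: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_sequence(
    context_tokens: Sequence[Any],
    rewards: Sequence[float],
    states: Sequence[Any],
    actions: Sequence[Any],
) -> List[Token]:
    """Prefix meta-level context, then interleave (rtg, state, action) per step."""
    seq: List[Token] = [("ctx", c) for c in context_tokens]
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        seq += [("rtg", g), ("state", s), ("action", a)]
    return seq


# Hypothetical three-step trajectory; actions are names of agents to activate.
rewards = [1.0, 2.0, 3.0]
states = ["s0", "s1", "s2"]
actions = ["inter_slice", "power_control", "self_healing"]
sequence = build_sequence(["boost_urllc"], rewards, states, actions)
```

At inference time the first return-to-go would be replaced by a target derived from the operator's intent (the "target KPI deltas" mentioned above), so the model is asked to produce actions consistent with that goal.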

Figure 3: The H-ODT architecture employs a bi-level transformer, with meta-level context facilitating guided agent activation and continual policy refinement.

Intent Validation: Multi-Level Predictive Assurance

A notable methodological advance is the bi-level intent validation framework, encompassing both slice-aware and KPI-centric mechanisms:

Slice-aware validation: Predicts imminent traffic-class dominance over configurable short horizons, invalidating intents that conflict with forecasted demand surges or would starve critical traffic slices.
KPI-centric validation: Employs a suite of time-series predictors (Informer, Autoformer, Mamba4Cast), selected adaptively according to each KPI's statistical profile. The resulting forecasts are checked against empirical feasibility lookup tables, so that intents likely to degrade throughput, increase delay, or violate energy constraints are rejected.
Figure 4: Comparison of time-series predictors (Autoformer, Informer, Mamba4Cast) for KPI trajectory forecasting; the predictor is selected to match each KPI's signal regime.
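
The two validation levels can be sketched as a pair of checks that must both pass before an intent is enacted. This is a simplified illustration under stated assumptions: forecasts are passed in as plain lists (standing in for the output of a predictor such as Informer or Autoformer), "dominance" is taken to mean the highest mean forecast load, and the `Intent` fields and `kpi_floor` bound are hypothetical placeholders for the feasibility lookup tables.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Intent:
    throttled_slice: str  # slice whose resources the intent would reduce
    kpi_floor: float      # minimum acceptable KPI value (hypothetical table entry)


def dominant_slice(load_forecast: Dict[str, Sequence[float]]) -> str:
    """Slice with the largest mean forecast load over the horizon."""
    return max(load_forecast, key=lambda s: sum(load_forecast[s]) / len(load_forecast[s]))


def slice_aware_ok(intent: Intent, load_forecast: Dict[str, Sequence[float]]) -> bool:
    # Reject intents that would throttle the slice forecast to dominate traffic.
    return dominant_slice(load_forecast) != intent.throttled_slice


def kpi_centric_ok(intent: Intent, kpi_forecast: Sequence[float]) -> bool:
    # Reject intents whose forecast KPI falls below the feasibility bound.
    return min(kpi_forecast) >= intent.kpi_floor


def validate(intent: Intent, load_forecast: Dict[str, Sequence[float]],
             kpi_forecast: Sequence[float]) -> bool:
    return slice_aware_ok(intent, load_forecast) and kpi_centric_ok(intent, kpi_forecast)


# Hypothetical forecasts: normalized per-slice load and throughput in Mbps.
load = {"eMBB": [0.3, 0.3, 0.2], "URLLC": [0.5, 0.6, 0.7], "BE": [0.2, 0.1, 0.1]}
throughput = [120.0, 115.0, 110.0]

risky = validate(Intent("URLLC", 100.0), load, throughput)  # throttles dominant slice
safe = validate(Intent("BE", 100.0), load, throughput)
```

Running the slice-level check first is a cheap way to filter out intents that conflict with forecast demand before the KPI-level forecast is consulted.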

Figure 5: Real-time validation of operator NLI based on forecasted QoS drift enables the framework to reject unsafe or performance-degrading policies.

Empirical results demonstrate an 88.5% accuracy in detecting and rejecting performance-degrading operator intents before enactment, quantifiably reducing throughput instability, unnecessary energy expenditure, and SLA violations.

Figure 6: KPI traces (throughput, delay, efficiency) with and without intent validation; post-validation curves retain smooth, bounded operation.

Agentic Orchestration and Self-Healing

The proposed policy orchestrator uses event-driven agent activation rather than continuous control, which minimizes orchestration overhead and supports scalability.
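
One way to picture event-driven activation is a table of trigger conditions evaluated once per orchestration epoch, with an agent invoked only when its condition fires. The trigger names, KPI keys, and thresholds below are hypothetical and chosen purely for illustration; the paper's activation decisions are made by the learned H-ODT policy rather than fixed rules.

```python
from typing import Callable, Dict, List

Kpis = Dict[str, float]

# Hypothetical trigger table: agent name -> condition on the current KPI snapshot.
TRIGGERS: Dict[str, Callable[[Kpis], bool]] = {
    "self_healing": lambda k: k["throughput_mbps"] < 0.9 * k["throughput_target_mbps"],
    "cell_sleeping": lambda k: k["load"] < 0.2,
    "traffic_steering": lambda k: k["load"] > 0.8,
}


def active_agents(kpis: Kpis) -> List[str]:
    """Return the agents whose trigger fires in this epoch (possibly none)."""
    return [name for name, fires in TRIGGERS.items() if fires(kpis)]


# Two hypothetical epochs: one degraded and congested, one nominal.
degraded = active_agents({"throughput_mbps": 80.0, "throughput_target_mbps": 100.0, "load": 0.85})
nominal = active_agents({"throughput_mbps": 98.0, "throughput_target_mbps": 100.0, "load": 0.5})
```

In the nominal epoch no agent is invoked at all, which is where the overhead saving relative to continuous control comes from.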

Figure 7: Agents are invoked selectively across orchestration epochs, as evidenced by event scatter and binary activation visualizations.

System-wide evaluations show robust advances in core KPIs:

Throughput: Up to 32.9% gain versus non-Agentic/learning-based baselines.
Delay: Up to 60.9% reduction.
Energy Efficiency: Up to 3× improvement.
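
For readers who want to relate these percentages to raw measurements, the sketch below assumes the conventional relative-change definitions; the summary does not state the exact formulas used. The baseline and framework values are hypothetical numbers chosen only so that the arithmetic lands on the reported figures, not measurements from the paper.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Percentage increase of `new` over `baseline` (higher-is-better KPIs)."""
    return (new - baseline) / baseline * 100.0


def relative_reduction(new: float, baseline: float) -> float:
    """Percentage decrease of `new` below `baseline` (lower-is-better KPIs)."""
    return (baseline - new) / baseline * 100.0


# Hypothetical values, for illustration only.
throughput_gain = relative_gain(new=265.8, baseline=200.0)     # Mbps
delay_reduction = relative_reduction(new=3.91, baseline=10.0)  # ms
efficiency_gain = relative_gain(new=3.0, baseline=1.0)         # a 3x improvement
```

Note that under this definition a 3× improvement corresponds to a 200% relative gain, which is why multiplicative and percentage figures should not be compared directly.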

These gains persist across system-level and slice-specific metrics (eMBB, URLLC, BE), with superior tail-user reliability and goal adherence under increasing load and user density.

Figure 8: The Agentic framework consistently outperforms baselines in throughput, efficiency, delay, tail-user service, and deviation from operator-prescribed goals.

Autonomous Recovery: Agentic Self-Healing

A key contribution of this framework is robust zero-touch recovery in the face of performance collapse or anomalous events, in the absence of operator supervision. Self-healing agents, jointly using KPI analysis and past application-effect mappings, autonomously reallocate resources or trigger network functions to restore performance.

Across more than 120 injected hard faults (e.g., traffic surges, cell failures), the system automatically recovers ~90% of its nominal pre-degradation performance (measured by throughput, efficiency, and delay), with minimal residual loss.
Figure 9: Following disturbance, self-healing mechanisms promptly restore energy efficiency and URLLC latency to baseline levels.

Figure 10: Scatter of pre- and post-recovery KPI pairs for throughput and energy efficiency, with most recovery events residing above the 90% effectiveness threshold.
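
The 90% effectiveness threshold can be read as a ratio between post-recovery and pre-degradation KPI values. The sketch below assumes that interpretation, inverting the ratio for lower-is-better KPIs such as delay; the function names and sample values are hypothetical and not taken from the paper.

```python
def recovery_ratio(pre: float, post: float, higher_is_better: bool = True) -> float:
    """Fraction of pre-degradation performance regained after self-healing."""
    return post / pre if higher_is_better else pre / post


def recovered(pre: float, post: float, threshold: float = 0.9,
              higher_is_better: bool = True) -> bool:
    """True when the recovery ratio meets or exceeds the threshold."""
    return recovery_ratio(pre, post, higher_is_better) >= threshold


# Hypothetical recovery events.
throughput_ok = recovered(pre=100.0, post=92.0)                    # Mbps
delay_ok = recovered(pre=2.0, post=2.1, higher_is_better=False)    # ms
throughput_bad = recovered(pre=100.0, post=80.0)
```

Each point in a scatter such as Figure 10 corresponds to one (pre, post) pair; points above the threshold line are the events for which `recovered` would return `True`.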

Practical and Theoretical Implications

The results indicate that modular, multi-agent architectures orchestrated by H-ODT enable goal-oriented, adaptive, and highly resilient RAN operation. The use of fine-tuned LLMs (IA$^3$-LLaMA), predictive multi-level intent validation, and online meta-sequence modeling enables both semantic grounding of control policies and robust recovery under unanticipated operating regimes. The findings argue for greater architectural modularity and hierarchically structured (meta-)policy search in future AI-native mobile networks, where per-slice and per-application specialization is essential for meeting 6G heterogeneity.

From a practical viewpoint, this framework provides a pathway for scalable zero-touch RAN management, reducing the operational burden on network administrators and enabling SLA-compliant performance even in the absence of human oversight. The use of zero-shot/few-shot natural language intent handling further democratizes network control, lowering the domain expertise required for day-to-day operation.

Future Outlook

Potential directions informed by this research include the extension to more diverse agent pools (e.g., direct RIC-as-agent integration), federated or decentralized agentic learning (to reduce centralization bottlenecks), and incorporation of model-based planning within the H-ODT rollout loop to further enhance sample efficiency and explainability. Safety, transparency, and adversarial robustness remain ongoing research challenges, particularly as such agentic frameworks become critical infrastructure in autonomous 6G systems.

Conclusion

This work establishes an advanced, modular AI paradigm for intent-driven and resilient 6G RAN management. Leveraging stateful semantic reasoning, hierarchical online sequence modeling, and multi-level autonomous recovery, the proposed framework achieves both substantial KPI gains (up to 32.9% in throughput and a 60.9% reduction in delay) and robust zero-touch operation. These results point to agentic architectures with meta-decision intelligence as a foundational principle for next-generation communication infrastructure.