AI-on-RAN: Convergence of AI and RAN
- AI-on-RAN co-locates RAN protocol stacks and AI workloads on shared telecom infrastructure, enabling that infrastructure to host diverse, high-performance applications alongside connectivity services.
- It leverages cloud-native microservices and GPU slicing to concurrently support time-sensitive RAN functions and compute-intensive AI tasks.
- By enabling joint orchestration of compute and network resources, AI-on-RAN meets stringent 6G SLAs and enhances service agility across verticals.
Artificial Intelligence-powered Radio Access Network (AI-RAN) redefines the architectural and operational principles of radio access networks by merging high-performance computing and programmable communication infrastructure into a unified, cloud-native fabric. AI-RAN is characterized by the intrinsic co-location of RAN protocol stacks and AI workloads (training and inference) on shared, dynamically orchestrated edge and data center platforms. This convergence targets the dual challenge faced by next-generation (6G) networks: achieving stringent service-level objectives (ultra-low latency, high throughput, adaptive management) and simultaneously maximizing the utilization and flexibility of compute/network assets across both telecommunications and AI-intensive vertical services (Kundu et al., 15 Jan 2025).
1. Core Paradigms of AI-RAN
AI-RAN is realized through three interlinked paradigms, each reflecting a distinct mode of convergence between communications and AI:
- AI-for-RAN deploys AI techniques directly into RAN operations, optimizing protocol layers, orchestration, and management. Key applications include beamforming, channel estimation, and interference mitigation at the physical layer; adaptive modulation and coding, dynamic scheduling, and load balancing at the link/MAC layers; and intent-driven, zero-touch automation at higher protocol layers (e.g., O-RAN RICs and 3GPP AI/ML enhancements).
- AI-on-RAN enables RAN infrastructure to host AI-driven applications and vertical services. Examples include LLM-powered chatbots, digital humans, video analytics for smart factories, XR, and AI-as-a-service (AIaaS) paradigms that combine generative AI, cloud compute, and mobile connectivity, instantiated on telco-specific LLMs (e.g., GTAA models).
- AI-and-RAN facilitates dynamic, fine-grained sharing of hardware resources (notably multi-instance GPUs) between time-sensitive RAN functions (vCU, vDU, RU) and demanding AI workloads. GPU-based orchestration platforms (e.g., Aarna AMCOP, SoftBank AITRAS) and techniques such as Multi-Instance GPU (MIG) slicing enable simultaneous, isolated execution of RAN and AI tasks on shared accelerators (Kundu et al., 15 Jan 2025).
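To make the MIG-sharing idea concrete, the following minimal Python sketch models a GPU's seven MIG compute slices being split between a latency-sensitive vDU and best-effort LLM inference. It is illustrative only: the slice counts and workload names are assumptions, not taken from (Kundu et al., 15 Jan 2025).

```python
from dataclasses import dataclass, field

# A GH200/H100-class GPU exposes up to seven MIG compute slices.
MIG_SLICES_PER_GPU = 7

@dataclass
class GpuNode:
    name: str
    free_slices: int = MIG_SLICES_PER_GPU
    placements: dict = field(default_factory=dict)

    def allocate(self, workload: str, slices: int) -> bool:
        """Reserve isolated MIG slices for a workload, if capacity allows."""
        if slices > self.free_slices:
            return False
        self.free_slices -= slices
        self.placements[workload] = slices
        return True

node = GpuNode("ai-ran-server-0/gpu0")
assert node.allocate("vDU-L1", slices=3)         # time-sensitive RAN reservation
assert node.allocate("llm-inference", slices=4)  # remaining slices host AI
print(node.placements, "free slices:", node.free_slices)
```

In a real deployment this allocation would be enforced by the GPU driver and surfaced to the scheduler (e.g., via Kubernetes device plugins), rather than tracked in an in-process dictionary.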
2. Architectural Requirements and Enablers
The practical realization of AI-RAN relies on innovations across system hardware, cloud-native software principles, and orchestrated resource management:
- Accelerated Computing Infrastructure: Massively parallel general-purpose GPUs, high-speed, low-latency interconnects (NVLink, PCIe Gen5/6), and support for both offline AI training (centralized clouds) and online inference (edge nodes) are foundational for converged compute-communicate tasks.
- Cloud-Native, Software-Defined Design: RAN functions and AI microservices are decomposed into highly portable, containerized microservices orchestrated by Kubernetes or comparable container-as-a-service (CaaS) layers. CI/CD pipelines ensure rapid feature deployments decoupled from hardware lifecycle, and the system’s elasticity is managed in response to both real-time traffic fluctuations and AI demand surges.
- Joint Orchestration of Compute and Network Resources: A unified, programmable orchestrator is responsible for coordinated scheduling of CPUs, GPUs, memory, and network resources, as well as spectrum/time-slot allocation. Orchestration leverages AI-embedded control loops for adaptive, SLA-aware placement of workloads, for example applying GPU slicing to optimize multi-tenant utilization (a toy placement loop is sketched after this list).
- Native AI Support and Network Digital Twins: AI algorithms are embedded at every network layer and RRM domain, feeding from real-time data pipelines. The introduction of a Network Digital Twin (NDT) provides a safe environment for AI model training, validation, and “what-if” analysis, reducing operational risk before in-situ deployment (Kundu et al., 15 Jan 2025).
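To illustrate how an NDT can de-risk SLA-aware placement, the toy loop below queries a stand-in latency model before committing a GPU-slice split; only placements the twin predicts to meet the RAN latency budget are applied. All function names, thresholds, and coefficients here are invented for illustration.

```python
# Illustrative SLA-aware placement loop: validate in a digital twin first.
RAN_LATENCY_SLA_MS = 0.5  # assumed fronthaul processing budget

def twin_predicted_latency_ms(ran_slices: int, ai_slices: int) -> float:
    """Stand-in for a network-digital-twin query: fewer RAN slices -> more latency."""
    base = 0.25
    contention = 0.08 * ai_slices / max(ran_slices, 1)
    return base + contention

def place(total_slices: int = 7) -> tuple[int, int]:
    """Give AI as many slices as possible while the twin predicts SLA compliance."""
    for ai_slices in range(total_slices, -1, -1):
        ran_slices = total_slices - ai_slices
        if twin_predicted_latency_ms(ran_slices, ai_slices) <= RAN_LATENCY_SLA_MS:
            return ran_slices, ai_slices
    return total_slices, 0  # fall back to RAN-only provisioning

ran, ai = place()
print(f"placement validated in twin: {ran} RAN slices, {ai} AI slices")
```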
3. Reference Architecture and Workflow
AI-RAN’s reference architecture is structured as a two-layer (spine–leaf) datacenter topology:
- Compute Fabric (East-West Traffic): Connects radio units (RUs) to AI-RAN servers equipped with CPUs, multi-instance GPUs, DPUs, and fast local SSDs. The fabric aggregates fronthaul traffic through transport routers and distributes it among AI-RAN servers that may host DUs, combinations of DUs and CUs, or fully virtualized RAN and CN stacks.
- Converged Fabric (North-South Traffic): Handles midhaul/backhaul and internet connectivity, integrating with classical telco datacenter topologies.
- Software Stack (as detailed in Figure 1 of (Kundu et al., 15 Jan 2025)): Comprises a Kubernetes-driven cloud OS, platform APIs for compute, networking, and orchestration (CUDA, DOCA), a Service Management and Orchestration (SMO) stack controlling vCU/vDU/CN microservices, and AI components (cluster agents, SDKs, serverless APIs). The E2E orchestrator synchronizes RAN and AI agents for resource allocation and scheduling.
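The topology above can be summarized as a small data model; the sketch below captures, purely illustratively (the field and server names are assumptions), which fabric each AI-RAN server attaches to and which microservices it hosts.

```python
from dataclasses import dataclass, field
from enum import Enum

class Fabric(Enum):
    COMPUTE = "east-west"      # fronthaul aggregation from RUs to AI-RAN servers
    CONVERGED = "north-south"  # midhaul/backhaul and internet connectivity

@dataclass
class AiRanServer:
    name: str
    gpus: int
    dpus: int
    fabrics: frozenset = frozenset({Fabric.COMPUTE, Fabric.CONVERGED})
    workloads: list = field(default_factory=list)  # e.g., DU, DU+CU, full vRAN+CN

inventory = [
    AiRanServer("leaf0-srv0", gpus=2, dpus=1, workloads=["vDU"]),
    AiRanServer("leaf0-srv1", gpus=2, dpus=1, workloads=["vDU", "vCU"]),
    AiRanServer("leaf1-srv0", gpus=2, dpus=1, workloads=["vRAN", "CN", "llm-inference"]),
]

for srv in inventory:
    print(srv.name, sorted(f.value for f in srv.fabrics), srv.workloads)
```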
4. Performance Evaluation and Proof-of-Concept
AI-RAN performance is measured by throughput (T), latency (L), and resource utilization (U).
A proof-of-concept on NVIDIA GH200 Grace Hopper servers (MGX platform), each with two GPUs partitioned via MIG, demonstrates:
- RAN-only GPU utilization peaks at ~40%, indicating substantial idle capacity under legacy one-service-per-node provisioning.
- AI-only instances exhibit utilization spikes up to ~60%, depending on LLM inference load.
- Combined (AI+RAN) execution achieves a mean GPU utilization of ~41% and maximum ~76%, effectively doubling asset efficiency and empirically confirming the ability to concurrently support carrier RAN SLAs and AI inference demands (Kundu et al., 15 Jan 2025).
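A back-of-envelope reading of these numbers, under the assumption that the combined node absorbs workloads that would otherwise occupy one RAN-only node and one AI-only node, recovers the roughly 2× asset-efficiency figure:

```python
# Back-of-envelope reading of the PoC numbers (Kundu et al., 15 Jan 2025).
# Assumption: the combined node absorbs workloads that would otherwise
# occupy one RAN-only node and one AI-only node.
ran_only_peak = 0.40                      # reported peak GPU utilization, RAN-only
ai_only_peak = 0.60                       # reported peak, AI-only
combined_mean, combined_max = 0.41, 0.76  # reported, AI+RAN sharing MIG GPUs

siloed_nodes, shared_nodes = 2, 1         # two siloed nodes collapse into one
gain = siloed_nodes / shared_nodes
print(f"siloed peaks: RAN {ran_only_peak:.0%}, AI {ai_only_peak:.0%}")
print(f"shared node: mean {combined_mean:.0%}, max {combined_max:.0%} "
      f"-> ~{gain:.0f}x fewer servers, {1 - combined_max:.0%} peak headroom")
```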
5. Challenges, Open Problems, and Future Research
Primary research opportunities and challenges include:
- Closed-Loop Orchestration: Next-generation frameworks must support multi-objective optimization—jointly adapting compute, spectrum, and energy allocation under complex, service-differentiated SLAs in highly dynamic contexts.
- Standardization: There is a need to define open interfaces, interoperable data models, and unified telemetry for multi-vendor AI-RAN deployments, ensuring plug-and-play composability of RAN and AI resources.
- Testbeds and Benchmarking: Large-scale, high-fidelity testbeds and robust benchmarking frameworks are required to validate architecture and orchestration strategies at scale, accounting for real network variance and workload diversity.
- Security and Privacy: Implementing isolation, trust, and data protection mechanisms is mandatory when RAN and AI workloads co-exist on the same physical infrastructure, particularly in multi-tenant and public edge environments.
- Distributed and Federated AI: Techniques for federated/distributed AI model training among globally distributed AI-RAN nodes (addressing privacy, synchronization, and cross-domain learning challenges) remain a key area for future development (Kundu et al., 15 Jan 2025).
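As a pointer for the federated direction, a single FedAvg aggregation round among AI-RAN nodes might look like the dependency-free sketch below. The flat parameter vectors and sample counts are invented for illustration; real systems would add secure aggregation, synchronization, and model versioning.

```python
# Minimal federated-averaging (FedAvg) round across AI-RAN nodes: each node
# trains locally on private data and shares only parameter updates.
from typing import List, Tuple

def fedavg(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Aggregate (parameters, n_samples) pairs, weighted by sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [
        sum(params[i] * n for params, n in updates) / total
        for i in range(dim)
    ]

# Three edge nodes report locally trained parameters with their dataset sizes.
node_updates = [
    ([0.10, 0.20], 1_000),  # cell-site A
    ([0.14, 0.18], 3_000),  # cell-site B
    ([0.08, 0.26],   500),  # cell-site C
]
global_model = fedavg(node_updates)
print("aggregated global model:", global_model)
```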
6. Impact and Outlook
The AI-RAN paradigm represents a foundational shift in both communications and computing architectures for mobile and edge networks. By unifying telecommunication and AI workloads on a shared, elastically orchestrated infrastructure, AI-RAN achieves:
- Significant improvements in asset utilization (demonstrated 2× gains over siloed deployments)
- The ability to offer composable, low-latency AIaaS for diverse verticals
- Zero-touch, intent-driven automation and network self-management capabilities
- Enhanced spectral efficiency and service agility through AI-driven PHY/MAC optimizations
Ongoing research and real-world validation will be critical to realizing the full potential of AI-RAN as the blueprint for 6G and beyond, guiding the telecom sector toward compute-enhanced, AI-native infrastructure (Kundu et al., 15 Jan 2025).