
AI-RAN: Converged AI and Wireless Networks

Updated 29 March 2026
  • AI-RAN is a converged infrastructure that integrates traditional radio access protocols with AI model training and inference on unified, accelerated hardware.
  • It leverages a software-defined, cloud-native design with GPU slicing and joint compute–communication orchestration to dynamically optimize resource allocation.
  • Empirical evaluations demonstrate enhanced asset utilization and performance, achieving up to 76% peak GPU usage while meeting stringent RAN SLA requirements.

Artificial Intelligence Radio Access Network (AI-RAN) refers to the evolutionary fusion of radio access network (RAN) infrastructure with AI at the architectural, functional, and operational levels. Unlike traditional RANs, which are communications-centric, AI-RAN embodies a converged compute–communication fabric that natively co-hosts both RAN protocol functions (e.g., baseband, MAC, PHY) and diverse AI workloads on unified, accelerated hardware (CPUs, GPUs, DPUs, SSDs). AI-RAN is motivated by the ultra-low-latency, high-throughput, and adaptive service management requirements of emerging 6G networks, together with the imperative to maximize asset utilization through dynamic, context-aware resource pooling and management (Kundu et al., 15 Jan 2025).

1. Evolution, Conceptual Foundations, and Paradigms

RAN architectures have transitioned from tightly-coupled, communication-centric designs to disaggregated, cloud-native, and programmable platforms. AI-RAN represents a pivotal step in this evolution, collapsing the silo between connectivity and edge/cloud computing within a single infrastructure. It leverages edge data centers—often located in central offices, mobile switches, or cell sites—to simultaneously execute RAN signal processing, AI model training, and inference tasks, exploiting dynamic pooling of computing and networking resources (Kundu et al., 15 Jan 2025).

AI-RAN is realized via three complementary paradigms:

  • AI-for-RAN: Integration of AI to optimize RAN protocol layers and orchestration, including PHY/MAC functions such as beamforming, channel estimation, interference mitigation, adaptive modulation/coding, scheduling, and zero-touch intent-based network management (O-RAN RIC, 3GPP AI enhancements).
  • AI-on-RAN: Hosting of AI-driven vertical applications (e.g., customer-care chatbots employing telco-specific LLMs, video analytics for smart factories, XR services) directly on the RAN’s compute substrate, supporting novel AI-as-a-Service (AIaaS) bundles that interleave connectivity and AI.
  • AI-and-RAN: Joint, dynamic sharing of hardware resources across both RAN network functions (vCU, vDU, RU) and AI applications. Enablers include GPU-based orchestration frameworks (Aarna AMCOP, SoftBank AITRAS) and Multi-Instance GPU (MIG) slicing, which allow subdivision of physical GPUs into isolated environments for concurrent workloads (Kundu et al., 15 Jan 2025); a minimal Python sketch for inspecting an existing MIG partitioning follows this list.
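As a concrete illustration of the AI-and-RAN enabler, the following Python sketch enumerates the MIG slices on a MIG-enabled NVIDIA GPU using the `pynvml` bindings (`nvidia-ml-py`). It only inspects a partitioning that an operator or orchestrator would have configured out-of-band; it is a minimal sketch, not part of any cited framework.

```python
# Minimal sketch: enumerate MIG slices on a MIG-enabled NVIDIA GPU.
# Assumes the nvidia-ml-py (pynvml) bindings and a MIG-capable device;
# the slice profiles themselves are configured out-of-band.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    current_mode, _pending_mode = pynvml.nvmlDeviceGetMigMode(handle)
    if current_mode == pynvml.NVML_DEVICE_MIG_ENABLE:
        max_slices = pynvml.nvmlDeviceGetMaxMigDeviceCount(handle)
        for i in range(max_slices):
            try:
                mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(handle, i)
            except pynvml.NVMLError:
                continue  # this MIG index is not populated
            mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
            print(f"MIG slice {i}: {mem.total // 2**20} MiB total memory")
finally:
    pynvml.nvmlShutdown()
```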

This multipronged model positions the RAN not only as a communication backbone but as a fully integrated digital platform for distributed intelligence.

2. Key Requirements, Enablers, and Reference Architecture

To operationalize AI-RAN, several hardware, software, and networking enablers are necessary:

  • Accelerated Infrastructure: Ubiquitous GPUs (for both AI and parallel baseband tasks), high-speed interconnects (NVLink, PCIe Gen5/6), and support for offline AI model training (cloud) alongside real-time inference (edge nodes).
  • Software-Defined, Cloud-Native Design: Decomposition of RAN network functions into Kubernetes-managed, containerized microservices facilitates elasticity, CI/CD updating, and high scalability.
  • Joint Compute–Communication Orchestration: Unified orchestrators provision compute (GPU/CPU/memory) and network resources simultaneously, embedding AI in the control loops that adaptively determine resource placement, scaling, and scheduling to meet latency, reliability, and SLA requirements. MIG slicing is a critical technique for extracting maximal utility from each GPU (a toy admission-control sketch in Python follows this list).
  • Native AI Embedding, Digital Twins: Machine-learned functions permeate all protocol layers, with real-time data pipelines feeding learning engines. Network digital twins (NDTs) provide safe sandboxes for offline training, validation, and scenario “what-if” testing prior to live deployment.
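To make the joint compute–communication orchestration loop concrete, here is a deliberately simplified Python sketch of one admission decision: an AI inference job is placed on a shared GPU slice only if the projected utilization preserves the headroom reserved for RAN SLA bursts. All names, thresholds, and the linear utilization model are hypothetical illustrations, not an actual orchestrator API.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of joint compute-communication admission:
# an AI job is admitted onto a shared GPU slice only if the projected
# utilization leaves enough headroom to honor the RAN latency SLA.

@dataclass
class GpuSlice:
    name: str
    util: float          # current utilization, 0.0-1.0
    ran_reserved: float  # headroom reserved for RAN bursts, 0.0-1.0

def admit_ai_job(slice_: GpuSlice, projected_load: float) -> bool:
    """Admit iff utilization after placement stays below the RAN reserve line."""
    return slice_.util + projected_load <= 1.0 - slice_.ran_reserved

slices = [
    GpuSlice("mig-ran-0", util=0.40, ran_reserved=0.30),  # RAN-leaning slice
    GpuSlice("mig-ai-1", util=0.25, ran_reserved=0.10),   # AI-leaning slice
]

job_load = 0.35  # projected GPU share of an incoming inference job
target = next((s for s in slices if admit_ai_job(s, job_load)), None)
print(f"placed on: {target.name}" if target else "deferred: no SLA-safe slice")
```

A production orchestrator would additionally weigh latency budgets, migration cost, and energy, but the reserve-then-admit pattern captures the core idea.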

The reference architecture specifies a two-layer spine–leaf datacenter fabric, where each rack contains AI-RAN servers with CPUs, sliced GPUs, DPUs, and SSDs, interconnected by fronthaul and server leaf switches. These support flexible placement of vDU, vCU, CN, and AI microservices (e.g., NVIDIA Inference Microservice) within a Kubernetes cluster. The software stack includes cloud OS, RAN SMO, AI cluster agents and SDKs, microservices, and an end-to-end orchestrator for resource scheduling (Kundu et al., 15 Jan 2025).

3. Workload Concurrency, Multi-Tenancy, and Empirical Evaluation

AI-RAN’s practical feasibility is substantiated by proof-of-concept deployments on NVIDIA GH200 Grace Hopper servers. In these deployments, MIG partitions two physical GH200 GPUs into multiple isolated instances, each dedicated either to RAN baseband processing (e.g., a 5G gNB with 4T4R, 100 MHz, 30 kHz SCS) or to AI applications (e.g., on-edge LLMs for digital humans). Under combined AI+RAN conditions:

  • RAN-only GPU utilization peaks at ~40%, leaving substantial headroom.
  • AI-only instances spike to ~60% utilization during demand peaks.
  • Combined AI+RAN workloads achieve a mean GPU utilization of approximately 41% and a peak of 76%, demonstrating near-optimal hardware occupancy and a roughly twofold increase in asset efficiency.

MIG-based orchestration ensures carrier-grade RAN SLAs are preserved while concurrently delivering dynamic AI inference workloads, a result critical for future multi-tenant, service-dense edge deployments (Kundu et al., 15 Jan 2025).
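The utilization figures above are statistics over sampled telemetry. The short Python sketch below shows how mean and peak values of this kind are derived from a monitoring stream; the sample values are made up for illustration, and a real deployment would read counters via NVML/DCGM rather than a hard-coded list.

```python
from statistics import mean

# Hypothetical GPU utilization telemetry (%), sampled once per second from
# a shared GPU running RAN baseband plus AI inference. The values are
# invented purely to illustrate how figures like "41% mean, 76% peak"
# are computed from a monitoring stream.
samples = [38, 40, 35, 41, 76, 52, 30, 33, 32, 33]

print(f"mean utilization: {mean(samples):.1f}%")  # 41.0%
print(f"peak utilization: {max(samples)}%")       # 76%
```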

4. Performance Metrics, Trade-Offs, and Design Principles

AI-RAN design relies on formal performance modeling to guide resource allocation:

| Metric | Formula | Interpretation |
|---|---|---|
| Throughput (T) | $T = \frac{\text{Processed bits}}{\text{Time}}$ | Aggregate service rate |
| Latency (L) | $L = L_{\mathrm{compute}} + L_{\mathrm{comm}}$ | End-to-end delay |
| Resource Utilization (U) | $U = \frac{\text{Active cycles}}{\text{Total cycles}}$ | Efficiency of hardware pooling |

These metrics inform compute and network provisioning strategies for mixed, often bursty, RAN and AI workloads under tight SLA constraints. Design trade-offs must balance compute oversubscription (to guarantee RAN SLAs) against AI inference burst accommodation and overall infrastructure ROI (Kundu et al., 15 Jan 2025).
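As a worked example of the three metrics, the following Python sketch evaluates them from raw counters; all input numbers are illustrative only.

```python
# Worked example of the metrics above; all input numbers are illustrative.

def throughput_bps(processed_bits: float, seconds: float) -> float:
    """T = processed bits / time."""
    return processed_bits / seconds

def latency_s(l_compute: float, l_comm: float) -> float:
    """L = L_compute + L_comm."""
    return l_compute + l_comm

def utilization(active_cycles: int, total_cycles: int) -> float:
    """U = active cycles / total cycles."""
    return active_cycles / total_cycles

T = throughput_bps(2e9, 1.0)         # 2 Gbit processed in 1 s -> 2 Gbps
L = latency_s(350e-6, 150e-6)        # 350 us compute + 150 us transport
U = utilization(410_000, 1_000_000)  # 41% of cycles active

print(f"T = {T / 1e9:.1f} Gbps, L = {L * 1e6:.0f} us, U = {U:.0%}")
```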

5. Open Standardization, Interoperability, and Ongoing Challenges

AI-RAN deployment at scale requires:

  • Standardized Open Interfaces: Modular, vendor-neutral APIs for control, telemetry, and orchestration (e.g., O-RAN A1/E2, Kubernetes CRDs) are essential for multi-vendor environments and agile ecosystem evolution.
  • Multi-Vendor Interoperability: Data model and telemetry standardization is critical to prevent lock-in and facilitate best-of-breed orchestration solutions.
  • Security and Isolation Mechanisms: The co-hosting of AI and RAN functions on shared hardware introduces new trust, privacy, and isolation challenges. These demand robust policy, hardware/VM separation, and auditability mechanisms.
  • Federated and Distributed AI Training: Efficient, scalable methods for distributed model training and serving across geographically dispersed nodes remain a key open problem (see the FedAvg sketch after this list).
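On the federated-training point, the classical FedAvg aggregation step (a weighted average of locally trained parameters) is the usual baseline. The NumPy sketch below illustrates that step under hypothetical inputs; it is not tied to any specific AI-RAN implementation.

```python
import numpy as np

# Minimal FedAvg aggregation step: average locally trained parameter
# vectors, weighted by each edge site's sample count. Illustrative only;
# real federated pipelines add secure aggregation, compression, etc.

def fedavg(updates: list[np.ndarray], sample_counts: list[int]) -> np.ndarray:
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()
    return sum(w * u for w, u in zip(weights, updates))

# Three edge sites report locally trained parameters for a tiny model.
site_updates = [np.array([0.10, -0.20]),
                np.array([0.30,  0.00]),
                np.array([0.20, -0.10])]
site_samples = [1000, 3000, 1000]

global_params = fedavg(site_updates, site_samples)
print(global_params)  # weighted toward the site with the most data
```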

The architecture must also accommodate future advances in network digital twins, intent-driven orchestration, automatic verification of AI agents in critical control loops, and compositional system-wide guarantees (Kundu et al., 15 Jan 2025).

6. Future Research Trajectories

Key research directions for AI-RAN include:

  • Advanced closed-loop orchestration solutions that co-optimize across compute, radio spectrum, and energy domains under tight, multi-objective SLA constraints.
  • Comprehensive, large-scale testbeds and benchmarks to empirically validate system behavior, quantify trade-offs, and establish robust best practices.
  • Cross-organizational standardization efforts focused on open interface definitions, multi-vendor data/telemetry models, and secure federated AI mechanisms.
  • Furthering automation towards zero-touch, self-healing networks, and exploring formal mechanisms for AI safety, privacy-preservation, and lifecycle management in a multi-tenant edge environment (Kundu et al., 15 Jan 2025).

AI-RAN thus constitutes both a design and an operational paradigm, unifying high-performance, software-defined RANs and distributed AI workloads on a single converged infrastructure and laying the foundation for next-generation, compute-enhanced wireless networks.
