- The paper proposes pooling PCIe devices through CXL memory pools instead of PCIe switches, improving resource utilization and lowering datacenter costs.
- The design places I/O buffers in shared CXL memory with software-managed cache coherence, yielding a sub-microsecond cross-host communication channel and enabling dynamic load balancing.
- Deploying CXL pooling reduces stranded I/O resources, and the reliability of CXL pods could let operators remove single points of failure such as top-of-rack switches.
My CXL Pool Obviates Your PCIe Switch
Introduction
This paper presents an alternative to pooling PCIe devices with costly and inflexible PCIe switches: Compute Express Link (CXL) memory pools. A CXL-based shared memory pool improves device utilization, reduces stranded I/O resources, limits the impact of device failures, and lowers overall datacenter costs.
Figure 1: A CXL memory pool enables hosts to access any PCIe device in a CXL pod, forming a logical pool of PCIe devices.
Need for PCIe Pooling
PCIe pooling aims to enhance the utilization of existing resources and improve fault tolerance. Datacenters over-provision resources for peak demand, leaving NICs and SSDs underutilized. Pooling PCIe resources across multiple hosts lets datacenters reclaim much of this stranded capacity. Pooling also enables dynamic load balancing and failover, reducing the redundancy required for high availability and hence the total cost of ownership (TCO).
Figure 2: Percentages of stranded CPU cores, memory capacity, SSD storage, and NIC bandwidth in Microsoft Azure datacenters.
CXL as Enabler for PCIe Pooling
CXL memory pools are poised to transform datacenter resource management: they let multiple hosts access and share memory resources efficiently. Switch-less pod designs make CXL pods more cost-effective than traditional PCIe switches, and any PCIe device can be shared flexibly by using CXL memory directly as I/O buffers. This capability is particularly advantageous for data-intensive applications and supports seamless hardware evolution.
Software Design for PCIe Pooling
The proposed software design places I/O buffers in CXL shared memory and manages cache coherence in software. Because PCIe devices access CXL memory directly, the data path remains fast. The design also implements a sub-microsecond communication channel between CXL-attached hosts for efficient event signaling and management of device memory operations.
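As a rough illustration of what software-managed coherence involves, the sketch below (C, assuming an x86 host with the clflushopt instruction; the function and buffer names are hypothetical, not the paper's API) writes back dirty cache lines before a device DMA-reads a buffer in CXL memory, and invalidates possibly stale lines before the CPU reads data a device has written:

```c
#include <stdint.h>
#include <stddef.h>
#include <immintrin.h> /* compile with -mclflushopt */

#define CACHELINE 64

/* Write back dirty cache lines so a PCIe device DMA-reading the buffer
 * sees the CPU's latest stores (no hardware coherence is assumed). */
static void cxl_flush_range(const void *addr, size_t len)
{
    const uint8_t *p = (const uint8_t *)((uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1));
    const uint8_t *end = (const uint8_t *)addr + len;
    for (; p < end; p += CACHELINE)
        _mm_clflushopt((void *)p);
    _mm_sfence(); /* clflushopt is weakly ordered; fence before signaling the device */
}

/* Drop possibly stale lines before the CPU reads device-written data.
 * clflush both writes back and invalidates, which suffices for a sketch. */
static void cxl_invalidate_range(const void *addr, size_t len)
{
    const uint8_t *p = (const uint8_t *)((uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1));
    const uint8_t *end = (const uint8_t *)addr + len;
    for (; p < end; p += CACHELINE)
        _mm_clflush((void *)p);
    _mm_mfence();
}
```

A host would call cxl_flush_range on a transmit buffer before handing its address to a device, and cxl_invalidate_range on a receive buffer before parsing what the device DMA-wrote.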
Figure 3: 75 B packets.
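To make the cross-host signaling channel concrete, here is a minimal single-slot sketch in C11 (the layout and names are illustrative, and it assumes the shared region is hardware-coherent or paired with the explicit flushes above): the sender publishes a 56-byte payload by toggling a sequence word that the receiver polls.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One cache line per message slot: a sequence word plus a small payload.
 * Sender and receiver on different hosts both map this struct from the
 * CXL shared-memory pool. Odd seq = slot full, even seq = slot empty. */
struct cxl_msg_slot {
    _Atomic uint64_t seq;
    uint64_t payload[7]; /* pads the slot to 64 bytes */
};

/* Sender: wait for an empty slot, write the payload, publish with a
 * release increment so the receiver sees the payload before the flag. */
static void cxl_send(struct cxl_msg_slot *s, const uint64_t *msg)
{
    while (atomic_load_explicit(&s->seq, memory_order_acquire) & 1)
        ; /* spin until the receiver drains the slot */
    for (int i = 0; i < 7; i++)
        s->payload[i] = msg[i];
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

/* Receiver: poll for a full slot, read the payload, mark the slot empty. */
static void cxl_recv(struct cxl_msg_slot *s, uint64_t *msg)
{
    while (!(atomic_load_explicit(&s->seq, memory_order_acquire) & 1))
        ; /* spin until a message arrives */
    for (int i = 0; i < 7; i++)
        msg[i] = s->payload[i];
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}
```

Polling a shared cache line avoids interrupts and cross-host networking entirely, which is how a channel like this can stay in the sub-microsecond range the design targets.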
Deployment Strategy and Orchestration
The orchestration component controls device-to-host mapping and dynamically manages resource allocation. It balances load among hosts and mitigates device overloads. It also improves fault tolerance by reallocating workloads from failing devices to healthy ones, maximizing resource utilization at minimal additional cost.
Figure 4: Latency distribution of message passing.
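A toy version of the orchestrator's rebalancing policy might look like the following (all structures, names, and thresholds are invented for illustration; the paper does not prescribe this interface): devices migrate from the most loaded host to the least loaded one once the load gap crosses a threshold.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_DEVICES 16
#define MAX_HOSTS   8

/* Hypothetical orchestration state: which host owns each pooled device
 * and a coarse load estimate per host. */
struct pool_state {
    int    dev_owner[MAX_DEVICES]; /* host index, or -1 if unassigned/failed */
    double host_load[MAX_HOSTS];   /* e.g., fraction of NIC bandwidth in use */
};

/* Move one device from the most loaded host to the least loaded one.
 * A real orchestrator would also quiesce in-flight I/O and update the
 * device-to-host mapping in the CXL pod before switching ownership. */
static void rebalance(struct pool_state *ps, int ndev, int nhost)
{
    int hot = 0, cold = 0;
    for (int h = 1; h < nhost; h++) {
        if (ps->host_load[h] > ps->host_load[hot])  hot = h;
        if (ps->host_load[h] < ps->host_load[cold]) cold = h;
    }
    if (ps->host_load[hot] - ps->host_load[cold] < 0.2)
        return; /* imbalance too small to justify a migration */
    for (int d = 0; d < ndev; d++) {
        if (ps->dev_owner[d] == hot) {
            ps->dev_owner[d] = cold; /* hand the device to the idle host */
            printf("device %d: host %d -> host %d\n", d, hot, cold);
            return;
        }
    }
}
```

The same reassignment path doubles as the failover mechanism: when a device fails, its owner's workload is redirected to a healthy device in the pool rather than to a dedicated standby.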
Discussion
Beyond the immediate pooling benefits, CXL adoption could reshape datacenter infrastructure. Given high CXL pod reliability, operators could eliminate single points of failure such as top-of-rack switches, reducing costs. Additionally, soft accelerator disaggregation paired with PCIe pooling could address underutilized yet essential accelerators, enabling flexible load balancing for both compute and storage workloads and increasing efficiency and scalability across distributed systems.
Conclusion
The paper presents a comprehensive analysis of using CXL memory pools as an alternative to PCIe switches for effective and flexible pooling of PCIe devices. CXL technology offers significant opportunities to optimize resource utilization, reduce costs, and enhance the scalability and reliability of datacenter operations. As CXL technology matures, its integration into platform design holds promise for improved datacenter management and operational efficiency.