Disaggregated Infrastructure Fundamentals

Updated 15 November 2025
  • Disaggregated infrastructure is an architectural paradigm that divides datacenter resources into independent pools connected via high-bandwidth, low-latency fabrics.
  • It enables dynamic composition, fine-grained allocation, and cost-effective scaling by decoupling compute, memory, storage, and networking functions.
  • Research highlights include advances in resource scheduling with reinforcement learning, throughput improvements up to 6.7×, and significant reductions in data movement latency.

Disaggregated infrastructure is an architectural paradigm in which datacenter resources—compute, memory, storage, and network—are physically separated into independent resource pools ("blades," "bricks," or "sleds") and interconnected via low-latency, high-bandwidth fabrics such as RDMA-over-Ethernet, CXL, or optical switching. By decoupling resource types and enabling dynamic composition, disaggregation enables vertical elasticity, fine-grained allocation, avoidance of resource stranding, improved utilization, and flexible, cost-effective scaling and maintenance. This model contrasts sharply with the conventional monolithic server architecture, where rigid resource bundling leads to underutilization and costly overprovisioning.

1. Architectural Principles and Taxonomies

Disaggregated infrastructure reorganizes traditional servers into the following resource classes (a minimal composition sketch follows this list):

  • Compute pools (CPU or accelerator sleds): nodes primarily responsible for general or specialized computation, with minimal local memory for bootstrapping.
  • Memory pools (DRAM blades, HBM): accessible remotely over fabrics; can employ cache blades for intermediate tiers.
  • Storage pools (NVMe SSD/HDD shelves): exposed over block/file protocols, often via composable controllers.
  • Network components (RDMA, PCIe/CXL links, or photonic switches): deliver inter-node communication with sub-microsecond latency.
  • Accelerator pools (GPUs, FPGAs, DPUs, NPUs): abstracted as PCIe-attached or fabric-exposed endpoints.
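
The pool decomposition above can be captured in a small resource model. The following sketch is purely illustrative: the pool kinds, capacities, and the compose_node helper are invented for this example and do not correspond to any cited system, and a production controller would also need rollback and admission control, which the sketch omits.

```python
from dataclasses import dataclass, field

@dataclass
class Pool:
    """A disaggregated resource pool (e.g., a CPU sled, DRAM blade, or NVMe shelf)."""
    kind: str           # "compute", "memory", "storage", or "accelerator"
    capacity: float     # cores, GiB, or device count, depending on kind
    allocated: float = 0.0

    def reserve(self, amount: float) -> bool:
        # Fine-grained allocation: take only the slice the workload needs.
        if self.allocated + amount > self.capacity:
            return False
        self.allocated += amount
        return True

@dataclass
class LogicalNode:
    """A dynamically composed 'server' built from slices of independent pools."""
    slices: dict = field(default_factory=dict)

def compose_node(requirements: dict, pools: list) -> LogicalNode | None:
    """Greedily reserve one slice per resource kind; no rollback on failure."""
    node = LogicalNode()
    for kind, amount in requirements.items():
        pool = next((p for p in pools if p.kind == kind
                     and p.capacity - p.allocated >= amount), None)
        if pool is None or not pool.reserve(amount):
            return None  # demand cannot be satisfied from the available pools
        node.slices[kind] = (pool, amount)
    return node
```

A request such as {"compute": 16, "memory": 256, "accelerator": 2} would then be satisfied by reserving slices from whichever pools have headroom, which is the fine-grained, stranding-free allocation described above.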

Architectural taxonomies distinguish between "split" models (local+remote fractions per-server, software-driven) and "pool" models (resource pools accessible globally, hardware-enforced) (Ewais et al., 20 Feb 2024). Disaggregated designs typically favour a pool architecture for maximal flexibility.

The memory hierarchy within a pooled datacenter parallels on-chip designs (a hit-rate-weighted latency view of these tiers is given after the list), e.g.:

  • L1–L2: on-chip caches
  • L3: local DRAM
  • L4: shared cache blades
  • L5: remote DRAM blades
  • L6: NVM blades (Ewais et al., 20 Feb 2024)
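
A convenient way to reason about the cost of deeper tiers is an AMAT-style weighted sum; this is a standard textbook formulation rather than one taken from the cited work, and the per-tier hit fractions $h_i$ are assumptions to be measured per workload.

```latex
% Expected access latency over the L1..L6 tiers, where h_i is the fraction of
% accesses satisfied at tier i and t_i is that tier's access latency.
T_{\mathrm{avg}} \;=\; \sum_{i=1}^{6} h_i \, t_i ,
\qquad \sum_{i=1}^{6} h_i = 1 .
```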

2. Resource Management, Scheduling, and Allocation Algorithms

Resource allocation in disaggregated infrastructures is orchestrated by software controllers or, increasingly, by reinforcement learning-based resource managers (Shabka et al., 2021). Allocation must optimize for:

  • Utilization: $U = \frac{\text{used}}{\text{provisioned}}$, driving towards 100% resource use for all classes.
  • Latency and Bandwidth Constraints: Remote resource access incurs additional latency (e.g., CXL.mem round-trips of 170–250 ns vs. 80–140 ns for local DDR) (Guo et al., 6 Nov 2025, Ewais et al., 20 Feb 2024). Effective bandwidth is weighted by hit rates, e.g. $B_{\rm eff} = h_{\rm LRU} B_{\rm local} + (1 - h_{\rm LRU}) B_{\rm remote}$.
  • Scheduling: Integer linear programming (ILP), greedy bin-packing, and locality-aware heuristics are employed to select resource slices for requests, balancing throughput against latency and power (Guo et al., 6 Nov 2025); a minimal placement sketch follows this list. RL-based approaches co-manage server and network pools, achieving up to 42.6% higher CPU utilization and matching baseline performance with 5.3× fewer network resources, even scaling to 100× larger topologies (Shabka et al., 2021).
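
The sketch below combines the utilization, latency, and effective-bandwidth terms above into a greedy, locality-first placement heuristic. It is a minimal illustration only: the latency figures mirror the DDR and CXL.mem ranges quoted above, but the pool layout, capacities, and hit-rate proxy are assumptions, and no cited scheduler works exactly this way.

```python
# Greedy, locality-aware placement of a memory demand across local and remote pools.
POOLS = [
    {"name": "local-DDR",   "free_gib": 64,  "latency_ns": 110, "bw_gbps": 300},
    {"name": "CXL-blade-0", "free_gib": 512, "latency_ns": 210, "bw_gbps": 120},
    {"name": "CXL-blade-1", "free_gib": 512, "latency_ns": 250, "bw_gbps": 120},
]

def effective_bandwidth(hit_rate: float, b_local: float, b_remote: float) -> float:
    """B_eff = h * B_local + (1 - h) * B_remote, as defined above."""
    return hit_rate * b_local + (1.0 - hit_rate) * b_remote

def place(demand_gib: float, pools=POOLS) -> list[tuple[str, float]]:
    """Greedy bin-packing: fill the lowest-latency pools first."""
    placement, remaining = [], demand_gib
    for pool in sorted(pools, key=lambda p: p["latency_ns"]):
        take = min(remaining, pool["free_gib"])
        if take > 0:
            placement.append((pool["name"], take))
            remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise RuntimeError("demand exceeds pooled capacity")
    return placement

if __name__ == "__main__":
    mapping = place(256)                       # e.g., a 256 GiB working set
    local = sum(g for name, g in mapping if name == "local-DDR")
    h = local / 256                            # crude locality (hit-rate) proxy
    print(mapping, effective_bandwidth(h, b_local=300, b_remote=120))
```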

Locality and NUMA-awareness in mapping are critical: pinning virtual cores and migration of VM memory can improve application performance up to two orders of magnitude by minimizing remote access hop counts and cache contention (Lakew et al., 2 Jan 2025).
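
A minimal illustration of such locality-aware mapping is given below. It assumes a Linux host (os.sched_setaffinity is Linux-only), and the hop-count matrix and core groupings are invented for the example; it is not the placement policy of the cited work.

```python
import os

# Assumed hop counts from each compute sled to each memory blade (0 = local).
HOPS = {
    "blade-0": {"sled-0": 0, "sled-1": 2},
    "blade-1": {"sled-0": 2, "sled-1": 0},
}
SLED_CORES = {"sled-0": {0, 1, 2, 3}, "sled-1": {4, 5, 6, 7}}

def pin_near(blade: str) -> str:
    """Pin the current process to the sled with the fewest hops to its memory blade."""
    sled = min(HOPS[blade], key=HOPS[blade].get)
    os.sched_setaffinity(0, SLED_CORES[sled])   # Linux-only affinity call
    return sled

# Example: a VM whose memory was migrated to blade-1 gets pinned to sled-1's cores.
# pin_near("blade-1")
```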

3. Data Movement, Storage, and Networking in Disaggregation

Efficient data movement is essential. Architectural support includes:

  • Multi-granularity migration hardware (DaeMon): partitioning bandwidth between cache-line (sub-block) and page transfers, link compression, and adaptive selection mechanisms deliver 3.06× lower data-access costs and 2.4× higher IPC compared to page-only approaches (Giannoula et al., 2023); a simplified granularity-selection sketch follows this list.
  • Disaggregated storage with DPUs (DDS): storage servers offload read I/O to DPUs, eliminating host CPU involvement. DDS achieves order-of-magnitude latency reductions (p99 from 11 ms to 0.78 ms), 100% CPU core savings, and near-local NVMe throughput with minimal DBMS changes (Zhang et al., 18 Jul 2024).
  • Networking: resource boards abstracted as pComponent/mComponent/nComponent support split-kernel OSes, with kernel-bypass and DMA/DDIO optimizations. Disaggregated networking can deliver latencies as low as 12–20 μs for small messages, matching monolithic Linux TCP performance (Ekane et al., 2021).
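
As referenced in the first item above, the core of multi-granularity migration is deciding per access whether to move a cache line or the whole page. The sketch below is a heavily simplified software stand-in for that decision: the density threshold and the per-page bookkeeping are assumptions, not DaeMon's actual hardware mechanism.

```python
from collections import defaultdict

PAGE_SIZE = 4096          # bytes
CACHE_LINE = 64           # bytes
DENSITY_THRESHOLD = 0.25  # assumed: migrate the page once 25% of its lines are touched

touched_lines = defaultdict(set)   # page number -> set of touched cache-line indices

def choose_granularity(addr: int) -> tuple[int, int]:
    """Return (transfer_base, transfer_size) for a remote-memory access.

    Sparsely accessed pages move as single cache lines (sub-blocks); densely
    accessed pages are migrated wholesale, mirroring multi-granularity movement.
    """
    page, line = addr // PAGE_SIZE, (addr % PAGE_SIZE) // CACHE_LINE
    touched_lines[page].add(line)
    density = len(touched_lines[page]) / (PAGE_SIZE // CACHE_LINE)
    if density >= DENSITY_THRESHOLD:
        touched_lines[page].clear()              # page migrated; reset its window
        return page * PAGE_SIZE, PAGE_SIZE       # full-page migration
    return addr - addr % CACHE_LINE, CACHE_LINE  # cache-line (sub-block) transfer
```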

For streaming, device-native data movement via CXL-attached DTUs bypasses CPU staging, reducing end-to-end latency by up to 67% and increasing throughput to 15 GiB/s (Asmussen et al., 28 Mar 2024).

4. Application Domains and Workload Characterization

Disaggregated infrastructure has demonstrated utility across domains:

  • LLM inference and serving: P/D-Serve and BanaServe decouple the prefill (compute-bound) and decode (memory-bound) phases, leveraging fine-grained KVCache disaggregation, dynamic module migration, and load-aware routing; throughput improvements of up to 6.7×, TTFT-SLO success rates up to 0.99, and D2D transfer time reductions of up to 46% over baselines are reported (Jin et al., 15 Aug 2024, He et al., 15 Oct 2025). A simplified routing sketch follows this list.
  • DBMSs: Compute and storage are independently scaled. Disaggregated microservices facilitate SLO-driven elasticity. Systems such as Nova-LSM and RocksDB-Cloud show 10–20× throughput gains over legacy shared-nothing architectures (Ghandeharizadeh et al., 2 Nov 2024).
  • Large-model training/DL workloads: ASTRA-sim2.0 models hierarchical/disaggregated memory systems, embracing block-based topologies. Parameter sweeps quantify trade-offs in remote groupings, network hop count, and collective acceleration for ML (Won et al., 2023).
  • Fog and edge computing: Near-edge disaggregated servers can halve far-edge fog node requirements (up to 50% reduction), reduce active component count by 33–35%, and improve resource utilization, provided network fabric bottlenecks do not strand resources (Ajibola et al., 2019).
  • Network analytics: dReDBox (ARM sleds + DRAM blades + optical switch) enables dynamic memory attachment, accepting 66–80% per-access overhead in exchange for order-of-magnitude swings in attachable memory capacity, amortized by higher parallelism (Vega et al., 2017).
  • Key-Value stores on disaggregated-memory: SWARM-KV shows nearly raw RDMA performance (<27% overhead), strong consistency and wait-freedom via speculative Safe-Guess and In-n-Out protocols, with tail p99 latencies ≈10–30 µs under contention (Murat et al., 24 Sep 2024).
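
As noted in the LLM-serving item above, prefill/decode disaggregation amounts to routing the two phases to differently provisioned pools and handing the KV cache between them by reference. The sketch below is a deliberately simplified, hypothetical illustration: the queue names, the KVHandle type, and the address format are invented, and it is not the P/D-Serve or BanaServe design.

```python
import queue
from dataclasses import dataclass

@dataclass
class KVHandle:
    """Opaque reference to a KV cache held in a disaggregated memory pool."""
    request_id: str
    pool_addr: str   # illustrative fabric-visible location of the cache

prefill_q = queue.Queue()   # prompts routed to a compute-rich (prefill) pool
decode_q = queue.Queue()    # KV handles routed to a memory-rich (decode) pool

def prefill_worker(request_id: str) -> None:
    # Run the compute-bound prompt pass (elided), then publish the KV cache by
    # reference so decode can fetch it device-to-device rather than via the host.
    decode_q.put(KVHandle(request_id, pool_addr=f"pool://blade-3/{request_id}"))

def decode_worker() -> None:
    # Generate tokens near the memory pool holding the cache, so decode capacity
    # scales independently of prefill capacity.
    kv = decode_q.get()
    print(f"decoding {kv.request_id} from {kv.pool_addr}")

# Example flow: enqueue a request, prefill it, then decode.
prefill_q.put("req-42")
prefill_worker(prefill_q.get())
decode_worker()
```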

5. Performance, Utilization, and Trade-Offs

Disaggregated infrastructures exhibit quantifiable performance benefits and cost savings:

  • P/D-Serve: 6.7× throughput vs. monolithic; TTFT-SLO success rate from 0.58 to 0.99; D2D transfer time from 1.00 to 0.54 (normalized) (Jin et al., 15 Aug 2024)
  • DDS: 14× reduction in read I/O latency, total host CPU core savings up to 10.7 per server (Zhang et al., 18 Jul 2024)
  • RL-RDDC: up to 22.0% higher acceptance ratio, 42.6% CPU utilization gain, and scalable to order-of-magnitude larger topologies (Shabka et al., 2021)
  • dReDBox: 66–82% IPC overhead on remote accesses, offset by dynamic parallelism and reduced capex/opex (Vega et al., 2017)
  • NUMA mapping: speedup of 33–241× over vanilla Linux scheduling for in-memory databases and microservices (Lakew et al., 2 Jan 2025)

Trade-offs include network bottlenecks that limit near-edge/fog deployment (Ajibola et al., 2019), additional remote-hop latency, orchestration complexity, and occasional memory overheads (e.g., 2–3× for replication in SWARM-KV (Murat et al., 24 Sep 2024)).

6. Design Challenges, Limitations, and Open Research Directions

Major challenges include remote-access latency and bandwidth overheads, network-fabric bottlenecks, orchestration and placement complexity at scale, and the volatility and consistency of pooled memory.

Open research spans improvements in programmable fabrics and switches, persistence augmentation for volatile memory pools, deep learning-driven orchestration, transactional remote memory operations, and standardized APIs for both microservices and device-level streaming.

7. Future Prospects and Impacts

Disaggregated infrastructure is poised to become the substrate for both hyperscale cloud and edge computing. It enables unprecedented vertical and horizontal scaling, finer-grained utilization, and reduced hardware lifecycle costs. Pending advances in fabrics, orchestration, and protocols, this paradigm is expected to dominate the next decade’s evolution in data center and distributed systems architecture. Rigorous modeling, simulation, and deployment studies—as well as formal proofs (e.g., for consistency and liveness)—remain essential to ensure correctness, performance, and resilience at scale.
