Disaggregated Architectures
- Disaggregated architectures are a design approach that separates compute, memory, storage, accelerators, and networking into independent resource pools connected by low-latency, high-bandwidth fabrics.
- They enable fine-grained resource allocation and independent scaling, addressing inefficiencies of monolithic systems to improve utilization and cost efficiency.
- These architectures pose challenges in scheduling and orchestration while driving innovations in interconnect technologies, fault tolerance, and dynamic resource management.
Disaggregated architectures are a class of computer and data center designs in which major resources—compute, memory, storage, accelerators, and networking—are physically separated into independently managed pools and interconnected via high-bandwidth, low-latency fabrics. Rather than binding resources to a single server's motherboard, disaggregation unlocks fine-grained resource allocation, enables independent scaling of hardware components, and facilitates integration of heterogeneous hardware. This approach aims to address inefficiencies in monolithic systems, such as resource stranding, scaling bottlenecks, upgrade inflexibility, and coarse failure granularity (Guo et al., 6 Nov 2025, Ewais et al., 20 Feb 2024, Vega et al., 2017).
1. Architectural Foundations and System Organization
Disaggregated architectures fundamentally depart from the server-centric model. Resources—CPU (“compute bricks”), DRAM (“memory bricks” or blades), storage devices, and accelerators (GPUs, DPUs, FPGAs)—are detached from their original chassis and accessed via a composable interconnect. Prevailing interconnect technologies include PCIe/CXL for cache-coherent memory sharing (with 170–250 ns RTT), Gen-Z for memory-semantic transport, InfiniBand/RoCE for microsecond-scale remote DRAM/SSD access, and increasingly, silicon-photonics for Tb/s cross-rack links (Guo et al., 6 Nov 2025, Ewais et al., 20 Feb 2024, Ekane et al., 2021).
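To make the latency gap concrete, a back-of-envelope average-memory-access-time (AMAT) estimate is shown below; the round-trip figures echo the ranges quoted above, while the local-DRAM latency and the remote-access fractions are assumptions for illustration only.

```python
# Illustrative AMAT estimate (not from any cited paper): blend local DRAM with
# a fraction of accesses served from a remote pool over a given fabric.
LOCAL_DRAM_NS = 100        # assumed local DRAM access latency
CXL_RTT_NS = 250           # upper end of the CXL round-trip range quoted above
RDMA_RTT_NS = 3000         # a few microseconds for InfiniBand/RoCE remote DRAM

def amat(remote_fraction: float, remote_ns: float, local_ns: float = LOCAL_DRAM_NS) -> float:
    """Average access time when `remote_fraction` of accesses go to the remote pool."""
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns

for fabric, rtt in [("CXL", CXL_RTT_NS), ("RDMA", RDMA_RTT_NS)]:
    print(fabric, [round(amat(f, rtt), 1) for f in (0.05, 0.20, 0.50)])
# Even a 20% remote fraction yields roughly 1.3x (CXL) or ~7x (RDMA) the local AMAT,
# which is why cache-coherent fabrics and locality-aware placement matter.
```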
Architectural models vary from rack-scale aggregation (disaggregated memory pools managed by per-rack ToR switches and Remote Memory Access Controllers (Puri et al., 2023, Lee et al., 2021)) to composable cross-rack fabrics and management layers (e.g., Flow-in-Cloud for multi-node accelerator sharing (Takano et al., 2020)). Designs feature centralized controllers with global directories or distributed, in-network management, as in MIND’s switch-resident MSI coherence protocol (Lee et al., 2021).
Table: Representative Resource Pool Types in Disaggregation
| Pool Type | Component Example | Interconnect/Fabric |
|---|---|---|
| Compute | CPU blades | PCIe, CXL, Gen-Z, InfiniBand |
| Memory | DRAM blades/NVM | CXL, Gen-Z, InfiniBand |
| Storage | NVMe SSD/HDD | NVMe-OF, PCIe, RDMA |
| Accelerator | GPU/DPU/FPGAs | PCIe, RoCE, custom switch |
| Networking | SmartNIC/DPU boards | InfiniBand, silicon-photonics |
Pools are independently failure-isolated; e.g., memory node failures are recoverable without compute downtime, and resource upgrades affect only targeted hardware (Ke et al., 2022, Keeton et al., 2021).
2. Key Principles: Pooling, Composability, and Scheduling
Disaggregation enables unified resource pools where any workload can dynamically assemble a virtual machine or container with the precise mixture of CPU cores, memory capacity, storage, and accelerators required (Guo et al., 6 Nov 2025, Vega et al., 2017, Ke et al., 2022). Key scheduling principles include:
- Fine-Grained Pooling: Allocation at core/GB/unit granularity replaces whole-server provisioning.
- Hierarchical Resource Managers: Leaf/rack/cluster/central layers abstract capacity and enforce operational constraints (cooling, power, rack topology).
- Integer Linear Programming (ILP): Workload placement is modeled as an ILP that minimizes cost and mismatch penalties subject to demand-satisfaction and resource-capacity constraints (a minimal formulation sketch follows this list).
- Affinity-Aware Scheduling: For distributed LLM serving, network-co-located prefill/decode pools and group scaling optimize resource locality and reduce KV-cache transfer bottlenecks (Li et al., 27 Aug 2025).
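A minimal sketch of such a placement ILP, written with the open-source PuLP modeling library and invented demand/capacity numbers; the objective and constraints follow the general form stated in the bullet above, not any cited system's exact model.

```python
# Minimal placement ILP sketch (illustrative data, not from any cited system).
# Each workload must land in exactly one pool; pools have finite core/memory capacity.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

workloads = {"w1": {"cores": 8, "mem_gb": 64}, "w2": {"cores": 16, "mem_gb": 32}}
pools     = {"rackA": {"cores": 20, "mem_gb": 96, "cost": 1.0},
             "rackB": {"cores": 32, "mem_gb": 128, "cost": 1.4}}

prob = LpProblem("disagg_placement", LpMinimize)
x = {(w, p): LpVariable(f"x_{w}_{p}", cat=LpBinary) for w in workloads for p in pools}

# Objective: total cost of the pools each workload is placed in.
prob += lpSum(pools[p]["cost"] * x[w, p] for w in workloads for p in pools)

# Each workload is placed exactly once.
for w in workloads:
    prob += lpSum(x[w, p] for p in pools) == 1

# Capacity constraints per pool and per resource type.
for p in pools:
    for r in ("cores", "mem_gb"):
        prob += lpSum(workloads[w][r] * x[w, p] for w in workloads) <= pools[p][r]

prob.solve()
print({w: next(p for p in pools if value(x[w, p]) > 0.5) for w in workloads})
```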
Disaggregation complicates dynamic orchestration, requiring new policies for slice construction (attach devices, prepare and launch the machine, detach, destroy) and greedy or bipartite-matching algorithms for resource selection (Takano et al., 2020, Guo et al., 6 Nov 2025).
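As a sketch of the greedy end of that spectrum, the locality-preferring device selection below is illustrative only; the cited schedulers use richer bipartite-matching and ILP formulations.

```python
# Greedy slice-construction sketch (illustrative). Devices are dicts such as
# {"type": "gpu", "rack": "r1", "load": 0.2, "busy": False}.
def compose_slice(request, free_devices):
    """Pick `count` devices per requested type, preferring the rack already chosen
    so the composed slice stays within few fabric hops."""
    picked, rack_hint = [], None
    for dev_type, count in request.items():          # e.g. {"gpu": 2, "mem_blade": 1}
        candidates = [d for d in free_devices
                      if d["type"] == dev_type and not d["busy"] and d not in picked]
        # Same-rack devices first, then least-loaded.
        candidates.sort(key=lambda d: (d["rack"] != rack_hint, d["load"]))
        if len(candidates) < count:
            return None                               # request is not composable right now
        picked.extend(candidates[:count])
        rack_hint = rack_hint or candidates[0]["rack"]
    for d in picked:                                  # commit only once everything fits
        d["busy"] = True
    return picked
```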
3. Hardware and Memory System Innovations
Disaggregated designs push hardware innovation for bandwidth, latency, and scalability. Examples include:
- Rack-Scale Memory Disaggregation: Compute nodes access remote DRAM “pools” via RMAC and high-speed switches (Puri et al., 2023). Remote access latency decomposes into a fixed fabric and memory-access component plus a queueing term, T_remote = T_fixed + ΣT_queue, where ΣT_queue is the sum of contention delays across network queues, a principal source of tail latency requiring careful pool-selection and page-allocation policies.
- Software-Defined Bus Bridges: Board-level bridges (ARM AXI4) enable hundreds of master/slave devices to communicate across boards/mainboards, programmable at runtime for orchestration-managed resource steering (Syrivelis et al., 2018).
- DaeMon Engines and Data Movement: Advanced hardware blocks provide dual-queue (cache-line plus page) data movement, adaptive bandwidth partitioning, link compression, and multi-granularity migration to alleviate remote-memory overhead and ensure critical-path cache lines are prioritized over bulk pages. Evaluations show a 2.39× speedup and a 3.06× reduction in access costs compared to page-only migration (Giannoula et al., 2023).
- In-Network MMUs and Coherence: Switch-resident directory protocols (MIND) can realize directory-based MSI coherence at line rate with ≲2 MB of SRAM and ≈5K TCAM entries; remote loads achieve ≈9 μs, and full elasticity is supported (Lee et al., 2021); a simplified directory sketch appears after this list. By contrast, traditional software-based DSM adds substantial overhead due to locking, lookup, and round-trip delays.
- SmartNIC/DPU-Aided Storage: DPUs with on-chip ARM cores offload stateless I/O paths, achieving a 1.87× throughput gain and saving ≥10 cores per server. Offload designs preserve end-to-end semantics by splitting TCP flows and minimize host intervention via DMA-based lock-free rings (Zhang et al., 18 Jul 2024); a minimal ring sketch also follows this list.
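For intuition on the in-network coherence point above, here is a simplified software model of directory-based MSI state tracking, the kind of logic MIND places in switch SRAM/TCAM. This is a didactic sketch, not MIND's actual P4 pipeline.

```python
# Didactic directory-based MSI sketch (software model only).
from enum import Enum

class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"

class Directory:
    """Per-address coherence directory: tracks state and the set of sharers."""
    def __init__(self):
        self.entries = {}                            # addr -> (state, sharers)

    def read(self, addr, node):
        state, sharers = self.entries.get(addr, (State.INVALID, set()))
        if state is State.MODIFIED:
            owner = next(iter(sharers))
            self._downgrade(addr, owner)             # owner writes back, keeps a Shared copy
        self.entries[addr] = (State.SHARED, set(sharers) | {node})

    def write(self, addr, node):
        state, sharers = self.entries.get(addr, (State.INVALID, set()))
        for other in set(sharers) - {node}:
            self._invalidate(addr, other)            # invalidate every other copy
        self.entries[addr] = (State.MODIFIED, {node})

    def _downgrade(self, addr, node):
        pass   # fabric message: write back the dirty line, transition M -> S

    def _invalidate(self, addr, node):
        pass   # fabric message: drop the cached copy, transition to I
```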
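The DPU offload path above relies on DMA-based lock-free rings; below is a minimal single-producer/single-consumer ring in the same spirit. It is an illustrative Python stand-in: the real data path uses DMA-visible host memory and hardware doorbells rather than a Python list, but the head/tail discipline is the same (the producer only advances `tail`, the consumer only advances `head`, so no lock is needed).

```python
# Minimal single-producer/single-consumer lock-free ring sketch.
class SpscRing:
    def __init__(self, capacity: int):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0            # next slot the consumer will read
        self.tail = 0            # next slot the producer will write

    def push(self, item) -> bool:
        nxt = (self.tail + 1) % self.capacity
        if nxt == self.head:     # ring full
            return False
        self.buf[self.tail] = item
        self.tail = nxt          # publish only after the payload is in place
        return True

    def pop(self):
        if self.head == self.tail:   # ring empty
            return None
        item = self.buf[self.head]
        self.head = (self.head + 1) % self.capacity
        return item
```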
4. Software, OS Abstractions, and Programmability
Disaggregated operating systems must expose resource architectures directly to applications, not abstract them away (Angel et al., 2019). Proposed interfaces include:
- Memory Grant and Steal APIs: Allow zero-copy transfer or reassignment of live pages between processes, cutting shuffle and recovery costs by up to 20× for small RPCs; implemented via atomic V2P updates on the rack MMU (Angel et al., 2019). A toy model follows this list.
- Failure Notification and Partial Recovery: First-class event-driven signals for blade failures facilitate rapid detection and reconfiguration in replicated or data-parallel applications.
- Split-Kernel and OS Disaggregation: Resource-centric OSes (LegoOS) disaggregate networking (nComponents) alongside compute (pComponents) and memory (mComponents), with remote stub–skeleton RPCs for system calls, and optimizations like dDMA and dDDIO that mimic classical local fast paths in the disaggregated environment (Ekane et al., 2021).
- Microservice Offload and Migration: Runtimes for serverless or microservices need orchestration logic and predictive cost models for offload/migration decisions (a hedged sketch follows this list), and must address device abstraction, live-migration disruption, and QoS/SLO preservation (Lu et al., 2021).
- Cross-Layer Memory Programmability: Applications can exploit explicit RDMA verbs, object granularity prefetch primitives, and container/VM runtime extensions to leverage transparent far memory, heterogeneity, and adaptive scheduling (Wang et al., 26 Mar 2025).
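A toy model of the grant semantics, using a Python stand-in for the rack MMU's virtual-to-physical (V2P) table; the grant operation name comes from the proposal above, while the class and signatures are purely illustrative.

```python
# Toy model of a rack-level V2P table with a zero-copy grant() operation.
# Real hardware performs this as an atomic mapping update; no page data moves.
class RackMMU:
    def __init__(self):
        self.v2p = {}                                   # (pid, vaddr) -> physical frame

    def map(self, pid, vaddr, frame):
        self.v2p[(pid, vaddr)] = frame

    def grant(self, src_pid, src_vaddr, dst_pid, dst_vaddr):
        """Reassign a live page from one process to another without copying."""
        frame = self.v2p.pop((src_pid, src_vaddr))      # source loses the mapping
        self.v2p[(dst_pid, dst_vaddr)] = frame          # destination gains it
        return frame

mmu = RackMMU()
mmu.map(pid=1, vaddr=0x1000, frame=42)
mmu.grant(src_pid=1, src_vaddr=0x1000, dst_pid=2, dst_vaddr=0x8000)
assert mmu.v2p == {(2, 0x8000): 42}
```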
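One plausible shape for the offload/migration cost model, offered purely as an assumption (the cited work's exact formula is not reproduced in this text): offload a microservice only when remote execution plus state transfer and migration disruption beats local execution.

```python
# Hypothetical offload cost model (illustrative assumption, not the cited formula).
def should_offload(local_exec_s: float,
                   remote_exec_s: float,
                   state_bytes: int,
                   link_bytes_per_s: float,
                   migration_penalty_s: float = 0.0) -> bool:
    """Offload pays off when remote compute plus data movement and disruption
    is cheaper than running locally."""
    transfer_s = state_bytes / link_bytes_per_s
    return remote_exec_s + transfer_s + migration_penalty_s < local_exec_s

# Example: 80 ms locally vs 20 ms on an accelerator blade, 8 MB of state over 25 GB/s.
print(should_offload(0.080, 0.020, 8e6, 25e9, migration_penalty_s=0.005))  # True
```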
5. Practical Impact, Evaluation, and Trade-Offs
Disaggregated architectures demonstrate:
- Utilization and Cost Efficiency: Per-blade resource utilization rises from legacy levels of sub-20% CPU and ≈50% memory to 70–80% in steady state; total cost of ownership (TCO) is reduced by up to 49.3% for recommendation serving (Ke et al., 2022), with a further 21–43.6% saved via near-memory processing hardware. An illustrative calculation follows this list.
- Performance-Latency Characteristics: In dReDBox, remote memory incurs 66–80% throughput penalty (IPC-based), mitigated by parallelism and dynamic memory expansion; tail latency is minimized via pool selection and alternate allocation schemes (Vega et al., 2017, Puri et al., 2023).
- Reliability and Failure Domain Isolation: Segregating memory and compute failures allows over-provisioning factors to drop from 7% to 3.7%, directly lowering capex and opex (Ke et al., 2022, Keeton et al., 2021). MODC-style frameworks natively exploit partial failure recovery via lock-free scheduling and task replay, outperforming checkpoint-based approaches by up to 51% under failure scenarios.
- Scalability and Resource Elasticity: Designs such as MIND enable transparent scaling across blades/racks, while SWARM protocols bring wait-free, single-roundtrip replication and strong consistency to shared objects in disaggregated memory (Murat et al., 24 Sep 2024).
- Trade-Offs: Granularity of pooling versus network overhead, configuration complexity, and scheduling discipline affect latency, cost, cooling, and power management. Uniform pools with legacy mixing are empirically robust absent perfect forecasts (Guo et al., 6 Nov 2025).
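To see how utilization gains translate into capacity and cost, here is a purely illustrative calculation; the demand and blade sizes are assumed, and the 20% and 75% figures echo the ranges above rather than any cited evaluation.

```python
import math

# Illustrative only: how higher utilization shrinks the fleet needed for a fixed demand.
demand_cores = 10_000              # aggregate steady-state core demand (assumed)
cores_per_blade = 64               # assumed blade size

def blades_needed(avg_utilization: float) -> int:
    return math.ceil(demand_cores / (cores_per_blade * avg_utilization))

before = blades_needed(0.20)       # monolithic servers stranded near 20% CPU use
after = blades_needed(0.75)        # disaggregated pools at 70-80% steady state
print(before, after, f"{1 - after / before:.0%} fewer blades")   # 782 209 73% fewer blades
```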
Table: Performance Improvement Summary (Selected Benchmarks)
| System | Latency (vs. monolithic) | Utilization/Reliability Gain | TCO/Cost Savings |
|---|---|---|---|
| dReDBox (Vega et al., 2017) | 66–80% slower (IPC) | — | — |
| DaeMon (Giannoula et al., 2023) | 2.39× speedup | — | — |
| DisaggRec (Ke et al., 2022) | <5% latency overhead | up to +6.8 (reliability) | up to 49.3% |
| HeteroScale (Li et al., 27 Aug 2025) | — | +26.6 pp (GPU utilization) | hundreds of thousands of GPU-hours saved |
6. Challenges, Research Frontiers, and Future Directions
Emerging research focuses on:
- Programmable Interconnect Integration: CXL and P4-based fabrics are advancing multi-tier MMUs and line-rate cache-coherence.
- Cross-Layer Resource Management: Adaptive software policies for allocation, power and cooling orchestration, and tenant isolation (rate limiting, app-specific fabrics).
- Consistency and Replication Protocols: Wait-free, low-latency SWARM replication for disaggregated key-value stores.
- Microservice Mobility and Offloading: Unified ISA-abstractions for offload, ML-driven task-aware scheduling, and device-agnostic binaries.
- Security and Fault-Tolerance: New isolation domains, TEEs, and erasure coding for resilience.
Open challenges include formal verification of replication protocols under clock skew (Murat et al., 24 Sep 2024), atomic multi-object transactions across distributed blades, and unifying volatile and persistent memory in global namespaces (Ewais et al., 20 Feb 2024).
7. Significance and Ecosystem Reorientation
Disaggregated architectures represent a redesign of the entire data center ecosystem. Scheduling, application APIs, hardware configuration, cooling, power provisioning, and upgrading are all impacted. Co-design across layers is essential: resource orchestration receives real-time constraints from physical infrastructure, and distributed OSes must orchestrate composable pools, failure recovery, and security (Guo et al., 6 Nov 2025, Ewais et al., 20 Feb 2024). As large-scale foundation models, cloud-native workloads, and edge/micro-data centers proliferate, disaggregation will be central to next-generation datacenter architectures.
Disaggregated architectures thus enable flexible, scalable, heterogeneous, and resource-efficient computing—at the cost of greater scheduling, coordination, and hardware complexity. The transition from rigidity to composability offers profound benefits but depends critically on cross-layer system co-design, adaptive hardware innovations, and new programming abstractions.