CXLMemSim: A pure software simulated CXL.mem for performance characterization (2303.06153v2)
Abstract: CXLMemSim is a fast, lightweight simulation framework that enables performance characterization of memory systems based on Compute Express Link (CXL) .mem technology. CXL.mem allows disaggregation and pooling of memory to mitigate memory stranding (underutilized memory trapped on fully loaded servers) in cloud and datacenter environments. However, CXL-attached memory introduces additional latency and bandwidth constraints compared to local DRAM, and real CXL .mem hardware is not yet widely available for empirical evaluation. CXLMemSim addresses this gap by attaching to unmodified applications and simulating CXL-based memory pools in software. It operates by tracing memory allocations and accesses using efficient kernel probes and hardware performance counters, dividing execution into epochs, and injecting timing delays to emulate various CXL .mem latency/bandwidth characteristics. This approach incurs modest runtime overhead while preserving realistic load/store memory access patterns. We implement CXLMemSim on commodity hardware without special devices, and our evaluation shows that it runs orders of magnitude faster than cycle-accurate simulators (e.g., Gem5) for real-world workloads, while accurately modeling the performance impact of CXL .mem. We demonstrate use cases where CXLMemSim enables experimentation with memory pooling configurations, scheduling policies, data migration strategies, and caching techniques that were previously infeasible to evaluate at scale. Key findings include the viability of software-based CXL .mem emulation with low overhead, insights into latency and congestion effects in memory pools, and guidance for system designers to optimize memory disaggregation. Overall, CXLMemSim provides a practical and extensible platform for researchers and practitioners to explore CXL.mem innovations before real hardware becomes commonplace.