Exploring and Evaluating Real-world CXL: Use Cases and System Adoption (2405.14209v1)
Abstract: Compute Express Link (CXL) is emerging as a promising memory interface technology. Because CXL devices are not yet commonly available, the performance of CXL memory is largely unknown. What are the use cases for CXL memory? How does CXL memory affect application performance? How should CXL memory be used in combination with existing memory components? In this work, we study the performance of three genuine CXL memory-expansion cards from different vendors. We characterize the basic performance of CXL memory, study how HPC applications and large language models (LLMs) can benefit from CXL memory, and study the interplay between memory tiering and page interleaving. We also propose a novel data-object-level interleaving policy that matches the interleaving policy to memory access patterns. We reveal the challenges and opportunities of using CXL memory.
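To make the difference between page interleaving and object-level interleaving concrete, here is a toy Python model. It is a sketch under stated assumptions, not the authors' policy: the node names `dram` and `cxl`, the per-node weights, and the helpers `interleave_pages` and `place_objects` are hypothetical illustrations, not any kernel API. On Linux, real placement is requested through `mbind`/`set_mempolicy` with `MPOL_INTERLEAVE` or via `numactl --interleave`. The model only mimics the round-robin page-to-node assignment that those policies produce.

```python
from typing import Dict, List


def interleave_pages(num_pages: int, weights: Dict[str, int]) -> List[str]:
    """Assign each page of one allocation to a memory node, round-robin.

    `weights` maps a (hypothetical) node name to how many consecutive pages
    it receives per round, e.g. {"dram": 1, "cxl": 1} for 1:1 interleaving.
    """
    # Expand the weights into one round of the repeating pattern.
    pattern = [node for node, w in weights.items() for _ in range(w)]
    if not pattern:
        raise ValueError("at least one node needs a positive weight")
    # Page i goes to the node at position i of the repeating pattern.
    return [pattern[i % len(pattern)] for i in range(num_pages)]


def place_objects(
    objects: Dict[str, int], policies: Dict[str, Dict[str, int]]
) -> Dict[str, List[str]]:
    """Object-level interleaving: every data object has its own weights."""
    return {
        name: interleave_pages(pages, policies[name])
        for name, pages in objects.items()
    }


if __name__ == "__main__":
    # Hypothetical data objects, sizes given in pages.
    objects = {"index": 4, "values": 6}
    # Process-wide policy: the same 1:1 interleave applies to every object.
    uniform = place_objects(objects, {n: {"dram": 1, "cxl": 1} for n in objects})
    # Object-level policy: keep "index" in DRAM, spread "values" 2:1.
    per_object = place_objects(
        objects, {"index": {"dram": 1}, "values": {"dram": 2, "cxl": 1}}
    )
    print(uniform)
    print(per_object)
```

In this model a single process-wide 1:1 policy spreads every object evenly across both tiers, while the object-level variant can, for example, keep a latency-sensitive object entirely in DRAM and spread a bandwidth-bound one across DRAM and CXL. Deciding which weights each object should receive is the part an access-pattern-aware policy has to supply; the weights above are arbitrary.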
- Jie Liu
- Xi Wang
- Jianbo Wu
- Shuangyan Yang
- Jie Ren
- Bhanu Shankar
- Dong Li