
Exploring and Evaluating Real-world CXL: Use Cases and System Adoption (2405.14209v1)

Published 23 May 2024 in cs.PF and cs.AR

Abstract: Compute eXpress Link (CXL) is emerging as a promising memory interface technology. Because CXL devices are not yet commonly available, the performance of CXL memory is largely unknown. What are the use cases for CXL memory? How does CXL memory affect application performance? How should CXL memory be used in combination with existing memory components? In this work, we study the performance of three genuine CXL memory-expansion cards from different vendors. We characterize the basic performance of CXL memory, study how HPC applications and LLMs can benefit from it, and examine the interplay between memory tiering and page interleaving. We also propose a novel data-object-level interleaving policy that matches the interleaving policy to memory access patterns. We reveal the challenges and opportunities of using CXL memory.

Authors (7)
  1. Jie Liu
  2. Xi Wang
  3. Jianbo Wu
  4. Shuangyan Yang
  5. Jie Ren
  6. Bhanu Shankar
  7. Dong Li
