ICGMM: CXL-enabled Memory Expansion with Intelligent Caching Using Gaussian Mixture Model (2408.05614v1)
Abstract: Compute Express Link (CXL) emerges as a solution to the widening gap between computational speed and data communication rates between the host and multiple devices. It fosters a unified, coherent memory space between the host and CXL storage devices such as solid-state drives (SSDs) for memory expansion, with a corresponding DRAM serving as the device-side cache. However, this design introduces challenges: substantial cache miss penalties, sub-optimal caching due to the data-access granularity mismatch between the DRAM "cache" and the SSD "memory", and inefficient hardware cache management. To address these issues, we propose a novel solution, named ICGMM, which performs caching and eviction decisions directly in hardware using a Gaussian Mixture Model (GMM)-based approach. We prototype our solution on an FPGA board and demonstrate a notable improvement over the classic Least Recently Used (LRU) cache strategy: the cache miss rate decreases by 0.32% to 6.14%, which translates into a substantial 16.23% to 39.14% reduction in average SSD access latency. Furthermore, compared to state-of-the-art Long Short-Term Memory (LSTM)-based cache policies, our GMM algorithm on FPGA reduces decision latency by over 10,000 times while demanding far fewer hardware resources.
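The paper's GMM mechanism runs in FPGA hardware, and its exact feature set and model layout are not given in the abstract. As a rough software-level sketch of the underlying idea, the snippet below fits a tiny 1-D Gaussian mixture over recently accessed page addresses and evicts the cached page the model scores as least likely to recur. All names (`GMM1D`, `evict_candidate`), the 1-D address feature, and the EM details are illustrative assumptions, not the paper's design:

```python
import math

class GMM1D:
    """Tiny 1-D Gaussian mixture fitted with a few EM steps (illustrative only;
    the paper's GMM is implemented in hardware and may use different features)."""

    def __init__(self, k=2, iters=20):
        self.k, self.iters = k, iters

    def fit(self, xs):
        xs = sorted(xs)
        n = len(xs)
        # Deterministic init: means at evenly spaced quantiles, shared broad variance.
        self.mu = [xs[int(i * (n - 1) / (self.k - 1))] for i in range(self.k)]
        mean = sum(xs) / n
        self.var = [sum((x - mean) ** 2 for x in xs) / n + 1e-6] * self.k
        self.w = [1.0 / self.k] * self.k
        for _ in range(self.iters):
            # E-step: per-point component responsibilities.
            resp = []
            for x in xs:
                p = [self.w[j] * self._pdf(x, j) for j in range(self.k)]
                s = sum(p) or 1e-12
                resp.append([pj / s for pj in p])
            # M-step: re-estimate weights, means, variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) or 1e-12
                self.w[j] = nj / n
                self.mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
                self.var[j] = (
                    sum(r[j] * (x - self.mu[j]) ** 2 for r, x in zip(resp, xs)) / nj
                    + 1e-6
                )
        return self

    def _pdf(self, x, j):
        return math.exp(-((x - self.mu[j]) ** 2) / (2 * self.var[j])) / math.sqrt(
            2 * math.pi * self.var[j]
        )

    def score(self, x):
        """Mixture likelihood of address x under the fitted access model."""
        return sum(self.w[j] * self._pdf(x, j) for j in range(self.k))

def evict_candidate(cache_pages, gmm):
    """Evict the cached page whose address the model deems least likely to recur."""
    return min(cache_pages, key=gmm.score)
```

For example, after fitting on a trace with two hot address regions, a page far from both regions scores lowest and becomes the eviction victim, whereas plain LRU would ignore the spatial access pattern entirely.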