
ICGMM: CXL-enabled Memory Expansion with Intelligent Caching Using Gaussian Mixture Model (2408.05614v1)

Published 10 Aug 2024 in cs.AR, cs.ET, cs.SY, and eess.SY

Abstract: Compute Express Link (CXL) emerges as a solution to the wide gap between computational speed and data communication rates between a host and multiple devices. It fosters a unified, coherent memory space spanning the host and CXL storage devices, such as solid-state drives (SSDs), for memory expansion, with a corresponding DRAM implemented as the device cache. However, this introduces challenges such as substantial cache miss penalties, sub-optimal caching due to the data-access granularity mismatch between the DRAM "cache" and the SSD "memory", and inefficient hardware cache management. To address these issues, we propose a novel solution, named ICGMM, which optimizes caching and eviction directly in hardware using a Gaussian Mixture Model (GMM)-based approach. We prototype our solution on an FPGA board, where it demonstrates a noteworthy improvement over the classic Least Recently Used (LRU) cache strategy: the cache miss rate decreases by 0.32% to 6.14%, yielding a substantial 16.23% to 39.14% reduction in average SSD access latency. Furthermore, compared to state-of-the-art Long Short-Term Memory (LSTM)-based cache policies, our GMM algorithm on FPGA achieves a latency reduction of over 10,000 times while demanding far fewer hardware resources.
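The abstract does not spell out the GMM formulation, so the following is only an illustrative sketch, not the authors' design: a small pure-Python EM fit of a 1-D Gaussian mixture over recently accessed addresses, which is then used to rank cached pages and evict the one least likely to be accessed again. The trace, cache contents, and the choice of address value as the sole feature are all hypothetical.

```python
import math
import random

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def fit_gmm_1d(data, k=2, iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    rng = random.Random(seed)
    mus = rng.sample(data, k)                      # initialize means from the data
    spread = max(data) - min(data)
    sigmas = [max(spread / k, 1e-3)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: per-point component responsibilities
        resp = []
        for x in data:
            ps = [w * gauss_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            tot = sum(ps) or 1e-300
            resp.append([p / tot for p in ps])
        # M-step: re-estimate weights, means, and std-devs
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # floor sigma to avoid collapse
            weights[j] = nj / len(data)
    return weights, mus, sigmas

def mixture_pdf(x, params):
    """Total mixture density at x."""
    weights, mus, sigmas = params
    return sum(w * gauss_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

# Hypothetical access trace: a hot region near address 100, plus sparse cold accesses.
trace = [float(a) for a in
         [100, 101, 99, 100, 102, 98, 100, 101, 99, 100, 500, 900]]
params = fit_gmm_1d(trace)

# Eviction decision: drop the cached address least likely under the fitted mixture.
cached = [100.0, 500.0]
victim = min(cached, key=lambda a: mixture_pdf(a, params))
```

In this toy run the fitted mixture assigns far more density to the hot region near address 100 than to the cold address 500, so 500 is chosen as the eviction victim; the paper's hardware version would instead evaluate such densities directly on the FPGA datapath.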

