Taming Server Memory TCO with Multiple Software-Defined Compressed Tiers
Abstract: Memory accounts for 33-50% of the total cost of ownership (TCO) in modern data centers. We propose to tame memory TCO through the creation and judicious management of multiple software-defined compressed memory tiers. In contrast to state-of-the-art 2-Tier solutions, which pair a single compressed tier with DRAM, we define multiple compressed tiers implemented through combinations of compression algorithms, memory allocators for compressed objects, and backing media to store compressed objects. These compressed memory tiers represent distinct points in the spectrum of access latency, data compressibility, and unit memory cost, allowing rich and flexible trade-offs between memory TCO savings and application performance impact. A key advantage of our multi-tier approach (ntier) is that it enables aggressive memory TCO savings by placing warm data in low-latency compressed tiers at a modest performance cost, while simultaneously placing cold data in the tiers that save the most memory TCO. We believe our work represents an important server system configuration and optimization capability for achieving the best SLA-aware performance per dollar for applications hosted in production data center environments. We present a comprehensive and rigorous analytical cost model for the performance and TCO trade-off, based on continuous monitoring of the application's data access profile. Guided by this model, our placement engine dynamically manages the placement and migration of application data across the software-defined compressed tiers. On real-world benchmarks, our solution increases memory TCO savings by 22-40 percentage points while maintaining performance parity, or improves performance by 2-10 percentage points while maintaining memory TCO parity, compared to state-of-the-art 2-Tier solutions.
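The placement idea in the abstract can be sketched in a few lines: given per-tier latency, compression ratio, and unit memory cost, pick for each page the cheapest tier whose expected access penalty fits a performance budget. This is a minimal illustrative sketch, not the paper's actual model; the tier names, parameter values, and the linear penalty/cost scoring rule below are all hypothetical placeholders.

```python
# Illustrative sketch (not the paper's model): per-page tier selection under a
# simple analytical cost model. All tier parameters below are hypothetical.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    access_latency_us: float   # decompression + access latency per access
    compression_ratio: float   # e.g. 2.5 means 2.5:1
    cost_per_gb: float         # relative unit cost of the backing medium

TIERS = [
    Tier("dram",      0.1, 1.0, 1.00),  # uncompressed baseline
    Tier("lz4+dram",  2.0, 2.5, 1.00),  # fast compressor, DRAM-backed
    Tier("zstd+cxl",  8.0, 3.5, 0.60),  # denser compressor, cheaper medium
]

def place_page(accesses_per_sec: float, page_gb: float,
               max_penalty_us_per_sec: float) -> Tier:
    """Pick the cheapest tier whose expected access penalty stays in budget."""
    best, best_cost = None, float("inf")
    for t in TIERS:
        penalty = accesses_per_sec * t.access_latency_us
        if penalty > max_penalty_us_per_sec:
            continue  # page is too hot for this tier's latency
        cost = (page_gb / t.compression_ratio) * t.cost_per_gb
        if cost < best_cost:
            best, best_cost = t, cost
    return best or TIERS[0]  # fall back to DRAM for the hottest pages

# Hot pages stay in DRAM; warm pages land in the fast compressed tier;
# cold pages go to the densest, cheapest tier.
print(place_page(100000, 0.004, 10000).name)  # hot  -> dram
print(place_page(5000, 0.004, 10000).name)    # warm -> lz4+dram
print(place_page(0.01, 0.004, 10000).name)    # cold -> zstd+cxl
```

The point of the sketch is the shape of the trade-off: multiple compressed tiers give the placement policy more than one "cold" destination, so warm data need not pay the worst-case decompression latency just to leave DRAM.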