
Telescope: Telemetry at Terabyte Scale (2311.10275v2)

Published 17 Nov 2023 in cs.OS, cs.AR, cs.DB, and cs.DC

Abstract: Data-hungry applications that require terabytes of memory have become widespread in recent years. To meet the memory needs of these applications, data centers are embracing tiered memory architectures with near and far memory tiers. Precise, efficient, and timely identification of hot and cold data and their placement in appropriate tiers is critical for performance in such systems. Unfortunately, the existing state-of-the-art telemetry techniques for hot and cold data detection are ineffective at the terabyte scale. We propose Telescope, a novel technique that profiles different levels of the application's page table tree for fast and efficient identification of hot and cold data. Telescope is based on the observation that, for a memory- and TLB-intensive workload, higher levels of a page table tree are also frequently accessed during a hardware page table walk. Hence, the hotness of the higher levels of the page table tree essentially captures the hotness of its subtrees or address space sub-regions at a coarser granularity. We exploit this insight to quickly converge on even a few megabytes of hot data and efficiently identify several gigabytes of cold data in terabyte-scale applications. Importantly, such a technique can seamlessly scale to petabyte-scale applications. Telescope's telemetry achieves 90%+ precision and recall at just 0.009% single CPU utilization for microbenchmarks with a 5 TB memory footprint. Memory tiering based on Telescope results in 5.6% to 34% throughput improvement for real-world benchmarks with a 1-2 TB memory footprint compared to other state-of-the-art telemetry techniques.
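The top-down idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: it assumes a 4-level radix page table with 9 index bits per level (as on x86-64 with 4 KiB pages), and it assumes that an upper-level entry counts as "accessed" whenever any page beneath it was touched. All function and variable names are hypothetical. It shows how a search that descends only into accessed entries finds a handful of hot pages, and labels large untouched regions cold, while probing far fewer entries than a flat scan of every page.

```python
BITS_PER_LEVEL = 9   # assumed: 512 entries per page table node
LEVELS = 4           # assumed: level 4 = top of the tree, level 1 = leaf entries


def entry_index(vpn, level):
    """Global index of the level-`level` entry that covers virtual page `vpn`."""
    return vpn >> (BITS_PER_LEVEL * (level - 1))


def accessed_bits(accessed_vpns):
    """Simulate accessed bits: touching a page marks its entry at every level."""
    bits = {level: set() for level in range(1, LEVELS + 1)}
    for vpn in accessed_vpns:
        for level in range(1, LEVELS + 1):
            bits[level].add(entry_index(vpn, level))
    return bits


def classify(bits, top_entries):
    """Walk the tree top-down, descending only into accessed entries.

    Returns (hot_pages, cold_regions, probes), where cold_regions is a list
    of (level, entry) pairs whose whole subtree was never touched.
    """
    hot_pages, cold_regions, probes = [], [], 0
    frontier = list(range(top_entries))
    for level in range(LEVELS, 0, -1):
        next_frontier = []
        for entry in frontier:
            probes += 1
            if entry not in bits[level]:
                cold_regions.append((level, entry))   # prune the subtree
            elif level == 1:
                hot_pages.append(entry)               # leaf: a hot page
            else:
                base = entry << BITS_PER_LEVEL        # expand the children
                next_frontier.extend(range(base, base + (1 << BITS_PER_LEVEL)))
        frontier = next_frontier
    return hot_pages, cold_regions, probes


if __name__ == "__main__":
    # Three hot pages in a 2 TiB address space (4 top-level entries of 512 GiB).
    touched = {5, 6, (3 << 27) + 7}
    hot, cold, probes = classify(accessed_bits(touched), top_entries=4)
    total_pages = 4 << (BITS_PER_LEVEL * (LEVELS - 1))
    print(f"hot pages: {hot}")
    print(f"probed {probes} entries instead of {total_pages} pages")
```

In this model the two untouched top-level entries are classified cold after a single probe each, which is the coarse-granularity effect the abstract describes. A real system would have to read and reset accessed bits over time and cope with noise, which the sketch ignores.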

Authors
  1. Alan Nair
  2. Sandeep Kumar
  3. Aravinda Prasad
  4. Andy Rudoff
  5. Sreenivas Subramoney
