Cache Coherence Over Disaggregated Memory (2409.02088v4)
Abstract: Disaggregating memory from compute offers the opportunity to better utilize stranded memory in cloud data centers. It is important to cache data in the compute nodes and to maintain cache coherence across them. However, the limited computing power on disaggregated memory servers makes traditional cache coherence protocols suboptimal, particularly in the case of stranded memory. This paper introduces SELCC, a Shared-Exclusive Latch Cache Coherence protocol that maintains cache coherence without imposing any computational burden on the remote memory side. It aligns the state machine of the shared-exclusive latch protocol with that of the MSI protocol, thereby ensuring both atomicity of data access and cache coherence with sequential consistency. SELCC embeds cache-ownership metadata directly into the RDMA latch word, enabling efficient cache ownership management via RDMA atomic operations. SELCC can serve as an abstraction layer over disaggregated memory with APIs that resemble main-memory accesses. A concurrent B-tree and three transaction concurrency control algorithms are realized using SELCC's abstraction layer. Experimental results show that SELCC significantly outperforms Remote-Procedure-Call-based protocols for cache coherence under limited remote computing power. Applications built on SELCC achieve comparable or superior performance over disaggregated memory compared to competitors.
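To make the idea of embedding cache-ownership metadata in a latch word concrete, the following is a minimal, single-process sketch and not the paper's implementation. It assumes a 64-bit word whose top 8 bits hold the exclusive holder (node ID plus one, 0 meaning none) and whose low 56 bits form a bitmap of shared holders; this layout, the class `SimulatedRemoteWord`, and all function names are hypothetical. The two atomic primitives stand in for the RDMA compare-and-swap and fetch-and-add verbs, and the mapping to MSI (empty word as Invalid, reader bit as Shared, writer field as Modified) is an illustrative reading of the abstract.

```python
import threading

EXCL_SHIFT = 56                    # assumed layout: bits 56..63 = exclusive holder
WORD_MASK = (1 << 64) - 1          # keep arithmetic within 64 bits


class SimulatedRemoteWord:
    """Hypothetical stand-in for a 64-bit latch word in remote memory.

    A local lock mimics the atomicity a NIC provides for RDMA atomics;
    no code runs on the "memory side" beyond these two primitives.
    """

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, desired):
        # Returns the old value; writes `desired` only if old == expected.
        with self._lock:
            old = self._value
            if old == expected:
                self._value = desired
            return old

    def fetch_and_add(self, delta):
        # Returns the old value and adds `delta` modulo 2**64.
        with self._lock:
            old = self._value
            self._value = (old + delta) & WORD_MASK
            return old

    def read(self):
        with self._lock:
            return self._value


def try_exclusive(word, node_id):
    """Invalid -> Modified: succeeds only if the word shows no holder at all."""
    desired = (node_id + 1) << EXCL_SHIFT
    return word.compare_and_swap(0, desired) == 0


def release_exclusive(word, node_id):
    """Modified -> Invalid: subtract our holder field, leaving other bits intact."""
    word.fetch_and_add(-((node_id + 1) << EXCL_SHIFT))


def try_shared(word, node_id):
    """Invalid -> Shared: set our reader bit, then back off if a writer holds it.

    Assumes node_id < 56 and that a node sets its bit at most once.
    """
    bit = 1 << node_id
    old = word.fetch_and_add(bit)
    if old >> EXCL_SHIFT:          # an exclusive holder exists: undo our bit
        word.fetch_and_add(-bit)
        return False
    return True


def release_shared(word, node_id):
    """Shared -> Invalid: clear our reader bit."""
    word.fetch_and_add(-(1 << node_id))
```

In this sketch a compute node that fails to acquire the latch simply retries or, in a fuller design, would ask the current holders named in the word to release it; that notification path is outside what the abstract describes and is omitted here.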