The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation

Published 7 Jul 2022 in cs.DB | (2207.03027v1)

Abstract: Memory disaggregation (MD) allows for scalable and elastic data center design by separating compute (CPU) from memory. With MD, compute and memory are no longer coupled into the same server box. Instead, they are connected to each other via ultra-fast networking such as RDMA. MD can bring many advantages, e.g., higher memory utilization, better independent scaling (of compute and memory), and lower cost of ownership. This paper makes the case that MD can fuel the next wave of innovation on database systems. We observe that MD revives the great debate of "shared what" in the database community. We envision that distributed shared-memory databases (DSM-DB, for short) - that have not received much attention before - can be promising in the future with MD. We present a list of challenges and opportunities that can inspire next steps in system design making the case for DSM-DB.

Summary

The paper argues that RDMA-enabled memory disaggregation can revitalize DSM-DB architectures by decoupling compute and memory resources.
The study discusses three concurrency control designs (no caching, caching without sharding, and caching with sharding) and how sharding can mitigate cache coherence challenges.
The paper emphasizes that new buffer management and index designs are crucial for optimizing remote memory access over high-speed RDMA networks.

The paper "The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation" (2207.03027) explores how memory disaggregation (MD) can be applied to database systems, with RDMA as the interconnect between compute and memory. It argues for a shift toward distributed shared-memory databases (DSM-DB), made feasible by advances in networking technology such as RDMA.

Introduction to Memory Disaggregation

Memory disaggregation (MD) separates CPU and memory into distinct nodes connected by a high-speed network such as RDMA. This separation improves memory utilization and lets compute and memory scale independently.

Figure 1: Monolithic architecture vs. memory disaggregation

Traditional monolithic architectures tightly couple compute and memory in the same server box, which leads to inefficiencies, especially in cloud environments: memory that sits idle on one server cannot be used by another (Figure 1). MD decouples the two, allowing better resource allocation and lower cost of ownership.

Vision and Architectural Evolution

The paper envisions that MD can rejuvenate interest in DSM-DB architectures, which previously failed to gain traction because networks were too slow. RDMA now provides the necessary bandwidth and low latency, making DSM-DB designs worth revisiting.

Figure 2: DSM-DB with memory disaggregation (MD)

In the DSM-DB architecture (Figure 2), compute nodes have strong CPUs and little local memory, while memory nodes have the opposite profile: abundant memory and little compute power. Connected over RDMA, they form a scalable distributed shared-memory system. This architecture can support Online Transaction Processing (OLTP) at scale while compute and memory resources scale independently.
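To make the architecture concrete, here is a minimal in-process sketch, assuming that a global address is a (memory node id, offset) pair and that compute nodes reach memory nodes only through one-sided read/write calls. The names `GlobalAddress`, `MemoryNode`, and `ComputeNode` are hypothetical, not from the paper; a real system would issue RDMA READ/WRITE verbs where this sketch calls Python methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalAddress:
    """Hypothetical global address: a memory node id plus an offset in its region."""
    node_id: int
    offset: int


class MemoryNode:
    """Memory-rich, compute-poor node that only exposes a byte region."""

    def __init__(self, size: int) -> None:
        self.region = bytearray(size)

    # read/write stand in for one-sided RDMA READ/WRITE: in this model the
    # memory node's CPU does no request processing.
    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.region[offset:offset + length])

    def write(self, offset: int, data: bytes) -> None:
        if offset + len(data) > len(self.region):
            raise IndexError("write past end of remote region")
        self.region[offset:offset + len(data)] = data


class ComputeNode:
    """Compute-rich, memory-poor node: all shared data lives on memory nodes."""

    def __init__(self, memory_nodes: list[MemoryNode]) -> None:
        self.memory_nodes = memory_nodes
        self.round_trips = 0  # each remote access costs one simulated network round trip

    def read(self, addr: GlobalAddress, length: int) -> bytes:
        self.round_trips += 1
        return self.memory_nodes[addr.node_id].read(addr.offset, length)

    def write(self, addr: GlobalAddress, data: bytes) -> None:
        self.round_trips += 1
        self.memory_nodes[addr.node_id].write(addr.offset, data)


# Two memory nodes shared by two compute nodes.
pool = [MemoryNode(1024), MemoryNode(1024)]
c1, c2 = ComputeNode(pool), ComputeNode(pool)
c1.write(GlobalAddress(node_id=1, offset=100), b"hello")
print(c2.read(GlobalAddress(node_id=1, offset=100), 5))  # b'hello'
```

Because neither compute node keeps a local copy, both always see the same bytes; the price is a network round trip on every access, which is what the caching designs discussed next try to avoid.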

Concurrency Control and Challenges

Concurrency control (CC) is a fundamental challenge in DSM-DB because there is no hardware-supported cache coherence across compute nodes. The paper identifies three approaches:

No Cache, No Sharding: Rely entirely on remote memory access via RDMA without caching.
Cache, No Sharding: Use local caches and implement software-level cache coherence.
Cache, Sharding: Implement logical sharding to reduce conflicts and manage coherence through metadata.
Figure 3: Concurrency control design tradeoffs in DSM-DB
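The first option above ("No Cache, No Sharding") can be pictured as follows: every record access goes to remote memory, and writers serialize on a lock word with an atomic compare-and-swap, which RDMA offers as a one-sided atomic operation. This is a single-process sketch, assuming one lock word per record; `RemoteMemory.compare_and_swap` is a hypothetical stand-in for the RDMA atomic (modeled with a Python `threading.Lock`), and the locking scheme is illustrative rather than the paper's protocol.

```python
import threading
import time

UNLOCKED = 0


class RemoteMemory:
    """Simulated memory node holding 64-bit words (Python ints here)."""

    def __init__(self, num_words: int) -> None:
        self.words = [0] * num_words
        self._atomic = threading.Lock()  # models the atomicity the RDMA NIC guarantees

    def compare_and_swap(self, index: int, expected: int, new: int) -> int:
        """Set words[index] to new if it equals expected; always return the old value."""
        with self._atomic:
            old = self.words[index]
            if old == expected:
                self.words[index] = new
            return old

    def read(self, index: int) -> int:
        return self.words[index]

    def write(self, index: int, value: int) -> None:
        self.words[index] = value


def locked_increment(mem: RemoteMemory, lock_idx: int, data_idx: int, owner_id: int) -> None:
    # Acquire: spin on a remote CAS until the lock word goes from UNLOCKED to owner_id.
    while mem.compare_and_swap(lock_idx, UNLOCKED, owner_id) != UNLOCKED:
        time.sleep(0)  # yield so the simulated lock holder can make progress
    try:
        # No local cache: the record is read from and written back to remote memory.
        mem.write(data_idx, mem.read(data_idx) + 1)
    finally:
        # Release by swapping the owner id back to UNLOCKED.
        mem.compare_and_swap(lock_idx, owner_id, UNLOCKED)


def worker(mem: RemoteMemory, owner_id: int, iterations: int) -> None:
    for _ in range(iterations):
        locked_increment(mem, lock_idx=0, data_idx=1, owner_id=owner_id)


mem = RemoteMemory(2)  # word 0 is the lock, word 1 is the record (a counter)
threads = [threading.Thread(target=worker, args=(mem, cid, 1000)) for cid in range(1, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(mem.read(1))  # 4000
```

Each increment here costs at least three remote operations (acquire, read-modify-write, release), which illustrates why this design is simple and coherent but network-bound.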

Adopting sharding can avoid cache coherence issues (Figure 3), but transactions that span shards may then require distributed commit protocols (e.g., two-phase commit), which call for new designs built on RDMA primitives.
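The sharding option can be sketched as hash-routing each key to an owning compute node so that only the owner caches and modifies it; a transaction touching keys on several owners then needs a commit protocol. The sketch assumes hash partitioning and uses textbook two-phase commit purely as an illustration; `owner_of`, `Shard`, and `commit` are hypothetical names, and the paper does not prescribe this protocol.

```python
import zlib
from typing import Dict, List, Set

NUM_SHARDS = 4


def owner_of(key: str) -> int:
    """Route a key to the compute node that owns (and may cache) it."""
    return zlib.crc32(key.encode()) % NUM_SHARDS


class Shard:
    """One compute node's shard: cached records that no other node modifies."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.staged: Dict[int, Dict[str, int]] = {}  # txn id -> pending writes
        self.locked: Set[str] = set()

    def prepare(self, txn_id: int, writes: Dict[str, int]) -> bool:
        """Phase 1: lock the keys and stage the writes; vote no on any conflict."""
        if any(key in self.locked for key in writes):
            return False
        self.locked.update(writes)
        self.staged[txn_id] = dict(writes)
        return True

    def finish(self, txn_id: int, do_commit: bool) -> None:
        """Phase 2: apply or discard this transaction's staged writes, then unlock."""
        writes = self.staged.pop(txn_id, {})
        if do_commit:
            self.data.update(writes)
        self.locked.difference_update(writes)


def commit(shards: List[Shard], txn_id: int, writes: Dict[str, int]) -> bool:
    """Two-phase commit across the shards a transaction touches."""
    by_shard: Dict[int, Dict[str, int]] = {}
    for key, value in writes.items():
        by_shard.setdefault(owner_of(key), {})[key] = value
    # A single-shard transaction could commit locally; 2PC matters when len(by_shard) > 1.
    votes = [shards[s].prepare(txn_id, w) for s, w in by_shard.items()]
    decision = all(votes)
    for s in by_shard:
        shards[s].finish(txn_id, decision)
    return decision


shards = [Shard() for _ in range(NUM_SHARDS)]
ok = commit(shards, txn_id=1, writes={"alice": 90, "bob": 110})
print(ok, shards[owner_of("alice")].data["alice"])  # True 90
```

Within a shard no coherence traffic is needed because only the owner touches its records; the cost moves to the cross-shard prepare/finish messages, which is where RDMA-aware commit designs would come in.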

Buffer Management and Optimization

Buffer management in DSM-DB aims to minimize the cost of remote memory access. The paper argues that traditional disk-oriented buffer strategies do not translate well, because the performance gap between local and remote memory is much narrower than the memory-disk gap those strategies were designed for. Lightweight techniques are therefore needed for caching and data movement.
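As one example of a lightweight policy, the sketch below caches remote pages in a small local buffer and evicts with the CLOCK (second-chance) algorithm, which approximates LRU with one reference bit per frame instead of a linked list. CLOCK is an illustrative choice here, not the paper's; `ClockCache` and the `fetch_remote` callback (a hypothetical stand-in for an RDMA read of a page from a memory node) are assumptions.

```python
from typing import Callable, Dict, List, Optional


class ClockCache:
    """Fixed-size local buffer of remote pages with CLOCK (second-chance) eviction."""

    def __init__(self, capacity: int, fetch_remote: Callable[[int], bytes]) -> None:
        self.capacity = capacity
        self.fetch_remote = fetch_remote  # hypothetical stand-in for an RDMA page read
        self.frames: List[Optional[int]] = [None] * capacity  # page id held by each frame
        self.ref_bit: List[bool] = [False] * capacity
        self.slot_of: Dict[int, int] = {}  # page id -> frame index
        self.pages: Dict[int, bytes] = {}  # page id -> cached bytes
        self.hand = 0
        self.misses = 0

    def get(self, page_id: int) -> bytes:
        slot = self.slot_of.get(page_id)
        if slot is not None:
            self.ref_bit[slot] = True  # hit: mark as recently used, no network access
            return self.pages[page_id]
        self.misses += 1
        slot = self._find_victim()
        victim = self.frames[slot]
        if victim is not None:
            # Read-only pages are simply dropped; dirty pages would need a remote write-back.
            del self.slot_of[victim]
            del self.pages[victim]
        data = self.fetch_remote(page_id)
        self.frames[slot] = page_id
        self.slot_of[page_id] = slot
        self.pages[page_id] = data
        self.ref_bit[slot] = True
        return data

    def _find_victim(self) -> int:
        # Advance the hand, clearing reference bits, until a free or unreferenced frame.
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % self.capacity
            if self.frames[slot] is None or not self.ref_bit[slot]:
                return slot
            self.ref_bit[slot] = False


def fake_fetch(page_id: int) -> bytes:
    return f"page-{page_id}".encode()


cache = ClockCache(capacity=2, fetch_remote=fake_fetch)
for pid in [1, 2, 1, 1, 2]:
    cache.get(pid)
print(cache.misses)  # 2
```

A hit costs only a bit update, which keeps the bookkeeping cheap relative to a remote access that may take only a few microseconds.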

Index Design in DSM-DB

Index design must be rethought to exploit RDMA and memory disaggregation. This involves RDMA-conscious design, choosing between one-sided RDMA (direct reads and writes that bypass the remote CPU) and two-sided RDMA (message-based RPCs), and using memory nodes for near-data processing to reduce data transfer. The paper suggests that traditional indexes may fall short in this architecture, calling for adaptive designs that balance the use of compute-side and memory-side resources.
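The one-sided versus two-sided trade-off can be illustrated by counting network round trips for a lookup in a tree index stored on a memory node. With one-sided access, the compute node fetches one node per level, so a lookup costs roughly one round trip per level but leaves the memory node's CPU idle; with a two-sided RPC, the memory node walks the tree locally and replies once, trading round trips for memory-side CPU. This is a simplified model with hypothetical names (`IndexNode`, `lookup_one_sided`, `lookup_two_sided`), not the paper's index design.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class IndexNode:
    keys: List[int]
    children: List["IndexNode"] = field(default_factory=list)  # empty for leaf nodes
    values: List[str] = field(default_factory=list)  # used only by leaf nodes


def traverse(root: IndexNode, key: int, on_visit: Callable[[], None]) -> Optional[str]:
    """Descend from root to a leaf, calling on_visit once per node touched."""
    node = root
    on_visit()
    while node.children:
        node = node.children[bisect_right(node.keys, key)]
        on_visit()
    if key in node.keys:
        return node.values[node.keys.index(key)]
    return None


def lookup_one_sided(root: IndexNode, key: int, stats: Dict[str, int]) -> Optional[str]:
    # The compute node reads each tree node itself: one round trip per level,
    # with no work for the memory node's CPU.
    def visit() -> None:
        stats["round_trips"] += 1

    return traverse(root, key, visit)


def lookup_two_sided(root: IndexNode, key: int, stats: Dict[str, int]) -> Optional[str]:
    # One RPC round trip; the memory node's CPU walks the tree (near-data processing).
    stats["round_trips"] += 1

    def visit() -> None:
        stats["memory_cpu_steps"] += 1

    return traverse(root, key, visit)


# A three-level tree; a key >= a separator key goes to the child on its right.
leaves = [IndexNode([1, 2], values=["a", "b"]), IndexNode([3, 4], values=["c", "d"]),
          IndexNode([5, 6], values=["e", "f"]), IndexNode([7, 8], values=["g", "h"])]
inner = [IndexNode([3], children=leaves[:2]), IndexNode([7], children=leaves[2:])]
root = IndexNode([5], children=inner)

one = {"round_trips": 0, "memory_cpu_steps": 0}
two = {"round_trips": 0, "memory_cpu_steps": 0}
print(lookup_one_sided(root, 6, one), one)  # f {'round_trips': 3, 'memory_cpu_steps': 0}
print(lookup_two_sided(root, 6, two), two)  # f {'round_trips': 1, 'memory_cpu_steps': 3}
```

Which side wins depends on tree depth, network latency, and how much CPU the memory nodes can spare, which is the resource balance the paragraph above refers to.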

Challenges and Research Directions

The paper identifies numerous challenges in realizing DSM-DB with memory disaggregation, notably durability, availability, and scalability. Proposed directions include new DSM APIs, efficient concurrency control schemes, and revisited logging techniques that provide resilience without over-reliance on persistent storage.
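One way to picture logging without over-reliance on persistent storage is to replicate each redo-log record into the DRAM of several memory nodes and acknowledge a commit once a quorum holds it, so that a single memory-node failure does not lose acknowledged work. The sketch assumes synchronous replication to a fixed set of three replicas with a majority quorum; `LogReplica` and `commit_with_replicated_log` are hypothetical names, and the paper does not specify this scheme.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LogReplica:
    """A redo-log buffer held in the DRAM of one memory node (hypothetical)."""
    alive: bool = True
    records: List[Tuple[int, bytes]] = field(default_factory=list)

    def append(self, lsn: int, payload: bytes) -> bool:
        # Stands in for an RDMA WRITE into a remote log buffer followed by an ack.
        if not self.alive:
            return False
        self.records.append((lsn, payload))
        return True


def commit_with_replicated_log(replicas: List[LogReplica], lsn: int,
                               payload: bytes, quorum: int) -> bool:
    """Acknowledge the commit only if at least `quorum` memory nodes hold the record."""
    acks = sum(replica.append(lsn, payload) for replica in replicas)
    # Below quorum, a real system would abort or retry, and recovery would have to
    # ignore records that never reached a quorum; this sketch only reports the outcome.
    return acks >= quorum


replicas = [LogReplica() for _ in range(3)]
print(commit_with_replicated_log(replicas, 1, b"x=1", quorum=2))  # True
replicas[0].alive = False  # one memory node fails
print(commit_with_replicated_log(replicas, 2, b"x=2", quorum=2))  # True
replicas[1].alive = False  # a second memory node fails
print(commit_with_replicated_log(replicas, 3, b"x=3", quorum=2))  # False
```

With three replicas and a quorum of two, every acknowledged record survives any single memory-node failure; tolerating more failures means more replicas, and therefore more network writes per commit.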

Conclusion

The paper makes a strong case that distributed shared-memory databases enabled by RDMA can drive the next wave of database innovation. With the potential for improved scalability, cost efficiency, and resource utilization, DSM-DB is a promising architecture for cloud-native database systems. Further research is needed to address the challenges the paper articulates, particularly in buffer management, concurrency control, and index design. With these innovations, MD could reshape the landscape of distributed database systems.
