
Distributed Shared Memory (DSM)

Updated 22 April 2026
  • Distributed Shared Memory (DSM) is a system that maps a single global address space onto physically distributed memory across multiple nodes, simplifying programming for cluster and parallel computing.
  • DSM employs consistency models such as sequential consistency, causal consistency, and linearizability, and uses coherence protocols such as directory-based and NIC-supported approaches to ensure predictable memory behavior.
  • Advanced DSM designs integrate erasure coding, RDMA, and language-guided techniques (e.g., Rust-based models) to balance scalability, fault tolerance, and overall system efficiency.

Distributed Shared Memory (DSM) provides the abstraction of a single, coherent memory address space over a physically distributed set of computing nodes. Under DSM, each node in a distributed or parallel system participates in what appears to be a globally shared memory, but accesses may, in fact, cause communication over a network. DSM enables the decoupling of program logic from the physical memory layout and was originally motivated by the desire to preserve the programming simplicity of shared-memory models while scaling to distributed systems and clusters.

1. Formal DSM Models and Semantics

DSM is defined by the mapping of a global address space onto the physically distributed memory of a set of processes or nodes. Each process P_i may access both private (local) and public (remotely accessible) memory. A global address typically takes the form (process_name, local_address) or, in PGAS systems, (locale, offset). The addressing abstraction admits both software and hardware implementations; in software DSM, mapping and coherence are the responsibility of the runtime or communication library (Butelle et al., 2011, Dewan et al., 2021).
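
The (process_name, local_address) mapping described above can be made concrete with a minimal sketch of a software DSM runtime. The class and method names here are illustrative assumptions, not drawn from any cited system:

```python
# Minimal sketch of a software-DSM global address space in the
# (process_name, local_address) style. Names are illustrative only.

class SoftwareDSM:
    """Maps global addresses (node, offset) onto per-node local memories."""

    def __init__(self, nodes, mem_size):
        # Each node owns a private local memory; all of it is remotely
        # addressable through the global (node, offset) pair.
        self.memory = {n: [0] * mem_size for n in nodes}

    def read(self, node, offset):
        # A remote read would be a network message in a real system;
        # here the runtime simply resolves the mapping locally.
        return self.memory[node][offset]

    def write(self, node, offset, value):
        self.memory[node][offset] = value

dsm = SoftwareDSM(nodes=["P0", "P1"], mem_size=8)
dsm.write("P1", 3, 42)          # looks like a plain store...
assert dsm.read("P1", 3) == 42  # ...but addresses another node's memory
```

In a real implementation, `read` and `write` on a non-local node would be where the network communication (and the coherence protocol) is hidden.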

DSM systems are characterized by their consistency model, the coherence protocol that enforces it, the granularity of sharing, and whether the global mapping is realized in hardware or by a software runtime.

2. Consistency Models and Coherence Protocols

DSM implementations must specify, enforce, and expose a well-defined consistency model:

  • Sequential Consistency (SC): All operations appear in some global total order respecting program order. Achievable via centralized or quorum-based timestamping and update protocols, as in SC-ABD (Ekström et al., 2016).
  • Linearizability: Stronger than SC; requires respecting real-time orders of non-overlapping operations (Kulkarni et al., 2022).
  • Causal Consistency: Ensures visibility of causally related updates; implemented via vector clocks or edge-based share-graph timestamps in full or partial replication regimes (Xiang et al., 2017, 0909.2704).
  • Eventual Consistency: Only requires eventual convergence of replicas; ordering is loose except for per-node FIFO (Kulkarni et al., 2022).
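
The happens-before tracking that causal consistency relies on can be sketched with vector clocks. This is a minimal illustration of the clock comparison only, not the share-graph algorithm of Xiang et al.; the helper names are assumptions:

```python
# Sketch of vector-clock comparison underlying causal consistency.
# Helper names are illustrative; this is not the share-graph protocol.

def vc_merge(a, b):
    """Pointwise maximum: the clock after observing both histories."""
    return [max(x, y) for x, y in zip(a, b)]

def happens_before(a, b):
    """a -> b iff a <= b pointwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

# Node 0 performs a write; node 1 observes it and then writes again.
w0 = [1, 0]                 # write at node 0
w1 = vc_merge(w0, [0, 0])   # node 1 receives w0's update...
w1[1] += 1                  # ...then issues its own write: [1, 1]

assert happens_before(w0, w1)      # w1 causally depends on w0
assert not happens_before(w1, w0)  # replicas must apply w0 before w1
```

A causally consistent replica applies w0 before w1; two writes whose clocks are incomparable are concurrent and may be applied in either order.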

Coherence can be maintained via:

  • Directory-based Protocols: E.g., Tardis replaces traditional directories and broadcast invalidations with logical timestamps (wts, rts) per block, requiring only O(log N) metadata per cacheline and no multicast (Yu et al., 2015).
  • NIC-supported locking: Per-datum exclusive access on one-sided RDMA-enabled NICs, supporting fine-grained lock/unlock around gets and puts (Butelle et al., 2011).
  • Version tagging with language guarantees: DRust exposes the Rust ownership model (SWMR enforced by the type system) to drastically simplify coherence logic. All writes are mediated by unique mutable borrows, and per-pointer version (color) tags invalidate stale copies on modification (Ma et al., 2024).
  • Home-based protocols and chunking: E.g., S-DSM in SAT provides multi-protocol coherence at chunk granularity, using home-based MESI with scope or release consistency (Cudennec, 2020).
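
The logical-lease idea behind Tardis can be sketched as follows: each block carries a write timestamp (wts) and a read lease (rts), a read stays valid until its lease expires in logical time, and a write simply jumps past every granted lease instead of broadcasting invalidations. This is a heavily simplified illustration under assumed lease lengths, not the full protocol:

```python
# Simplified sketch of Tardis-style logical leases (wts, rts).
# A cached copy is valid while the reader's logical time <= rts; a write
# is ordered after all leases rather than invalidating sharers.

LEASE = 10  # lease length in logical time units (illustrative)

class Block:
    def __init__(self):
        self.wts = 0   # logical time of the last write
        self.rts = 0   # end of the current read lease
        self.value = 0

def read(block, now):
    # Extend the lease and return the value; the cached copy stays
    # valid (with no coherence traffic) until logical time block.rts.
    block.rts = max(block.rts, now + LEASE)
    return block.value, block.rts

def write(block, now, value):
    # A write must logically occur after every granted lease, so it
    # jumps to rts + 1 instead of multicasting invalidations.
    block.wts = max(now, block.rts + 1)
    block.rts = block.wts
    block.value = value
    return block.wts

b = Block()
_, lease_end = read(b, now=5)    # lease valid through logical time 15
t = write(b, now=7, value=99)    # write ordered after the lease
assert t == lease_end + 1
```

Because ordering is resolved through timestamps, only the two counters per block are needed, which is where the O(log N) metadata bound comes from.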

Table: DSM Consistency Models and Representative Implementations

Consistency Model      | Key Mechanism       | Example Protocol
Linearizability        | Total order, tmcast | Attiya-Welch (Kulkarni et al., 2022)
Sequential consistency | Quorums, timestamps | SC-ABD (Ekström et al., 2016), Tardis (Yu et al., 2015)
Causal consistency     | Vector/timestamp    | Share-graph algorithm (Xiang et al., 2017)
Eventual consistency   | FIFO, convergence   | COPS-style (Kulkarni et al., 2022)

3. Algorithmic Frameworks and Storage Efficiency

DSM’s communication and storage cost depend strongly on the chosen algorithms and data models:

  • Erasure-coded atomic memory: Storage-efficient DSM can be achieved by parameterizing erasure codes with a concurrency bound ν, reducing per-server storage to N/⌈(N−2f)/ν⌉ units for an N-server, f-fault-tolerant system. Atomicity is preserved by encoding versions and supporting multi-writer, multi-reader extensions via a combination of replication and coding, with a message complexity of O(1) rounds per operation (Zorgui et al., 2018).
  • Compositional correctness proofs for consistency: The SC-ABD algorithm demonstrates a write operation in 1 round and read in 2 rounds, ensuring sequential consistency via a register-by-register compositional proof structure; intersection of majorities ensures propagation of the latest value (Ekström et al., 2016).
  • Non-blocking algorithms and reclamation: In PGAS-based DSMs, scalable atomic operations leverage RDMA pointer compression and epoch-based memory reclamation for non-blocking data structures at scale (Dewan et al., 2020).
  • Partial replication and metadata lower bounds: Causally consistent DSM under partial replication requires each replica to track only those directed edges in the share graph along which causal information can flow; the bit-complexity is minimized accordingly (Xiang et al., 2017).
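
The quorum structure behind SC-ABD-style registers can be sketched in a few lines: a write propagates to a majority in one round, and a read queries a majority, picks the freshest timestamped value, and writes it back in a second round. This is an illustrative single-writer sketch of the ABD pattern, not the SC-ABD algorithm itself:

```python
# Sketch of an ABD-style majority-quorum register (single writer).
# Write: one round (propagate to a majority). Read: two rounds
# (query a majority, then write back the freshest value).

N = 5                      # N servers, tolerating f < N/2 crashes
QUORUM = N // 2 + 1        # any two majorities intersect

servers = [{"ts": 0, "val": None} for _ in range(N)]

def write(ts, val, reachable):
    # Round 1: store (ts, val) at a majority of the reachable servers.
    acks = 0
    for i in reachable:
        if ts > servers[i]["ts"]:
            servers[i].update(ts=ts, val=val)
        acks += 1
        if acks >= QUORUM:
            return True
    return False  # no majority reachable: the write cannot complete

def read(reachable):
    # Round 1: query a majority and take the highest timestamp.
    replies = [(servers[i]["ts"], servers[i]["val"]) for i in reachable[:QUORUM]]
    ts, val = max(replies)
    # Round 2: write back so that later reads also observe this value.
    write(ts, val, reachable)
    return val

# The writer reaches only servers {0,1,2}; a later reader contacts {2,3,4}.
assert write(ts=1, val="x", reachable=[0, 1, 2])
assert read(reachable=[2, 3, 4]) == "x"  # quorum intersection at server 2
```

The intersection of any two majorities is what guarantees that a read always sees the latest completed write, mirroring the compositional argument in the text above.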

4. Architectural Realizations and Programming Models

DSM spans a variety of hardware and software platforms:

  • Network-on-Chip DSM: The Epiphany many-core RISC array provides a flat, memory-mapped global address space over a 2D mesh NoC, with sequential consistency and zero hardware cache; all sharing is explicit and incurs mesh latency. Transparent DSM programming is achieved via C++ TMP-generated accessor code (Richie et al., 2017).
  • Cache-Coherent NUMA and Partitioned Shared Memory: Systems like JArena partition the global heap into per-thread or per-node regions, pinning physical pages to NUMA nodes and eliminating false sharing and remote access (Yang et al., 2019).
  • Heterogeneous, event-driven S-DSM: SAT integrates chunk-based allocation, multiple coherence protocols, and a hybrid event-driven/shared-memory API suited to clusters of CPUs, GPUs, and FPGAs, with energy-optimized idle polling (Cudennec, 2020).
  • Disaggregated memory and edge DSM: Emerging systems combine DSM abstractions with memory-disaggregation and routable byte-addressable NVM pools, even across volatile edge environments (Wang et al., 2022, Vaquero et al., 2021).

DSM admits flexible programming models:

  • PGAS (Partitioned Global Address Space): Exposes global view with explicit or implicit locality (Dewan et al., 2021, Dewan et al., 2020).
  • Language-guided DSM: DRust integrates Rust’s ownership and lifetimes to present a transparent, zero-overhead DSM interface, achieving strong consistency and high performance on modern RDMA (Ma et al., 2024).
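
The PGAS idea of a global view with visible locality can be sketched as a globally indexed array whose elements are partitioned across nodes; the runtime resolves each index to an owner and a local offset. The names below are illustrative assumptions, not any particular PGAS library's API:

```python
# Sketch of the PGAS model: a globally indexed array block-partitioned
# across nodes. Names are illustrative, not a real PGAS library's API.

class PGASArray:
    def __init__(self, length, nnodes, my_node):
        self.block = length // nnodes   # simple block partitioning
        self.my_node = my_node
        self.parts = [[0] * self.block for _ in range(nnodes)]

    def owner(self, i):
        # Explicit locality: the program can ask where element i lives.
        return i // self.block

    def get(self, i):
        o = self.owner(i)
        # A real runtime would issue an RDMA get when o != my_node.
        return self.parts[o][i % self.block]

    def put(self, i, v):
        self.parts[self.owner(i)][i % self.block] = v

a = PGASArray(length=16, nnodes=4, my_node=0)
a.put(9, 7)              # element 9 lives on node 2
assert a.owner(9) == 2   # locality is visible to the programmer
assert a.get(9) == 7
```

Exposing `owner` is what distinguishes PGAS from fully transparent DSM: the programmer can place computation near data while keeping a single global index space.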

5. Performance, Scalability, and Trade-Offs

DSM solutions must balance communication, metadata, and synchronization overhead with scalability and transparency:

  • Metadata scaling: Tardis achieves O(log N) per-block state versus O(N) for traditional directories, supporting thousands of cores (Yu et al., 2015). Vector or edge-based timestamp mechanisms scale with replica count and sharing topology (Xiang et al., 2017).
  • Communication patterns: Directoryless or logical-clock-based protocols (e.g., DRust, Tardis) eliminate broadcast invalidations and allow point-to-point messaging; renewed leases or color tags invalidate stale copies only as needed (Ma et al., 2024, Yu et al., 2015).
  • Performance isolation: Partitioned heap managers in NUMA-aware systems deliver near-linear scaling on manycore servers, up to 4.3× over standard allocators (Yang et al., 2019).
  • Checkpointing and rollback: Stronger consistency simplifies global state capture (linearizability allows O(1) image save), while weaker models (causal/eventual) require vector clocks, deltas, and more complex marker protocols, increasing checkpoint and rollback cost (Kulkarni et al., 2022).
  • Storage efficiency: Erasure-coded DSM matches lower bounds when concurrency is limited, with trade-offs between storage, liveness, and latency (Zorgui et al., 2018).
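
The O(N)-versus-O(log N) metadata contrast can be made concrete with back-of-the-envelope arithmetic: a full-map directory keeps one sharer bit per core per block, while a Tardis-style scheme keeps two logical timestamps of roughly log N bits each. The timestamp width used here is an assumption for illustration:

```python
# Back-of-the-envelope per-block metadata cost: a full-map directory
# keeps one sharer bit per core (O(N)); a Tardis-style scheme keeps two
# logical timestamps (wts, rts) of O(log N) bits each. The exact
# timestamp width is an illustrative assumption.

import math

def directory_bits(n_cores):
    return n_cores                             # full sharer bit-vector

def tardis_bits(n_cores):
    return 2 * math.ceil(math.log2(n_cores))   # wts + rts

for n in (64, 1024):
    print(f"{n} cores: directory {directory_bits(n)} bits, "
          f"timestamps {tardis_bits(n)} bits")

assert directory_bits(1024) == 1024
assert tardis_bits(1024) == 20   # two 10-bit timestamps
```

At 1024 cores the gap is already two orders of magnitude, which is why directoryless timestamp schemes scale to thousands of cores.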

6. Advanced Topics: Fault Tolerance, Computability, and Special-Purpose DSM

DSM systems address crash and recovery:

  • Fault-tolerant protocols: SC-ABD and erasure-coded DSMs tolerate up to f < n/2 crash failures, using quorum intersection and write-back stabilization (Ekström et al., 2016, Zorgui et al., 2018).
  • Recoverable mutual exclusion: Recent algorithms attain O(log n / log log n) remote memory references (RMRs) per passage in both cache-coherent and DSM multiprocessors, exploiting recoverable queue structures and minimal atomic primitives (Jayanti et al., 2019).
  • Theoretical task computability: Universality of DSM with bounded-size registers holds only for minority failures (t < n/2), breaking down in majority-failure regimes. Topological protocol-complex analysis pinpoints the necessary bit-width for registering state and the threshold for task solvability (Delporte et al., 2023).

DSM also underpins special domains:

  • Self-assembly models: Abstract and kinetic tile assembly models (aTAM, kTAM) can be simulated by causally consistent or GWO-consistent DSMs, allowing reduction of local determinism or self-healing properties to data-race analyses and concurrent-write-freedom (0909.2704).

7. Ongoing Research, Outlook, and Systematic Challenges

The DSM paradigm continues to evolve in response to hardware trends, application requirements, and novel programming language features:

  • Memory disaggregation, RDMA, and CXL are driving renewed development of high-throughput, low-latency DSM layers for both data center and edge/IoT environments (Wang et al., 2022, Vaquero et al., 2021).
  • Adaptive and language-guided coherence protocols are simplifying strong consistency enforcement by leveraging static invariants (e.g., SWMR via Rust) or hybrid/cross-tier protocols (Ma et al., 2024).
  • Optimization of non-blocking and scalable data structures in DSM settings (e.g., DIHT, global atomic objects, epoch-based reclamation) is critical for matching or exceeding message-passing models in throughput and latency (Dewan et al., 2021, Dewan et al., 2020).
  • Programmability versus efficiency remains a fundamental trade-off: stronger models support simpler checkpointing and recovery logic, while weaker models minimize runtime synchronization cost at the price of more complex failure and recovery management (Kulkarni et al., 2022).
  • Open problems include the precise topological characterization of bounded-register DSM, adaptive protocols for highly dynamic or failure-prone regimes, and integration of edge and disaggregated memory models at scale (Delporte et al., 2023, Vaquero et al., 2021).

DSM thus remains both a foundational abstraction and a field of ongoing innovation, bridging parallel, distributed, and increasingly heterogeneous systems design.
