In-Memory Distributed Object Store

Updated 6 May 2026
  • In-memory distributed object stores are systems that maintain data objects in main memory to achieve low latency, high throughput, and resilience across clusters.
  • They employ diverse architectures like disaggregated memory, NVM integration, and serverless models to optimize scalability and resource utilization.
  • These systems use advanced techniques such as RDMA access, client-side caching, and erasure coding to ensure fault tolerance and efficient metadata management.

An in-memory distributed object store is a system that manages and exposes collections of arbitrary data objects across a cluster of nodes, maintaining all or most object data in main memory to provide low latency access, high throughput, and resilience to typical datacenter-scale or cloud-scale failures. Such stores provide efficient sharing, replication, versioning, and consistency for a wide range of workloads, from transactional key-value access and high-performance analytics to collaborative and versioned data applications. Modern designs span disaggregated, memory-centric datacenter fabrics, integrate with non-volatile memory (NVM), and often provide cloud elasticity, programmable indexing, and durability through erasure coding or replication.

1. Architectural Taxonomy and Design Paradigms

In-memory distributed object stores have evolved through several major architectural paradigms, each addressing system-level trade-offs in elasticity, resilience, locality, and interface expressiveness:

  • Disaggregated Memory Stores: Outback (Liu et al., 13 Feb 2025) exemplifies designs that decouple compute and memory into separate node pools: compute nodes run transactional and indexing logic entirely in user space, memory nodes provide object address spaces with no CPU intervention, and all communication uses one-sided or two-sided RDMA verbs. This separation improves resource utilization and enables independent scaling.
  • Memory-Disaggregated Hardware Backends: The framework built on ThymesisFlow (Abrahamse et al., 2022) demonstrates full memory disaggregation at the hardware level. Here, DRAM regions across nodes are exported as a single globally addressable region, and the object store (an extended Plasma) can transparently allocate buffers on local or remote nodes via FPGA-accelerated OpenCAPI.
  • Serverless/Microservices-Based Stores: InfiniStore (Zhang et al., 2022) introduces "ServerlessMemory," a logically continuous memory space composed of many FaaS (Function-as-a-Service) instances, coupled with a persistent, inexpensive cloud object store for durability. Elasticity and resilience are achieved by leveraging the autoscaling, stateless, and ephemeral properties of serverless computing.
  • In-Memory with Versioned and Rich Semantics: UStore (Dinh et al., 2017) provides an object store with inherent immutability, sharing, and security, exposing a fully versioned DAG abstraction per object, with rich branch/merge capabilities, authenticated history, and tamper-evident storage, while using memory-resident indexing for high throughput.
  • RDMA-Accelerated Channel-Based Object Models: LOCO (Hodgkins et al., 25 Mar 2025) presents a compositional, object-oriented interface for building distributed objects ("channel objects") with their state partitioned and distributed across network memory, exposing the underlying weak coherence and locality of fabrics like RDMA or CXL directly to the programming model.
  • All-Encoding, Small-Object Optimized Schemes: MemEC (Yiu et al., 2017) demonstrates how encoding small objects into fixed-size, erasure-coded chunks, alongside decentralized and coordinated I/O pipelines, minimizes redundancy and transition overhead during failures, specifically for small-object-dominated workloads.
  • NVM-Centric Distributed Object Stores: DAOS (Manubens et al., 2024) leverages byte-addressable SCM/NVM devices, keeps all metadata in DRAM or storage-class memory, and implements lock-free, epoch-based atomic updates in user space, bypassing kernel bottlenecks for high-throughput I/O.

2. Core Mechanisms: Memory Management, Metadata, and Object Access

Memory and Data Placement

  • Segmented Allocators and Remote Access: Systems such as the ThymesisFlow-based object store (Abrahamse et al., 2022) allocate objects in contiguous DRAM chunks accessible via an FPGA-mediated address translation mechanism. Clients write objects locally or via proxy handles, with mmapped pointers forwarded transparently to remote memory.
  • Distributed Hash Tables and Indices: Outback (Liu et al., 13 Feb 2025) establishes a two-level, dynamic minimal perfect hashing index, minimizing both CPU load at storage (memory-side) and communication overhead. Compute nodes maintain subtree-locators and seeds for bucket assignment, incurring ~5 bits per key, while memory nodes keep dense, collision-free bucket maps.
  • All-Encoded Chunks: MemEC (Yiu et al., 2017) packs key, value, and metadata into 4 KB chunks, seals chunks when full, and assigns each to a fixed (n, k) MDS erasure code stripe, optimizing for small-object packing density.
  • Persistent and Elastic Layering: InfiniStore (Zhang et al., 2022) splits object store duties—short-lived in-memory pools absorb hot objects and provide fast path read/write, while background processes asynchronously flush and reconstruct data from persistent block/object stores to provide durability and recovery.
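The all-encoded chunk idea can be illustrated with a small packing sketch. This is not MemEC's implementation: the 4-byte header layout, the `ChunkPacker` and `Chunk` names, and the rule of sealing a chunk when the next object does not fit are assumptions made for illustration, and the erasure-coding step applied to sealed chunks is omitted.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 4096  # bytes; the 4 KB chunk size quoted above
HEADER = 4         # hypothetical header: 1-byte key length + 3-byte value length


@dataclass
class Chunk:
    buf: bytearray = field(default_factory=bytearray)
    sealed: bool = False

    def fits(self, key: bytes, value: bytes) -> bool:
        return len(self.buf) + HEADER + len(key) + len(value) <= CHUNK_SIZE

    def append(self, key: bytes, value: bytes) -> int:
        """Append header, key, and value; return the object's offset."""
        offset = len(self.buf)
        self.buf += len(key).to_bytes(1, "big") + len(value).to_bytes(3, "big")
        self.buf += key + value
        return offset


class ChunkPacker:
    """Packs small objects back to back into fixed-size chunks."""

    def __init__(self) -> None:
        self.chunks = [Chunk()]
        self.index = {}  # key -> (chunk id, offset inside the chunk)

    def put(self, key: bytes, value: bytes) -> None:
        if len(key) > 255 or HEADER + len(key) + len(value) > CHUNK_SIZE:
            raise ValueError("object too large for a single chunk in this sketch")
        current = self.chunks[-1]
        if not current.fits(key, value):
            # Seal the open chunk; a real store would now hand it to the
            # erasure coder. Here it simply becomes read-only by convention.
            current.sealed = True
            current = Chunk()
            self.chunks.append(current)
        offset = current.append(key, value)
        # Overwriting a key only repoints the index; old bytes are not reclaimed.
        self.index[key] = (len(self.chunks) - 1, offset)

    def get(self, key: bytes) -> bytes:
        chunk_id, offset = self.index[key]
        buf = self.chunks[chunk_id].buf
        key_len = buf[offset]
        value_len = int.from_bytes(buf[offset + 1:offset + 4], "big")
        start = offset + HEADER + key_len
        return bytes(buf[start:start + value_len])
```

With 8-byte keys and 100-byte values, each object occupies 112 bytes, so 36 objects fit in one chunk before it is sealed. The per-object header is the only packing overhead in this sketch.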

Metadata Management

  • In-Memory Manifests and Hash Maps: DAOS (Manubens et al., 2024) maintains all object metadata—including manifests, attribute trees, and lookup indices—in DRAM or NVM, with per-container structures persisted atomically at transaction boundaries.
  • gRPC/RPC-Based Directory Services: Metadata location in systems such as the ThymesisFlow-enhanced Plasma (Abrahamse et al., 2022) is tracked via gRPC lookups, with distributed hash maps coupling object IDs to buffer addresses and lengths.
  • Client-Side Caching and Version Tables: InfiniStore (Zhang et al., 2022) and Outback (Liu et al., 13 Feb 2025) employ lightweight, client-library resident metadata caches to minimize repeated lookups and maintain index consistency under workload migration or elastic scaling.
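A client-resident metadata cache with version checking might look like the following sketch. It is a generic illustration, not the InfiniStore or Outback protocol: the `Cluster` and `Client` classes, the per-object version counter, and the retry-once policy are all hypothetical.

```python
class Cluster:
    """Hypothetical stand-in for a directory service plus storage nodes."""

    def __init__(self) -> None:
        self.directory = {}  # object id -> (node, version)
        self.nodes = {}      # node -> {object id: (version, payload)}
        self.directory_lookups = 0

    def put(self, oid: str, node: str, payload: bytes) -> None:
        old = self.directory.get(oid)
        version = old[1] + 1 if old else 1
        if old:
            # Simulate migration: the object leaves its previous node.
            self.nodes[old[0]].pop(oid, None)
        self.nodes.setdefault(node, {})[oid] = (version, payload)
        self.directory[oid] = (node, version)

    def lookup(self, oid: str):
        self.directory_lookups += 1  # the cost the cache tries to avoid
        return self.directory[oid]

    def read_at(self, node: str, oid: str, version: int):
        entry = self.nodes.get(node, {}).get(oid)
        if entry is None or entry[0] != version:
            return None  # cached location or version is stale
        return entry[1]


class Client:
    """Caches (node, version) per object and revalidates on a miss."""

    def __init__(self, cluster: Cluster) -> None:
        self.cluster = cluster
        self.cache = {}

    def get(self, oid: str) -> bytes:
        for _ in range(2):  # at most one refresh after a stale hit
            if oid not in self.cache:
                self.cache[oid] = self.cluster.lookup(oid)
            node, version = self.cache[oid]
            payload = self.cluster.read_at(node, oid, version)
            if payload is not None:
                return payload
            del self.cache[oid]  # stale entry: drop it and look up again
        raise KeyError(oid)
```

Repeated reads of an unchanged object cost one directory lookup in total. After a migration, the stale entry is detected on the data path and refreshed with a single additional lookup.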

Object Access Protocols

  • RDMA Primitives: One-sided RDMA (read/write/atomics) enables compute nodes to fetch object data or update state without remote CPU involvement (Liu et al., 13 Feb 2025, Hodgkins et al., 25 Mar 2025). Two-sided RDMA (SEND/RECV) is leveraged for single-round-trip object RPCs.
  • Consistent Hashing and Placement Rules: In DAOS, a consistent hash on the object ID distributes object shards or replicas across storage targets, while UStore partitions object version DAGs using a hash of key and version-specific nonce (Manubens et al., 2024, Dinh et al., 2017).
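Consistent hashing on the object ID can be sketched with a generic hash ring. The source does not describe DAOS's actual placement algorithm, so the `HashRing` class, the use of SHA-256, and the virtual-node count below are assumptions chosen only to show the technique.

```python
import bisect
import hashlib


def _hash(text: str) -> int:
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, targets, vnodes: int = 64) -> None:
        # Each target owns `vnodes` points so load spreads more evenly.
        self._ring = sorted(
            (_hash(f"{target}#{i}"), target)
            for target in targets
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def place(self, oid: str, replicas: int = 1):
        """Return up to `replicas` distinct targets, walking clockwise."""
        start = bisect.bisect_right(self._points, _hash(oid))
        chosen = []
        for step in range(len(self._ring)):
            target = self._ring[(start + step) % len(self._ring)][1]
            if target not in chosen:
                chosen.append(target)
                if len(chosen) == replicas:
                    break
        return chosen
```

The useful property is stability: when a target is removed, only the objects whose primary was that target move, while all other placements stay where they were.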

3. Consistency, Concurrency, and Fault Tolerance

  • Lock-Free, Epoch-Based Transactions: DAOS (Manubens et al., 2024) uses epoch numbers to group I/O calls and commits changes across multiple targets atomically at epoch close. This provides transactional crash-consistency and eliminates the need for central locking or consensus for normal operations.
  • Linearizability and Explicit Fencing: LOCO (Hodgkins et al., 25 Mar 2025) exposes a fence-based programming model, where global, per-thread, or per-pair fences enforce ordering among RDMA verbs, allowing linearizable execution of distributed object methods. All state modifications can be forced into global visibility via explicit fences and per-object locking protocols.
  • Decentralized vs Coordinated I/O Modes: MemEC (Yiu et al., 2017) supports decentralized, low-latency I/O when all servers are healthy, and transitions to a coordinated, redirect-based mode upon failures. State transitions (normal → intermediate → degraded) are orchestrated by a coordinator using atomic broadcasts.
  • Replication, Erasure Coding, and Recovery: InfiniStore and MemEC both leverage erasure codes to provide fault tolerance with lower storage overhead than replication (Zhang et al., 2022, Yiu et al., 2017). Upon loss of memory capacity (e.g., ephemeral FaaS reclaim or node failure), objects are restored from persistent backups or reconstructed in parallel using coding redundancies.
  • Immutability and Versioned Authenticated Storage: UStore (Dinh et al., 2017) ensures all object versions are immutable, and cryptographically binds each version's payload, parent links, and metadata via content-addressed hashes and signatures, providing tamper-evident state and enabling efficient, lock-free multi-version concurrency.
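The tamper-evidence property of content-addressed versions can be shown with a minimal sketch. It covers hashing only; signatures, access control, and distribution are left out, and the `VersionStore` class and its JSON record layout are hypothetical rather than UStore's format.

```python
import hashlib
import json


class VersionStore:
    """Immutable versions identified by the hash of their content and parents."""

    def __init__(self) -> None:
        self._nodes = {}  # version id -> serialized record

    def put(self, key: str, payload: bytes, parents=()) -> str:
        record = {"key": key, "payload": payload.hex(), "parents": sorted(parents)}
        blob = json.dumps(record, sort_keys=True).encode()
        # The id binds payload and parent links, so history cannot be
        # rewritten without changing every descendant id.
        version_id = hashlib.sha256(blob).hexdigest()
        self._nodes[version_id] = blob
        return version_id

    def get(self, version_id: str):
        blob = self._nodes[version_id]
        if hashlib.sha256(blob).hexdigest() != version_id:
            raise ValueError(f"version {version_id[:8]} fails its hash check")
        record = json.loads(blob)
        return bytes.fromhex(record["payload"]), record["parents"]

    def verify_history(self, version_id: str) -> int:
        """Check every ancestor; return the number of versions verified."""
        seen, stack = set(), [version_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            _, parents = self.get(current)
            stack.extend(parents)
        return len(seen)
```

Because versions are never modified in place, writers on different branches do not contend for locks; a merge is simply a new version with two parents.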

4. Performance and Scalability

Throughput and Latency

  • RDMA-Accelerated Designs: Outback (Liu et al., 13 Feb 2025) achieves 6.0 Mops for get-only YCSB-C workloads, superior to RACE, MICA, and Cluster, with peak latencies ~10 μs at scale. LOCO’s channel-based key-value store achieves 1.5 Mops (uniform) and 1.2 Mops (Zipfian) in 8-node clusters, outstripping hand-tuned custom RDMA systems by 10–30% (Hodgkins et al., 25 Mar 2025).
  • Large-Scale Object Access: DAOS, on 16-node, 0.24 PiB clusters, sustained 60 GiB/s writes and 90 GiB/s reads under 1 MiB I/O (Manubens et al., 2024). Plasma over ThymesisFlow (Abrahamse et al., 2022) observed 5.75 GiB/s remote (6.5 GiB/s local) bandwidth, with per-object latencies 1.9–5 ms for small objects and ~0.075–2.6 ms for larger objects.
  • Serverless Elasticity: InfiniStore maintained linear throughput and sub-150 ms p90 latencies for 10–100 MB objects during workload scaling, and recovered 3 GB lost memory in ~1.2 s with 20 parallel recovery operations (Zhang et al., 2022).

Overhead, Redundancy, and Transition Costs

  • Erasure Coding Redundancy: MemEC cuts redundancy by up to 58% versus triple replication, with overhead $r = n/k$; e.g., a (10,8) MDS code yields $r = 1.25$ (Yiu et al., 2017).
  • Transition Overhead: MemEC’s state transitions from normal to degraded and back occur in ≤5 ms and ~600 ms, respectively, under load (Yiu et al., 2017). In degraded mode, set/update/get latencies increase by 11.6%, 50.9%, and 36.9% respectively.
  • Elasticity Cost Model: InfiniStore’s cost is $C_{\rm total} = C_{\rm mem} + C_{\rm req}$, where $C_{\rm mem}$ accounts for the memory in use and $C_{\rm req}$ scales with the number of FaaS invocations (Zhang et al., 2022).
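The redundancy figures quoted for erasure coding follow directly from the ratio of stored to original data. The short calculation below reproduces them; the function names are illustrative, and the comparison assumes plain triple replication stores three full copies.

```python
from fractions import Fraction


def ec_redundancy(n: int, k: int) -> Fraction:
    """Storage overhead of an (n, k) MDS code: n chunks stored per k data chunks."""
    return Fraction(n, k)


def saving_vs_replication(n: int, k: int, copies: int = 3) -> Fraction:
    """Fraction of storage saved relative to keeping `copies` full replicas."""
    return 1 - ec_redundancy(n, k) / copies


overhead = ec_redundancy(10, 8)         # 5/4, i.e. 1.25x
saving = saving_vs_replication(10, 8)   # 7/12, about 58%
print(float(overhead), round(float(saving) * 100, 1))
```

A (10,8) code stores 1.25 bytes per byte of data, compared with 3 bytes under triple replication, which gives the reduction of roughly 58%.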

Comparative Analysis

Store/Framework | Peak Throughput | Storage Overhead | Fault Tolerance Mechanism | Reference
--- | --- | --- | --- | ---
Outback | 6.0 Mops (get) | ~7 bits/key | compute-side index + end-to-end hashing | (Liu et al., 13 Feb 2025)
DAOS | 90 GiB/s (read) | k+m, tunable | epoch-based txns, EC/replication | (Manubens et al., 2024)
MemEC | 800 Kops/s (get) | 1.25× (10,8 EC) | all-encoding, chunked EC, coordinator | (Yiu et al., 2017)
InfiniStore | 95% SMS hit ratio | EC, minimal | erasure codes, snapshotted log | (Zhang et al., 2022)
Plasma/Thymesis | 5.75 GiB/s remote | N/A | currently none, possible replication EC | (Abrahamse et al., 2022)
LOCO | 1.5 Mops (kv) | explicit mgmt | channel fencing, programmable semantics | (Hodgkins et al., 25 Mar 2025)
UStore | 430 Kops/s | versioned | immutable DAG, R+W > N consensus | (Dinh et al., 2017)

5. Interface Expressiveness, Applications, and Semantics

  • Rich Object Abstractions: UStore (Dinh et al., 2017) exposes a full version DAG per object with per-version access control, authenticated history, and efficient version scan APIs, enabling direct support for Git-like systems, relational collaboration, weakly-consistent transactions, and blockchain ledgers. LOCO (Hodgkins et al., 25 Mar 2025) provides composable, network-resident channel objects (registers, queues, key-value maps).
  • API Multiplicity and Compatibility: DAOS (Manubens et al., 2024) offers native object, POSIX-lite, and FUSE-based interfaces, supporting both high-performance custom applications and legacy POSIX software with varying degrees of fidelity and performance.
  • Elasticity for Cloud and Serverless: InfiniStore (Zhang et al., 2022) and similar architectures natively absorb variable working sets and provide fine-grained scaling and pay-per-access models, suitable for cloud registries, caching, and as a bridge between fast memory and durable, slow storage layers.
  • Small vs. Large Object Optimization: MemEC (Yiu et al., 2017) is designed for workloads with small objects, optimizing packing density, redundancy, and recovery overhead. Outback (Liu et al., 13 Feb 2025) and Plasma/ThymesisFlow (Abrahamse et al., 2022) are object-size agnostic, but throughput of disaggregated memory rises for ≥1 MB objects.

6. Limitations, Trade-offs, and Open Challenges

  • Consistency and Incoherence: Systems leveraging low-level RDMA or uncacheable memory (LOCO (Hodgkins et al., 25 Mar 2025), Outback (Liu et al., 13 Feb 2025)) avoid hardware coherence overhead by exposing explicit fencing and placement to the application, trading ease of use for sharper performance/correctness control.
  • Metadata Overhead for Small Objects: For high object counts with small object sizes, gRPC lookup latency and metadata mapping overhead become significant relative to data transfer (Abrahamse et al., 2022). Caching and batching reduce, but do not eliminate, this penalty.
  • Fault Tolerance and Recovery: Serverless stores depend on fast, parallel rehydration from persistent layers (InfiniStore (Zhang et al., 2022)), but as lost working set size or degree of FaaS churn increases, background traffic and cold start overhead become dominant.
  • Mutable Object Support and Write Consistency: Most systems, e.g. Plasma/Thymesis, expose only immutable sealed objects to avoid write races, complicating certain update-heavy applications (Abrahamse et al., 2022).
  • Scaling Coordination and State Transition: MemEC’s reliance on a single coordinator (potential single point of failure) and the need for atomic broadcast during state transitions may impose overhead at scale (Yiu et al., 2017). Extensions to replicated coordinators are suggested.
  • End-to-End Security: While UStore (Dinh et al., 2017) deeply integrates version hash authentication and per-version signatures, most operator- or network-facing object store APIs do not provide object-level encryption or MACs by default.

7. Future Directions

  • Programmable SmartNIC and In-Network Offload: Outback proposes leveraging SmartNICs (e.g., PRISM-style extended RDMA operations) to offload indexing and data motion, achieving storage-node CPU elimination and potentially improved latency (Liu et al., 13 Feb 2025).
  • Memory Disaggregation Hardware Evolution: Migration to new architectures such as IBM POWER10 Memory Inception and global memory fabrics with improved cache coherence or persistent memory tiers is an ongoing area (Abrahamse et al., 2022).
  • Weak and Strong Consistency Models: Further investigation is needed into combining explicit fence-based (LOCO (Hodgkins et al., 25 Mar 2025)) and optimistic transaction (DAOS (Manubens et al., 2024)) programming models with application-specific consistency requirements.
  • Multi-Object Transactions and Advanced Indexing: Ongoing work in embedding transactional protocols and secondary indices, including learned indexes and hybrid B+-tree/perfect hash data planes, will expand the applicability of RDMA and memory-disaggregated stores to broader classes of workloads (Liu et al., 13 Feb 2025).
  • Cross-Store Metadata Management: Enhanced caching, publish/subscribe, and lightweight distributed consensus for in-use tracking and eviction across large object stores are critical for large-scale, multi-tenant deployments (Abrahamse et al., 2022).
  • Application Integration: End-to-end benchmarking on production analytics frameworks and AI/ML platforms (e.g., Spark, Dask) remains a central need to measure the real-world impact and trade-offs of emerging in-memory distributed object store designs.

For further empirical details, design rationale, and code references, see the cited works: Outback (Liu et al., 13 Feb 2025), DAOS (Manubens et al., 2024), InfiniStore (Zhang et al., 2022), MemEC (Yiu et al., 2017), ThymesisFlow/Plasma (Abrahamse et al., 2022), LOCO (Hodgkins et al., 25 Mar 2025), and UStore (Dinh et al., 2017).
