
Multi-User Shared Memory Systems

Updated 14 November 2025
  • Multi-user memory sharing encompasses the protocols and mechanisms that let multiple agents, processes, or users efficiently access and coordinate shared memory resources.
  • It employs techniques such as retrieval, deduplication, and dynamic allocation to optimize performance, reducing latency and improving accuracy.
  • The approach underpins modern cloud, container, and hypervisor systems, balancing high utilization with robust security, isolation, and policy-driven controls.

Multi-user memory sharing refers to mechanisms, protocols, and software frameworks that enable multiple independent agents, processes, or users to efficiently access, coordinate, and utilize shared memory resources. This includes shared knowledge pools for LLM agents, virtual memory slices in hypervisors, deduplicated page sharing for cloud/serverless environments, capability-based address space management, and hardware-supported coherence in rack-scale or multi-host systems. The technical goals are to maximize memory utilization, enhance inference or query accuracy via shared context, reduce inter-process communication latency, provide strong security isolation, and enable dynamic, policy-driven access controls.

1. Core Architectures for Multi-User Memory Sharing

Multi-user memory sharing architectures span four principal domains:

  • LLM-based agent memory pools (Gao et al., 15 Apr 2024): Multiple LLM agents share prompt–answer memories in a dynamically growing pool, amplifying in-context learning by pooling examples and enabling retrieval across agents. Storage, filtering, and retrieval modules are tightly integrated, with real-time filtering guaranteeing high-quality, relevant memory objects.
  • Dynamic multi-tenant DRAM caches (Cidon et al., 2016): Systems like Memshare dynamically manage DRAM among applications (tenants), maintaining private per-tenant guarantees and allocating the shared remainder to the tenants with the steepest local cache hit-rate gradient. Log-structured storage and pluggable eviction policies ensure isolation and optimal sharing.
  • Inter-VM shared memory slices and notification protocols (Sreenivasamurthy et al., 2019): SIVSHM divides a physical shared-memory object into non-overlapping, statically mapped slices, each assigned to a virtual machine (VM). Event notification uses direct peer-to-peer eventfd exchange, avoiding OS bottlenecks and boot-time delays while guaranteeing slice isolation (a minimal notification sketch follows this list).
  • Hardware-supported cache-coherent memory pools (Wang et al., 16 Feb 2025, Jain et al., 4 Apr 2024): Compute Express Link (CXL) introduces a global fabric-attached memory pool across hosts, delivering fine-grained load/store access, directory-based coherence, and hardware invalidation. Advanced protocols tune coherence granularity and offer hybrid software-hardware solutions for large-scale sharing.
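SIVSHM's notification path can be illustrated with Linux eventfds. The following is a minimal single-process sketch under stated assumptions (Linux host, Python 3.10+ for os.eventfd; the helper names notify and wait_for_notification are hypothetical), not SIVSHM's implementation:

```python
import os

# Sketch of SIVSHM-style doorbell signaling. In SIVSHM the eventfd
# descriptors are exchanged peer-to-peer between VMs at setup time and a
# write raises an interrupt-like notification; here everything lives in
# one process for brevity.
peer_doorbells = {vm_id: os.eventfd(0) for vm_id in range(4)}

def notify(vm_id: int) -> None:
    """Ring peer vm_id's doorbell by incrementing its eventfd counter."""
    os.eventfd_write(peer_doorbells[vm_id], 1)

def wait_for_notification(vm_id: int) -> int:
    """Block until vm_id's doorbell rings; returns and resets the counter."""
    return os.eventfd_read(peer_doorbells[vm_id])

notify(2)
print(wait_for_notification(2))  # -> 1
```

Because each peer writes directly to the other's descriptor, no central daemon sits on the notification path, which is the property SIVSHM exploits to avoid OS bottlenecks.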

2. Formal Models and Mathematical Formulation

Distinct frameworks formalize sharing as follows:

  • LLM Memory Sharing (Gao et al., 15 Apr 2024): The pool $M = \{(x_i, y_i)\}_{i=1}^N$ is continually updated as agents submit new (prompt, answer) pairs. The memory filter $F((x^*, y^*); M)$ admits objects only if their scorer-assigned quality $s(x^*, y^*)$ exceeds threshold $\tau$:

F((x^*, y^*); M) = \begin{cases} (x^*, y^*) & \text{if } s(x^*, y^*) \ge \tau \\ \emptyset & \text{otherwise} \end{cases}

Retrieval for a new query $q$ selects the top-$k$ entries from $M$ by cosine similarity in a bi-encoder embedding space.
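A minimal sketch of the filter and retrieval steps, assuming numpy and placeholder components (score_fn standing in for the learned scorer $s$, and precomputed embedding vectors for the bi-encoder):

```python
import numpy as np

def memory_filter(pair, pool, score_fn, tau):
    """Admit a (prompt, answer) pair into the shared pool only if its
    scorer-assigned quality meets the threshold tau; otherwise drop it."""
    if score_fn(*pair) >= tau:
        pool.append(pair)

def retrieve_top_k(query_vec, pool_vecs, k):
    """Return indices of the k pool entries most similar to the query
    under cosine similarity in the shared embedding space."""
    q = query_vec / np.linalg.norm(query_vec)
    p = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    return np.argsort(-(p @ q))[:k]
```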

  • Dynamic Cache Allocation (Cidon et al., 2016): Each tenant $i$ is allocated memory $m_i$ consisting of a guaranteed private reserve $p_i$ plus a share $s_i$ of the remaining pool, subject to the total capacity $M$:

m_i = p_i + s_i, \quad \sum_i m_i = M

The optimal allocation maximizes aggregate hit-rate:

\max_{m_i \geq p_i,\ \sum_i m_i = M} \sum_{i=1}^N h_i(m_i)
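Memshare's allocation policy can be approximated by a greedy sketch that repeatedly grants a fixed memory quantum to the tenant with the steepest marginal hit-rate gain; the hit-rate curves h[i] are assumed to come from per-tenant shadow LRUs (function and variable names are illustrative):

```python
def greedy_allocate(h, p, M, quantum=1):
    """Greedily assign the shared remainder of an M-unit cache.

    h: list of callables, h[i](m) -> estimated hit rate of tenant i at
       m units of memory (e.g., read off a shadow LRU curve).
    p: per-tenant private reserves (hard guarantees).
    Returns allocations m with sum(m) == M.
    """
    m = list(p)                      # start from the private guarantees
    remaining = M - sum(p)
    while remaining > 0:
        # Marginal hit-rate gain of one more quantum for each tenant.
        gains = [h[i](m[i] + quantum) - h[i](m[i]) for i in range(len(m))]
        best = max(range(len(m)), key=lambda i: gains[i])
        m[best] += quantum
        remaining -= quantum
    return m
```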

  • Inter-VM Sharing (Sreenivasamurthy et al., 2019): A memory region of size $M$ is divided into $N$ per-VM slices $S_i$, each of size $M/N$. Each VM can access only its statically mapped slice, enforcing

\forall i \neq j:\ S_i \cap S_j = \emptyset
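The non-overlap invariant falls out of the slice arithmetic; a minimal sketch, assuming a POSIX-style shared-memory object and illustrative names (map_slice, the memfd stand-in), rather than SIVSHM's hypervisor code:

```python
import mmap
import os

def map_slice(shm_fd: int, total_size: int, n_vms: int, vm_index: int) -> mmap.mmap:
    """Map only slice vm_index of the shared object. Slice i covers bytes
    [i * M/N, (i + 1) * M/N), so S_i and S_j are disjoint for i != j."""
    slice_size = total_size // n_vms
    offset = vm_index * slice_size   # must be page-aligned in practice
    return mmap.mmap(shm_fd, slice_size, offset=offset)

# Illustrative usage: a 4 MiB object split among 4 VMs (Linux-only memfd).
fd = os.memfd_create("ivshm-demo")
os.ftruncate(fd, 4 * 1024 * 1024)
view = map_slice(fd, 4 * 1024 * 1024, n_vms=4, vm_index=2)
view[:5] = b"hello"                  # writes land only inside slice 2
```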

  • CXL Memory Sharing (Wang et al., 16 Feb 2025, Jain et al., 4 Apr 2024): Shared memory is global, with access mediated by hardware snoop filters (tracking MESI states per cacheline) and supported by software mapping of HDM regions. Latency models, e.g.

L_{\text{CXL}} = L_{\text{C2M}} + N_{\text{BI}} \times L_{\text{BI\_RTT}}

detail access path costs under different coherence mechanisms.
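To make the model concrete, the sketch below evaluates it with purely illustrative numbers (not measurements from the cited papers): a base CPU-to-memory latency plus one back-invalidation round trip per sharer that must be invalidated.

```python
def cxl_load_latency(l_c2m_ns: float, n_bi: int, l_bi_rtt_ns: float) -> float:
    """L_CXL = L_C2M + N_BI * L_BI_RTT: base access cost plus the cost of
    each back-invalidation round trip issued by the snoop filter."""
    return l_c2m_ns + n_bi * l_bi_rtt_ns

# Illustrative values only: 300 ns base access, 2 sharers to invalidate,
# 150 ns per back-invalidation round trip -> 600 ns total.
print(cxl_load_latency(l_c2m_ns=300.0, n_bi=2, l_bi_rtt_ns=150.0))
```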

3. Security, Isolation, and Policy Enforcement

Security and isolation mechanisms are diverse:

  • LLM Agent Pool Isolation (Rezazadeh et al., 23 May 2025, Gao et al., 15 Apr 2024): Collaborative memory employs tiered access control (private/shared) with dynamic bipartite graphs linking users, agents, and resources. Read/write policies are Boolean filters over provenance metadata, with auditability guaranteed by immutable attributes (timestamps, contributing agents, accessed resources) and formal adherence proofs (a minimal policy-filter sketch follows this list).
  • Inter-VM Segmentation (Sreenivasamurthy et al., 2019): Slice-based mapping and capability enforcement prevent untrusted VM code from escaping its segment. Eventfd notification is confined and the hypervisor mediates all mappings.
  • Capability-based Sharing (Sartakov et al., 2022): cVMs confine agents to code–data regions by hardware-enforced capabilities. Fine-granularity zero-copy sharing is realized via CAP_File and CAP_Call APIs, with trusted monitors controlling capability distribution and revocation.
  • CXL Context Partitioning (Jain et al., 4 Apr 2024, Wang et al., 16 Feb 2025): Fabric managers, IOMMUs, and context-specific address maps enforce per-host or per-VM isolation. Hardware/software protocols zeroize freed memory regions and filter unauthorized requests. Optional link encryption mitigates snooping risk.
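The Boolean policy filters mentioned above can be pictured as predicates over immutable provenance metadata; the following minimal sketch uses hypothetical field names (owner, agent, visibility), not the cited framework's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)          # frozen: provenance is immutable once written
class MemoryRecord:
    content: str
    owner: str                   # contributing user
    agent: str                   # contributing agent
    timestamp: float
    visibility: str              # "private" or "shared"

def can_read(record: MemoryRecord, user: str, shared_with: set[str]) -> bool:
    """Boolean read policy over provenance metadata: owners always read
    their own records; shared records are readable by permitted users."""
    if record.owner == user:
        return True
    return record.visibility == "shared" and user in shared_with
```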

4. Retrieval, Deduplication, and Dynamic Allocation

Mechanisms for efficient sharing and utilization include:

  • Retrieval in LLM Agent Pools (Gao et al., 15 Apr 2024): Bi-encoder embedding and top-$k$ similarity search enrich agent prompts with in-context examples pooled from multiple agents. Retriever fine-tuning employs contradiction scoring and cross-entropy mini-batch updates for continual adaptation.
  • Page Deduplication in Serverless Systems (Qiu et al., 2023): UPM (User-Guided Page Merging) uses madvise hints to the kernel to merge identical pages in the address space on demand (see the sketch after this list). A hash table maps page hashes to candidates; identical pages are merged and write-protected, reducing memory footprint by up to 55% and doubling container density. CPU overhead is front-loaded and vanishes for warm invocations.
  • Dynamic Allocation in Web Caches (Cidon et al., 2016): Per-tenant shadow LRUs guide allocation by observing cache miss/hit behavior; greedy reassignment of memory quanta maximizes overall hit-rate. Log-structured compaction and flexible eviction policies exploit fine-grained relocation to eliminate fragmentation.
  • Adaptive Reservation for SLO-critical Services (Pi et al., 2021): Hermes system adaptively reserves extra heap and mmap pool for latency-critical processes, invoking proactive reclamation of batch-job pages via posix_fadvise. Allocation latency and tail SLO violation are reduced by up to 54% and 84% respectively.
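UPM's hint path can be illustrated with the kernel's same-page-merging interface. The sketch below marks an anonymous region as mergeable so ksmd may deduplicate identical pages; it mirrors the mechanism, not UPM's actual code, and assumes a Linux kernel with KSM enabled and Python 3.8+ for mmap.madvise:

```python
import mmap

PAGE = mmap.PAGESIZE
region = mmap.mmap(-1, 64 * PAGE)        # anonymous 64-page mapping

# Fill with repeated page-sized content so there is something to merge.
region.write(bytes(PAGE) * 64)

# MADV_MERGEABLE asks the kernel's KSM daemon to scan this range and
# merge byte-identical pages into one write-protected copy; a later
# write triggers copy-on-write un-merging, matching UPM's behavior of
# front-loaded CPU cost and reduced resident memory.
region.madvise(mmap.MADV_MERGEABLE)
```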

5. Performance, Empirical Evaluation, and Trade-offs

Reported empirical results demonstrate:

System             Metric                             Gain/Reduction
MS (LLM Pool)      BERTScore (3-shot, domain pool)    +0.60 (Limerick)
Memshare           Cache hit rate                     +6.1% overall, –39.7% misses
SIVSHM             Boot time                          –30% vs IVSHMEM
UPM (Serverless)   Container density                  2.2× (AlexNet); –55% memory
Hermes             SLO violations                     –84%
CtXnL (CXL, txn)   Throughput (FPGA vs CXL)           2.08×

Performance trade-offs are domain-dependent. Excessively large memory pools can introduce recall noise and degrade accuracy (Gao et al., 15 Apr 2024); deduplication incurs a one-time latency cost on cold start (Qiu et al., 2023). CXL-based systems balance strong coherence against hardware complexity; selective coherence protocols (CtXnL) achieve high transaction throughput by delaying MESI invalidations until commit time.

Pooling knowledge or memory across agents or tenants boosts utilization, accuracy, and resource efficiency but mandates strict security, auditability, and dynamically adjustable policy controls.

6. Limitations and Future Research Directions

Limitations and directions identified in the literature include:

  • Optimal Pool Management: Determining a pool size that balances diversity against recall accuracy remains unresolved (Gao et al., 15 Apr 2024).
  • Rubric and Policy Automation: Automatic generation of quality rubrics and sharing policies without domain bias remains challenging (Gao et al., 15 Apr 2024, Rezazadeh et al., 23 May 2025).
  • Cross-Model Heterogeneity: Sharing across heterogeneous LLM architectures may further enrich collaborative intelligence (Gao et al., 15 Apr 2024).
  • Bloom-filter Snoop Filters: Hardware complexity constraints favor research into approximate state tracking and scalable directory management in CXL fabrics (Jain et al., 4 Apr 2024).
  • Side Channels in Capability Systems: Partitioning of microarchitectural channels (cache, TLB) in capability-based environments is not fully addressed (Sartakov et al., 2022).
  • Transparent Remote Paging and Placement: Techniques for page-level migration and adaptive allocation over RDMA/CXL are active areas (Jain et al., 4 Apr 2024).
  • Auditability and Retrospective Policy Reevaluation: Memory frameworks supporting immutable provenance and log-based auditing lay the groundwork for systematic, explainable access control (Rezazadeh et al., 23 May 2025).

Multi-user memory sharing is a cornerstone for scalable, secure, and efficient collective computation and collaborative intelligence, particularly as datacenter, cloud, AI, and containerized workflows converge on hybrid architectures. The state of practice spans application-level pooling, kernel and hypervisor memory management, and hardware/fabric protocols, each pursuing optimal trade-offs among utilization, latency, isolation, and auditability.
