Cache Sharing Protocols in Modern Systems
- Cache sharing protocols are mechanisms that enable multiple systems to efficiently share cached data using metadata, replacement policies, and consistency controls.
- Representative models include way-partitioned LLCs, layered distributed caches, and shared KV caches for multi-tenant ML inference, all aimed at minimizing redundancy and enhancing data reuse.
- Empirical evaluations indicate significant performance gains, improved cache hit rates, and scalable throughput in multi-core and distributed environments.
A cache sharing protocol comprises the mechanisms, metadata, and policies by which multiple clients, agents, or systems share access to a cache, typically in multi-core computers, distributed storage, multi-tenant machine learning serving, or content delivery networks. By coherently managing shared or partitioned cache capacity, controlling replacement, enforcing consistency and/or security policies, and exposing appropriate abstractions to clients, these protocols aim to maximize performance, fairness, compatibility, or security subject to system-specific constraints.
1. Architectural Models for Cache Sharing
Cache sharing protocols vary fundamentally according to architecture and target workload. Representative models include:
- Way-Partitioned and Reuse-Aware Shared LLCs: In multicore systems, shared last-level caches (LLCs) may be split into statically assigned “ways” or dynamically partitioned among cores or threads. Protocols such as SRCP overlay partitioning with per-line sharing- and reuse-metadata, distinguishing local from remote usage to avoid duplication and premature eviction of shared data (Ghosh et al., 2022).
- Layered Caching Topologies in Distributed Systems: Systems like DistCache comprise multi-layer cache networks where objects are mapped via independent hashes to distinct nodes in each layer, supporting provable load balancing, minimal coherence overhead, and scalable throughput (Liu et al., 2019).
- Multi-Agent and Multi-Tenant ML Inference: Shared key-value (KV) cache systems enable cross-user, cross-agent, or cross-request reuse in LLM inference workloads, sometimes with complex semantic alignment or quantifiable security guarantees (Yang et al., 17 Mar 2025, Yang et al., 2024, Pennas et al., 11 Mar 2026, Jeon et al., 1 Feb 2026).
- Shared Caching in Content Distribution: Schemes use logical or physical partitions per provider, workload, or subset, sometimes coordinated using utility maximization, information-theoretic coding, or rigorous admission-control policies (Dehghan et al., 2017, Peter et al., 2022, Kesidis et al., 2019).
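The way-partitioned model can be illustrated with a minimal sketch of a single cache set. It assumes one common convention: a lookup may hit in any way, while a fill may only evict from the ways assigned to the requesting core. The class `PartitionedSet`, its `way_masks` argument, and the LRU tie-breaking are hypothetical names and choices made for illustration; they are not taken from SRCP or any cited design.

```python
class PartitionedSet:
    """One set of a way-partitioned cache (illustrative only)."""

    def __init__(self, n_ways, way_masks):
        self.tags = [None] * n_ways    # tag stored in each way, None if empty
        self.stamp = [0] * n_ways      # logical time of last access per way
        self.way_masks = way_masks     # hypothetical: core id -> list of way indices
        self.clock = 0

    def access(self, core, tag):
        """Return True on a hit, False on a miss followed by a fill."""
        self.clock += 1
        if tag in self.tags:
            # Assumed convention: a hit is served from any way, so data
            # brought in by another core is reused rather than duplicated.
            way = self.tags.index(tag)
            self.stamp[way] = self.clock
            return True
        # Miss: the victim is chosen only among the requesting core's ways,
        # preferring an empty way and otherwise the least recently used one.
        ways = self.way_masks[core]
        victim = min(ways, key=lambda w: (self.tags[w] is not None, self.stamp[w]))
        self.tags[victim] = tag
        self.stamp[victim] = self.clock
        return False
```

In this sketch, a line filled by core 0 can be hit by core 1 without a second copy, but it is still evicted according to core 0's local recency alone. That is the kind of premature eviction of shared data that sharing- and reuse-aware metadata is meant to avoid.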
A table summarizing select protocol classes:
| Protocol/Domain | Key Architectural Feature | Representative Work |
|---|---|---|
| Way-Partitioned LLC | Per-core partitioning + metadata | SRCP (Ghosh et al., 2022) |
| Multilayer Distributed | Independent-hash bipartite graph | DistCache (Liu et al., 2019) |
| LLM Inference KV Reuse | Cross-request, semantic alignment | KVShare (Yang et al., 17 Mar 2025), KVSharer (Yang et al., 2024), CacheSolidarity (Pennas et al., 11 Mar 2026) |
| Multi-Provider CDN | LRU-partition, object sharing | Utility partitioning (Dehghan et al., 2017), Object sharing (Kesidis et al., 2019) |
| Coded Caching | Placement Delivery Arrays (PDA) | (Peter et al., 2022, Peter et al., 2021) |
2. Protocol Mechanisms: Metadata, Replacement, and Consistency
- Per-line Metadata: Advanced protocols encode sharing status, frequency, and recency. SRCP tracks AFC, Global Count, and local access bits to prioritize shared and frequently reused lines in replacement decisions (Ghosh et al., 2022). In distributed LRU sharing, effective “length” of an object is fractionally charged across owners (Kesidis et al., 2019). LLCs optimized for remote-sharing maintain “shared‐by‐remote” bits and per-set counters to bias retention towards cache lines seen in remote cache-to-cache transfers (Durbhakula, 2019).
- Victim Selection and Replacement: Protocols replace cache entries using custom lexicographic orderings on sharing/usage counters, explicit utility functions, coded sub-packetization maps, or secret-sharing logic. In multi-tenant ML serving, replacement or recomputation may be driven by semantic edit distance or layerwise dissimilarity (Yang et al., 17 Mar 2025, Yang et al., 2024).
- Consistency and Atomicity: Shared caches in distributed or disaggregated memory contexts require rigorously enforced coherence semantics, often mapping lock or latch state to classic MOESI or MSI protocols but realized atop RDMA atomics with embedded ownership metadata, as in SELCC (Wang et al., 2024). In many self-invalidate designs (e.g., Neat), the system forgoes per-line/byte transient states in favor of phase-based flush and invalidate operations, sometimes with further optimization like write signatures or partial-invalidation states (Zhang et al., 2021).
- Security and Access Control: Coded caching schemes for shared environments apply Placement Delivery Arrays, secret-sharing, and/or one-time-pad keys to guarantee information-theoretic secrecy against unauthorized access (Peter et al., 2021). Time-dependent access control overlays hierarchical key distribution atop encrypted content caches, balancing key-management state and scalability (Emura et al., 2023). Defenses against timing side channels, as in CacheSolidarity, track cache entry ownership and isolation flags, enforcing prefix-granular isolation only upon suspicious cross-user reuse (Pennas et al., 11 Mar 2026).
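Victim selection by lexicographic ordering on sharing and usage metadata can be sketched as follows. The fields `shared`, `reuse_count`, and `last_access` are hypothetical stand-ins for per-line metadata, and the ordering shown (evict non-shared lines first, then rarely reused lines, then the oldest) is an assumption for illustration rather than the exact priority used by SRCP or any other cited protocol.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int
    shared: bool       # hypothetical: line has been used by more than one core
    reuse_count: int   # hypothetical: number of reuses since fill
    last_access: int   # logical timestamp of the most recent access


def pick_victim(lines):
    """Pick the line that is smallest under a lexicographic key.

    Tuple comparison gives the ordering: non-shared before shared
    (False < True), then fewer reuses, then the older timestamp.
    """
    return min(lines, key=lambda l: (l.shared, l.reuse_count, l.last_access))
```

For the set `[Line(1, True, 0, 0), Line(2, False, 3, 5), Line(3, False, 1, 9)]`, plain LRU would evict tag 1 because it is oldest, whereas this ordering retains the shared line and evicts tag 3, the non-shared line with the fewest reuses.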
3. Performance and Scalability Considerations
Quantitative benefits and costs are tightly linked to the selection of sharing protocol:
- Cache Hit and Throughput: Adding sharing/reuse-aware policies to partitioned hardware (e.g., SRCP) increases LLC hit-rate (13.3% over LRU) and IPC (10.4%), outperforming alternative replacement schemes (Ghosh et al., 2022). In multi-proxy object-sharing caches, real system prototypes show that the working-set approximation predicts hit rates within 5% of simulation, and ripple-eviction cost (from cascaded evictions across logical lists) can be largely amortized with batched/thresholded replacement (Kesidis et al., 2019).
- Distributed Scaling: In DistCache, partitioning objects across two cache layers with independent hash functions, combined with adaptive query routing (“power-of-two-choices”), achieves linear scaling of cache throughput across 32 racks and 1.5–2× performance improvements on write-intensive workloads vs. replication (Liu et al., 2019).
- Inference Efficiency in LLMs: Protocols such as KVShare leverage semantic alignment, DELTA-trees, and partial recomputation to preserve accuracy while reducing prefill compute cycles by up to 60%, yielding roughly 1.2× system throughput gains and dramatic reductions in time-to-first-token (TTFT) for matching requests (Yang et al., 17 Mar 2025). KVSharer demonstrates that selective, layerwise cache sharing delivers a 30% memory reduction and at least 1.3× generation acceleration at the cost of negligible task degradation (Yang et al., 2024). LRAgent exploits shared and low-rank decomposed caches to approach fully shared throughput and memory consumption in multi-LoRA agent systems, outperforming baseline and partial-share implementations (Jeon et al., 1 Feb 2026).
- Overhead and Complexity: Additional metadata (counters, flags, per-entry or per-line bits) incurs minimal storage and logic overhead in most designs, e.g., SRCP’s sharing/usage bits or CacheSolidarity’s 32-byte per-KV metadata (Ghosh et al., 2022, Pennas et al., 11 Mar 2026). Some schemes, such as coded caching with PDAs, are designed specifically to avoid exponential subpacketization costs (Peter et al., 2022).
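The routing idea behind layered caches with independent hashes can be sketched as below: each object has one candidate node per layer, and a query goes to whichever candidate currently carries less load. The class `TwoLayerRouter`, the salted SHA-256 hashing, and the in-memory load counters are assumptions made for illustration; they do not reproduce DistCache's actual hash functions, load signals, or coherence handling.

```python
import hashlib


def _bucket(key, salt, n):
    """Map a key to one of n nodes; a different salt gives an independent mapping."""
    digest = hashlib.sha256((salt + key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


class TwoLayerRouter:
    def __init__(self, n_upper, n_lower):
        self.n_upper = n_upper
        self.n_lower = n_lower
        # Hypothetical load signal: number of queries routed to each node so far.
        self.load = {("upper", i): 0 for i in range(n_upper)}
        self.load.update({("lower", i): 0 for i in range(n_lower)})

    def candidates(self, key):
        """Each object has exactly one candidate cache node in each layer."""
        return [
            ("upper", _bucket(key, "upper:", self.n_upper)),
            ("lower", _bucket(key, "lower:", self.n_lower)),
        ]

    def route(self, key):
        """Power-of-two-choices: send the query to the less loaded candidate."""
        node = min(self.candidates(key), key=lambda n: self.load[n])
        self.load[node] += 1
        return node
```

Repeated queries for a single hot key alternate between its two candidates, so neither node absorbs the full load. Because the two layers hash independently, a set of objects that collides on one upper-layer node is spread across many lower-layer nodes.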
4. Theoretical Foundations and Formal Analysis
- Analytic Models: Multi-proxy LRU sharing uses Poisson/IRM-based working-set approximations with independence assumptions and coupled nonlinear equations to accurately estimate per-proxy hit rates and size allocations (Kesidis et al., 2019). Utility-based partitioning frameworks are formulated as convex optimization problems—the unique maximizer can be found by online gradient ascent, guided only by empirical hit-rate measurements (Dehghan et al., 2017).
- Coded and Secretive Shared Caching: Formal properties of Placement Delivery Arrays (PDA) underlie the ability to support low-subpacketization, coded-multicast communication with precise storage and transmission guarantees (Peter et al., 2022). Information-theoretic secrecy is enforced using non-perfect secret-sharing, one-time pad keys, and precise rate/subpacketization bounds; the cut-set lower bounds and order-optimality proofs frame achievable performance (Peter et al., 2021).
- Security Guarantees: CacheSolidarity states, as an informal theorem, the guarantee that no non-owner can exploit timing side channels by observing cache hits on secret-dependent intervals beyond a flagged prefix, rendering incremental probing attacks ineffective on reused prefixes (Pennas et al., 11 Mar 2026).
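As background for the working-set models mentioned above, the following sketch computes the standard single-cache working-set (characteristic-time) approximation for LRU under the independent reference model. It assumes unit-size objects with Poisson request rates, and solves for the time T at which the expected number of cached objects equals the capacity. This is the textbook single-cache form, not the coupled multi-proxy equations of the cited work; the function names are hypothetical.

```python
import math


def characteristic_time(rates, capacity, tol=1e-9):
    """Solve sum_i (1 - exp(-rate_i * T)) = capacity for T by bisection."""
    assert all(r > 0 for r in rates)
    assert 0 < capacity < len(rates)

    def occupancy(t):
        # Expected number of distinct objects requested within a window of length t.
        return sum(1.0 - math.exp(-r * t) for r in rates)

    lo, hi = 0.0, 1.0
    while occupancy(hi) < capacity:   # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2.0
        if occupancy(mid) < capacity:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def hit_probabilities(rates, capacity):
    """Approximate per-object LRU hit probability: 1 - exp(-rate_i * T)."""
    t = characteristic_time(rates, capacity)
    return [1.0 - math.exp(-r * t) for r in rates]
```

By construction the per-object hit probabilities sum to the cache capacity, and more popular objects receive higher hit probabilities. Extending this to shared objects requires deciding how an object's size is charged across its owners, which is where the coupled equations of the multi-proxy analysis enter.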
5. Practical Algorithms and Implementation
- Adaptive Enablement: Certain replacement and retention policies are made dynamic, with hardware or software logic toggling behavior in response to observed workload conditions, such as the proportion of remote cache transfers (e.g., high-water mark gating in remote-sharing LLCs (Durbhakula, 2019)) or run-time measurement of the kernel-density overlap between TTFT distributions to control prefix isolation (Pennas et al., 11 Mar 2026).
- Online Control and Admission: Utility-driven LRU partitioning implements a discrete-time gradient algorithm responsive to instantaneous hit-count measurements and utility derivatives, adjusting allocations in real time to converge to optimal partitions (Dehghan et al., 2017). Admission-control in multi-proxy shared caches makes explicit use of the working-set equation root for new virtual proxies, guaranteeing system-wide feasibility and fairness (Kesidis et al., 2019).
- Cache Sharing in Modern ML Serving: Semantic editing via DELTA-Trees and partial recomputation, as in KVShare, is implemented using global embedding indices, cache alignment editors, and partial attention that flags exact recomputation points during model prefill (Yang et al., 17 Mar 2025). KVSharer and LRAgent protocols are compatible with intra-layer compression and readily combine with existing attention/reuse kernels (Yang et al., 2024, Jeon et al., 1 Feb 2026).
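The online control loop for utility-driven partitioning can be sketched as a discrete-time gradient step that shifts capacity toward partitions with above-average marginal utility while keeping the total fixed. Several elements here are assumptions for illustration: utility is taken to be the logarithm of the hit rate, the marginal utility is estimated by a finite difference on a supplied hit-rate curve rather than from live hit counts, and the names `marginal_utility`, `rebalance`, and `partition` are hypothetical.

```python
import math


def marginal_utility(hit_rate, c, delta=1e-3):
    """Finite-difference estimate of d/dc log(hit_rate(c))."""
    return (math.log(hit_rate(c + delta)) - math.log(hit_rate(c))) / delta


def rebalance(alloc, grads, step, floor=1.0):
    """One gradient step that preserves the total allocated capacity."""
    mean = sum(grads) / len(grads)
    # Move capacity toward partitions whose marginal utility exceeds the mean.
    moved = [max(floor, a + step * (g - mean)) for a, g in zip(alloc, grads)]
    scale = sum(alloc) / sum(moved)   # renormalize in case the floor was applied
    return [m * scale for m in moved]


def partition(hit_rates, total, iters=200, step=500.0):
    """Start from an equal split and iterate toward equal marginal utilities."""
    alloc = [total / len(hit_rates)] * len(hit_rates)
    for _ in range(iters):
        grads = [marginal_utility(h, c) for h, c in zip(hit_rates, alloc)]
        alloc = rebalance(alloc, grads, step)
    return alloc
```

With two synthetic concave hit-rate curves such as `lambda c: 1 - math.exp(-c / 10)` and `lambda c: 1 - math.exp(-c / 40)`, the loop assigns more capacity to the second partition, whose hit rate saturates more slowly, and stops moving once the two marginal utilities are equal. The step size and iteration count are tuned to this toy example and would need adjustment for measured, noisy hit rates.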
6. Trade-Offs, Limitations, and Future Directions
- Interference vs. Reuse: Dedicated way-partitioned caches eliminate interference but block cross-core data reuse; sharing-aware, reuse-sensitive protocols mitigate this at modest complexity cost (Ghosh et al., 2022). In ML systems, cache sharing is often balanced with strict multi-tenant isolation or the risk of side-channel leakage, motivating granular tracking solutions (Yang et al., 17 Mar 2025, Pennas et al., 11 Mar 2026).
- Metadata, State, and Scalability: Increasing sharing state (per-line, per-prefix, or per-user) can challenge capacity or verification for large-scale systems. At the system level, batch/threshold approaches (as in ripple control (Kesidis et al., 2019)) can reduce overhead. In coded caching, construction techniques (e.g., PDAs from combinatorial design) can achieve sub-exponential subpacketization (Peter et al., 2022, Peter et al., 2021).
- Assumptions and Generalization: The efficacy of self-invalidate coherence approaches depends on the data-race-free (DRF) model. Strict adherence to DRF is necessary for Neat's protocol simplicity and correctness, limiting its applicability in non-DRF or weakly ordered workloads (Zhang et al., 2021). Security-driven shared cache protocols are only resistant to side-channel probing after the first sharing event and may require further engineering for more aggressive attackers (Pennas et al., 11 Mar 2026).
- Prospective Research: Research directions include dynamic partition resizing based on runtime observations (Ghosh et al., 2022), integration of sharing-aware protocols into nonuniform or composable memory architectures (Wang et al., 2024), and co-design of hardware–software annotations for advanced sharing pattern control.
7. Comparative Summary and Representative Results
A concise table highlighting representative protocols and primary quantitative improvements:
| Protocol | Domain | Quantitative Benefit | Reference |
|---|---|---|---|
| SRCP | Multicore LLC | +13.34% LLC hit-rate, +10.4% IPC over LRU | (Ghosh et al., 2022) |
| DistCache | Distributed | Linear scaling, 1.5–2× throughput over replication | (Liu et al., 2019) |
| KVShare | LLM Inference | ≤60% token reuse, ~1.2× throughput, BLEU Δ<1% | (Yang et al., 17 Mar 2025) |
| KVSharer | LLM Inference | 30% memory saved, ≥1.3× decoding speed | (Yang et al., 2024) |
| Utility Partitioning | CDN | Up to 10% utility increase (shared→optimal partition) | (Dehghan et al., 2017) |
| Object-Sharing LRU | CDN | Working-set approx. within 5% of simulation; 15% overhead | (Kesidis et al., 2019) |
| SELCC | Disaggregated | 2.2–5.6× vs. RPC-based, 80%+ hit ratio under Zipf(0.99) | (Wang et al., 2024) |
| CacheSolidarity | LLM Security | Cache reuse ↑70%, TTFT ↓30% vs. static isolation | (Pennas et al., 11 Mar 2026) |
The cache sharing protocol landscape spans a diversity of architectures, objectives, and implementation techniques. By integrating fine-grained metadata, theoretical rigor, and workload-tailored optimization, these protocols collectively represent the state of the art in efficient, secure, and scalable shared caching for contemporary computing infrastructure.