
Interference in Shared Memory Pools

Updated 12 December 2025
  • Interference in shared memory pools is the degradation of performance and predictability caused by multiple agents competing for common physical memory resources.
  • Research employs queuing analysis, regression models, and auto-tuned black-box estimation to quantify worst-case latencies and slowdowns in varied system architectures.
  • Mitigation strategies such as bank partitioning, bandwidth throttling, and pool-aware scheduling are essential for ensuring predictable, fair, and secure system operation.

Interference in shared memory pools encompasses all performance degradation, predictability loss, and security vulnerabilities stemming from multiple agents or threads concurrently contending for common physical memory resources. Across COTS multicore architectures, heterogeneous SoCs, disaggregated memory fabrics, and virtualized and embedded systems, the phenomenon manifests at multiple system levels: from the granularity of bank or interconnect conflicts, through controller scheduling and queuing, to inter-application competition for bandwidth or latency. Contemporary research unambiguously shows that properly characterizing, bounding, and mitigating this interference is critical for system throughput, fairness, QoS enforcement, predictable real-time computation, and information security.

1. Fundamental Sources and Mechanisms of Interference

Interference in shared memory pools originates in the physical and logical structure of memory subsystems:

  • Bank Contention and Scheduling: DRAM is divided into banks, each supporting a limited number of outstanding requests. Simultaneous access to the same bank by different agents causes queuing and explicit conflicts, especially under open-page policies and FR-FCFS schedulers, which prioritize row-buffer hits but can lead to head-of-line blocking (Yun, 2014).
  • Queuing Structures: Limited-size read/write buffers and request queues at the DRAM controller level determine the number, order, and latency of in-flight requests. Outstanding requests originating from out-of-order execution, speculative loads, and hardware prefetchers—all enabled by a large MSHR pool on COTS platforms—amplify both aggregate throughput and interference (Yun, 2014, Subramanian et al., 2018).
  • Interconnect Arbitration: In multicore/heterogeneous SoCs, all traffic traverses a shared interconnect (e.g., AXI, NoC), where bus or crossbar arbitration not only adds latency but distributes contention non-uniformly based on arbitration policy and address mapping (Carletti et al., 2023, Riedel et al., 2023, Cavalcante et al., 2020).
  • Cache and Bank Partitioning: Absence of efficient way or bank partitioning in LLC and DRAM raises the risk of capacity conflicts and cache eviction storms, propagating interference across a much broader address and temporal footprint (Yun, 2014).
  • Bank Address Mapping and Locality: Address-mapping schemes—such as word/bank interleaving or local block scrambling—control the probability of conflict and the fraction of requests hitting local vs. remote resources under uniform or skewed access patterns (Riedel et al., 2023, Cavalcante et al., 2020).

2. Analytical and Empirical Models of Interference

Quantitative models are essential for interference prediction, bounding, and mitigation design:

  • Worst-Case Queue-Based Bounds: Under partitioned LLC/banks (so no cross-core space contention), with $N_{rq}$ prior reads and $N_{wq}$ pending writes at the controller, the worst-case extra service latency per request is

$$D_p = N_{rq}\cdot t_{BURST} + N_{wq}\cdot t_{RC} + t_{WTR}$$

where $t_{BURST}$ is the data-bus burst length, $t_{RC}$ the DRAM row-cycle time, and $t_{WTR}$ the bus turn-around penalty. Task-level total interference is then $H_i\cdot D_p$, where $H_i$ is the number of DRAM requests issued by task $\tau_i$ (Yun, 2014).
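A minimal sketch of this bound, assuming illustrative DDR3-class timing values (not taken from the paper):

```python
# Hedged sketch: worst-case extra DRAM service latency per request (Yun, 2014),
# assuming partitioned LLC/banks so only controller-level queuing interferes.
# Timing values below are illustrative DDR3-1600 numbers, not from the paper.

def per_request_delay(n_rq: int, n_wq: int,
                      t_burst: float = 5.0,   # data-bus burst time (ns), assumed
                      t_rc: float = 48.75,    # DRAM row-cycle time (ns), assumed
                      t_wtr: float = 7.5):    # write-to-read turnaround (ns), assumed
    """D_p = N_rq * t_BURST + N_wq * t_RC + t_WTR."""
    return n_rq * t_burst + n_wq * t_rc + t_wtr

def task_interference_bound(h_i: int, d_p: float) -> float:
    """Total interference for task tau_i issuing H_i DRAM requests: H_i * D_p."""
    return h_i * d_p

d_p = per_request_delay(n_rq=3, n_wq=4)  # 3 prior reads, 4 pending writes
print(f"D_p = {d_p:.1f} ns, task bound = {task_interference_bound(10_000, d_p) / 1e6:.2f} ms")
```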

  • Slowdown Estimation via Request Service Rate (MISE): Relative performance loss is cast as

$$\text{Slowdown}_i = \frac{\text{ARSR}_i}{\text{SRSR}_i}$$

where $\text{ARSR}_i$ is application $i$'s request service rate in isolation (alone) and $\text{SRSR}_i$ its rate in the shared context. ARSR is sampled by short, periodic highest-priority assignment to $i$ at the memory controller. The model generalizes to non-memory-bound applications with a weighted blend parameterized by the memory-stall fraction $\alpha_i$ (Subramanian et al., 2018, Subramanian, 2015).
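One plausible instantiation of the blended model; the service rates below are made-up placeholders, and the real mechanism samples ARSR via brief highest-priority epochs at the controller:

```python
# Hedged sketch of the MISE slowdown model with the alpha blend for
# non-memory-bound applications. The epoch bookkeeping is elided; only the
# closed-form estimate is shown.

def mise_slowdown(arsr: float, srsr: float, alpha: float = 1.0) -> float:
    """Slowdown_i = (1 - alpha_i) + alpha_i * ARSR_i / SRSR_i,
    where alpha_i is the fraction of time stalled on memory."""
    assert 0.0 <= alpha <= 1.0 and srsr > 0
    return (1.0 - alpha) + alpha * (arsr / srsr)

# Memory-bound app (alpha ~ 1): service rate halves -> nearly 2x slowdown.
print(mise_slowdown(arsr=4.0e9, srsr=2.0e9, alpha=0.95))  # ~1.95
# Compute-bound app (alpha = 0.2): same contention, much milder slowdown.
print(mise_slowdown(arsr=4.0e9, srsr=2.0e9, alpha=0.2))   # ~1.2
```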

  • Regression Models for Virtualized Pools: In cloud virtual environments,

$$I = 0.7498\,T_1 + 0.1598\,T_2 + 0.1456\,T_3$$

with $T_1 = T_{SLLC}\cdot G_{SLLC}$, $T_2 = T_{net}\cdot G_{net}$, and $T_3 = T_{DRAM}\cdot T_{SLLC}\cdot G_{SLLC}$, where $T_s$ is the normalized total access to resource $s$, $G_s$ is a global similarity factor, and all coefficients are empirically derived (Alves et al., 2016).
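A direct evaluation of this regression; the input pressures and similarity factors below are hypothetical placeholders rather than measured values:

```python
# Hedged sketch: evaluating the virtualized-pool interference regression from
# Alves et al. (2016). In practice T_s comes from hardware counters and G_s from
# co-tenant similarity profiling; the arguments here are illustrative only.

def interference_index(t_sllc: float, t_net: float, t_dram: float,
                       g_sllc: float, g_net: float) -> float:
    t1 = t_sllc * g_sllc            # shared-LLC pressure weighted by similarity
    t2 = t_net * g_net              # network pressure weighted by similarity
    t3 = t_dram * t_sllc * g_sllc   # DRAM pressure coupled with LLC pressure
    return 0.7498 * t1 + 0.1598 * t2 + 0.1456 * t3

print(interference_index(t_sllc=0.6, t_net=0.3, t_dram=0.5, g_sllc=0.8, g_net=0.4))
```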

  • Auto-tuned Black-Box Estimation: Black-box autotuning approaches empirically maximize slowdowns of representative "victim" tasks by generating parameterized interfering "enemy" processes, yielding conservative lower bounds on interference multipliers for WCET estimation (Iorga et al., 2018).
  • Bandwidth-Sharing and Queuing Models in Disaggregated and Hybrid Systems: Models capturing host-pool bandwidth division, queuing, and burstiness in CXL or rack-scale shared DDR modules predict per-app slowdowns and latency increases, with slowdowns often closely following the ratio $B_{local}/B_{host}(N)$ (Wahlgren et al., 2022, Wahlgren et al., 2023).
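A first-order sketch of the $B_{local}/B_{host}(N)$ estimate, assuming an idealized equal division of pool bandwidth across $N$ hosts; real fabrics arbitrate less evenly, especially under bursty traffic:

```python
# Hedged sketch: first-order slowdown estimate for a memory-bound application
# running out of pooled (e.g., CXL) memory, following the B_local / B_host(N)
# ratio noted above. Equal-share bandwidth division is an assumption.

def host_bandwidth(b_pool_gbs: float, n_hosts: int) -> float:
    return b_pool_gbs / n_hosts           # idealized fair share of pool bandwidth

def est_slowdown(b_local_gbs: float, b_pool_gbs: float, n_hosts: int) -> float:
    return b_local_gbs / host_bandwidth(b_pool_gbs, n_hosts)

for n in (1, 2, 4, 8):                    # slowdown grows with co-located hosts
    print(n, round(est_slowdown(b_local_gbs=80.0, b_pool_gbs=64.0, n_hosts=n), 2))
```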

3. Experimental Characterization and Key Results

Experimental validation across platforms provides concrete evidence for the theoretical and empirical predictions:

  • COTS Multicore (Intel Xeon W3530, 8 MiB LLC, 16 DRAM banks): Under heavy concurrent write-mostly interference, pointer-chasing benchmarks measured a $\approx 3\times$ slowdown in DRAM-limited execution time, while state-of-the-art single-outstanding-request analysis underestimated delays by up to 47%; parallelism-aware analysis delivered safe bounds within 29% of measurements (Yun, 2014).
  • Heterogeneous SoCs (NVIDIA TX2, Xilinx ZU9EG): Worst-case slowdowns were highly pattern- and hardware-dependent. On the ZU9EG, write-intensive patterns caused up to $12\times$ slowdown (vs. only $1.3\times$ for read-miss), and realistic benchmarks (e.g., 2D stencils) exceeded even these under certain conditions, with single tasks slowed by $>60\times$ when CPUs and FPGA fabric were fully loaded (Carletti et al., 2023).
  • Hybrid DRAM+DCPM System: RDMA write bursts can degrade the throughput of local Memory Latency Checker (MLC) benchmark runs by $>80\%$ and double local-access latency as queue depth at the memory controller grows; even modest RDMA concurrency (2-3 queue pairs) has a pronounced effect, necessitating closed-loop rate control to avoid service-level disruption (Oe, 2020).
  • Disaggregated and CXL-Pooled Systems: Multiple co-located hosts on a pooled CXL memory fabric experience slowdowns scaling from $<15\%$ (compute-bound) to $2\times$ (memory-bound) or worse with bursty synchronous traffic. Profiled graph kernels (BFS, PageRank) exhibited chokepoints when remote-access patterns overshot the fabric's bandwidth share (Wahlgren et al., 2022, Wahlgren et al., 2023).
  • Manycore Shared-L1 Clusters (MemPool): Hierarchical interconnects and high over-banking (4 banks/PE) yield average access latency under 6 cycles and less than 2% execution stalls even at 256 PE scale, provided the per-PE offered load remains below 0.35 req/PE/cycle (Riedel et al., 2023).
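To illustrate why over-banking keeps conflicts rare, a toy Monte Carlo estimate of per-cycle bank collisions under uniform random mapping; this is a deliberately simplified single-cycle model that ignores the hierarchical interconnect, queuing, and address scrambling that MemPool actually uses:

```python
# Hedged sketch: with each PE issuing a request with probability `load` to one
# of (banks_per_pe * n_pe) banks chosen uniformly at random, estimate the
# fraction of requests that collide with another request in the same cycle.
import random
from collections import Counter

def conflict_fraction(n_pe: int = 256, banks_per_pe: int = 4,
                      load: float = 0.35, trials: int = 2000) -> float:
    n_banks = n_pe * banks_per_pe
    stalled = issued = 0
    for _ in range(trials):
        reqs = [random.randrange(n_banks) for _ in range(n_pe)
                if random.random() < load]      # each PE issues with prob. `load`
        per_bank = Counter(reqs)
        issued += len(reqs)
        stalled += sum(c - 1 for c in per_bank.values() if c > 1)  # losers stall
    return stalled / issued

print(f"~{conflict_fraction():.1%} of requests hit a busy bank")  # a few percent
```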

4. Application Domains: Real-time, HPC, Security, and Virtualization

Interference in shared memory pools constrains or motivates techniques across several application classes:

  • Real-time and Mixed-criticality Systems: Predictable upper bounds on interference are necessary for WCET analysis; mitigation is realized via static resource partitioning (bank/coloring), hardware support for MSHR partitioning, priority-aware DRAM scheduling, and closed-loop feedback controllers (e.g., MemGuard, MISE-QoS) (Yun, 2014, Costa et al., 27 Jan 2025, Subramanian, 2015).
  • High-Performance Computing and Disaggregated Architectures: Contemporary composable memory systems (CXL, rack-scale) experience significant cross-host interference. Roofline and arithmetic-intensity-based schedulers, per-job pooling-aware allocation, and dynamic QoS provisioning address both utilization and fairness (Wahlgren et al., 2022, Wahlgren et al., 2023). Application data placement/scheduling and hardware prefetcher tuning further modulate sensitivity.
  • Security and Covert-Channel Risks: Direct DRAM contention enables covert channels (MC³) in SM-SoCs lacking shared LLCs, with empirical data rates up to $6.4$ kbps (Orin AGX) and bit-error rates under $1\%$, demonstrating that CPU/GPU cross-domain data exfiltration can occur with no privileged access (Dagli et al., 6 Dec 2024); a toy decoding sketch follows this list.
  • Cloud and Virtualization: Accurate regression models incorporating both total resource pressure and co-tenant similarity enable VM placement, admission control, and on-line migration to minimize interference-induced slowdowns and SLA violations (Alves et al., 2016).
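As referenced in the covert-channel item above, a toy illustration of the receiver side of a DRAM-contention channel: the receiver times its own memory accesses and decodes a 1 when the sender's contention pushes latency above a calibrated threshold. The latencies and threshold are synthetic, and a real MC³-style channel additionally needs cache-bypassing accesses, agreed timing windows, and error correction:

```python
# Hedged toy sketch of contention-based covert-channel decoding. One latency
# sample is taken per agreed transmission window; values here are invented.

def decode_bits(window_latencies_ns, threshold_ns):
    return [1 if lat > threshold_ns else 0 for lat in window_latencies_ns]

quiet, contended = 90.0, 160.0                     # assumed latency regimes (ns)
samples = [92.1, 158.7, 161.3, 88.9, 155.0, 93.4]  # synthetic measurements
threshold = (quiet + contended) / 2                # midpoint calibration
print(decode_bits(samples, threshold))             # -> [0, 1, 1, 0, 1, 0]
```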

5. Methodologies for Bounding and Mitigating Interference

Mitigation tactics center on spatial and temporal isolation, supported by both hardware and software measures:

  • Cache and Bank Partitioning: Assigning exclusive LLC ways and DRAM banks per agent removes cross-eviction and row conflict channels, allowing interference to be modeled solely as queuing at the controller/bus level (Yun, 2014). SP-IMPact offers systematic enumeration and measurement for such configurations (Costa et al., 27 Jan 2025).
  • Bandwidth and Access Throttling: Controllers implement MemGuard/MISE-QoS policies and feedback control loops to cap or proportion request rates per agent or VM, bounding queuing depth and worst-case delays (Subramanian et al., 2018, Costa et al., 27 Jan 2025, Oe, 2020); a minimal regulator sketch follows this list.
  • Admission Control and Pool-Aware Scheduling: Static/dynamic schedulers profile each workload's bandwidth and burstiness, strictly partitioning pool shares and/or job placement to ensure no overcommitment of CXL/fabric or DRAM controller capacity (Wahlgren et al., 2022, Wahlgren et al., 2023).
  • Address Mapping and Locality Management: Hybrid word/group interleaving and address scrambling in large shared scratchpad clusters (MemPool) dramatically reduce long-path and conflict probability, keeping most latency near the minimal pipeline depth (Riedel et al., 2023, Cavalcante et al., 2020).
  • Empirical Tuning and Black-box Analysis: For modern, heterogeneous platforms where analytical modeling is insufficient, auto-tuning frameworks synthesize interference-maximizing “enemy” workloads and measure slowdowns, providing safe lower bounds for WCET and tool-driven configuration recommendation (Iorga et al., 2018).
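As referenced in the throttling item above, a minimal MemGuard-style accounting sketch: each core gets a per-period budget of memory accesses and is throttled once the budget is exhausted. Real implementations program per-core PMU overflow interrupts; only the bookkeeping is modeled here, and all names and budget values are illustrative:

```python
# Hedged sketch of per-core memory-bandwidth regulation with periodic budgets.

class BandwidthRegulator:
    def __init__(self, budgets: dict):   # budgets: accesses allowed per period
        self.budgets = dict(budgets)
        self.used = {core: 0 for core in budgets}

    def on_access(self, core: str) -> bool:
        """Charge one access; return False if the core must be throttled."""
        if self.used[core] >= self.budgets[core]:
            return False                  # budget exhausted: stall until refill
        self.used[core] += 1
        return True

    def new_period(self):
        """Periodic timer tick: refill every core's budget."""
        self.used = {core: 0 for core in self.used}

reg = BandwidthRegulator({"core0": 1000, "core1": 250})  # asymmetric guarantees
allowed = sum(reg.on_access("core1") for _ in range(300))
print(f"core1: {allowed} of 300 accesses allowed this period")  # 250 allowed
reg.new_period()
```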

6. Security, Measurement, and Future Directions

  • Covert Channels and Timing Side-channels: Security implications are acute in multi-domain SoCs and pooled-memory clouds. MC³ demonstrates that even without shared LLC, DRAM contention can leak information at measurable rates, necessitating mitigations such as bank partitioning, randomized MC scheduling, and OS-level noise injection (Dagli et al., 6 Dec 2024).
  • Measurement Infrastructure: Frameworks such as SP-IMPact (embedded systems) and multi-level profiling stacks (rack-scale pooling) facilitate practical tuning, configuration search, and validation of predicted vs empirical worst-case interference for varied workloads (Costa et al., 27 Jan 2025, Wahlgren et al., 2023).
  • Modeling and Analysis Challenges: Current static analysis often fails to capture interacting effects of all shared resources (IOMMU, interrupt controllers, PCIe buses). Research advances towards analytical models accounting for more microarchitectural nuance (MSHR bottlenecks, write buffer, interconnect effects) remain crucial (Costa et al., 27 Jan 2025, Carletti et al., 2023).
  • Implications for Architecture and System Design: Future high-core-count systems, disaggregated fabrics, and security-critical SoCs will benefit from integrating partitioning primitives, flexible software/hardware bandwidth caps, pool-aware job scheduling, and on-line interference measurement with formal verification frameworks to balance utilization, predictability, fairness, and security.

7. Summary Table: Key Interference Parameters and Mitigation Actions

| Parameter | Hardware/Software Domain | Mitigation/Bound Mechanism |
| --- | --- | --- |
| MSHR/buffer depth | COTS multicores, DRAM controller | MSHR partitioning, buffer sizing |
| Bank/way assignment | COTS, SoC, SPM systems | Bank/LLC partitioning, cache coloring |
| Outstanding requests | CPUs, manycores, GPUs | Limit/partition MSHRs, throttle issue width |
| Request scheduling | DRAM, CXL, pooling fabric | MISE, MemGuard, controller-level lottery scheduling |
| Access mapping scheme | Manycore SPM, DRAM interleaving | Locality-optimized mapping, hybrid/block scrambling |
| Co-located workload mix | Cloud, disaggregated HPC, virtualization | Scheduling/admission control via access/similarity profiling |

By combining system-aware partitioning, scheduling, empirical tuning, and precise per-resource measurement or estimation, designers can bound and mitigate the often severe performance and security costs of interference in shared memory pools across contemporary and emerging computing systems (Yun, 2014, Subramanian et al., 2018, Costa et al., 27 Jan 2025, Wahlgren et al., 2023, Dagli et al., 6 Dec 2024, Carletti et al., 2023, Alves et al., 2016, Cavalcante et al., 2020, Riedel et al., 2023, Oe, 2020).
