
Interference in Shared Memory Pools

Updated 12 December 2025
  • Interference in shared memory pools is the degradation of performance and predictability caused by multiple agents competing for common physical memory resources.
  • Research employs queuing analysis, regression models, and auto-tuned black-box estimation to quantify worst-case latencies and slowdowns in varied system architectures.
  • Mitigation strategies such as bank partitioning, bandwidth throttling, and pool-aware scheduling are essential for ensuring predictable, fair, and secure system operation.

Interference in shared memory pools encompasses all performance degradation, predictability loss, and security vulnerabilities stemming from multiple agents or threads concurrently contending for common physical memory resources. Across COTS multicore architectures, heterogeneous SoCs, disaggregated memory fabrics, and virtualized and embedded systems, the phenomenon manifests at multiple system levels: from the granularity of bank or interconnect conflicts, through controller scheduling and queuing, to inter-application competition for bandwidth or latency. Contemporary research unambiguously shows that properly characterizing, bounding, and mitigating this interference is critical for system throughput, fairness, QoS enforcement, predictable real-time computation, and information security.

1. Fundamental Sources and Mechanisms of Interference

Interference in shared memory pools originates in the physical and logical structure of memory subsystems:

  • Bank Contention and Scheduling: DRAM is divided into banks, each supporting a limited number of outstanding requests. Simultaneous access to the same bank by different agents causes queuing and explicit conflicts, especially under open-page policies and FR-FCFS schedulers, which prioritize row-buffer hits but can lead to head-of-line blocking (Yun, 2014).
  • Queuing Structures: Limited-size read/write buffers and request queues at the DRAM controller level determine the number, order, and latency of in-flight requests. Outstanding requests originating from out-of-order execution, speculative loads, and hardware prefetchers—all enabled by a large MSHR pool on COTS platforms—amplify both aggregate throughput and interference (Yun, 2014, Subramanian et al., 2018).
  • Interconnect Arbitration: In multicore/heterogeneous SoCs, all traffic traverses a shared interconnect (e.g., AXI, NoC), where bus or crossbar arbitration not only adds latency but distributes contention non-uniformly based on arbitration policy and address mapping (Carletti et al., 2023, Riedel et al., 2023, Cavalcante et al., 2020).
  • Cache and Bank Partitioning: Absence of efficient way or bank partitioning in LLC and DRAM raises the risk of capacity conflicts and cache eviction storms, propagating interference across a much broader address and temporal footprint (Yun, 2014).
  • Bank Address Mapping and Locality: Address-mapping schemes—such as word/bank interleaving or local block scrambling—control the probability of conflict and the fraction of requests hitting local vs. remote resources under uniform or skewed access patterns (Riedel et al., 2023, Cavalcante et al., 2020).

2. Analytical and Empirical Models of Interference

Quantitative models are essential for interference prediction, bounding, and mitigation design:

  • Worst-Case Queue-Based Bounds: Under partitioned LLC/banks (so no cross-core space contention), with $N_{rq}$ prior reads and $N_{wq}$ pending writes at the controller, the worst-case extra service latency per request is

$$D_p = N_{rq}\cdot t_{BURST} + N_{wq}\cdot t_{RC} + t_{WTR}$$

where $t_{BURST}$ is the data-bus burst length, $t_{RC}$ the DRAM row-cycle time, and $t_{WTR}$ the bus turn-around penalty. Task-level total interference is then $H_i\cdot D_p$, where $H_i$ is the number of DRAM requests issued by task $\tau_i$ (Yun, 2014).
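A minimal sketch of this bound, assuming illustrative DDR3-class timing values (not taken from the paper):

```python
# Hedged sketch: worst-case extra DRAM service latency per request (Yun, 2014),
# assuming partitioned LLC/banks so only controller-level queuing interferes.
# Timing values below are illustrative DDR3-1600 numbers, not from the paper.

def per_request_delay(n_rq: int, n_wq: int,
                      t_burst: float = 5.0,   # data-bus burst time (ns), assumed
                      t_rc: float = 48.75,    # DRAM row-cycle time (ns), assumed
                      t_wtr: float = 7.5):    # write-to-read turnaround (ns), assumed
    """D_p = N_rq * t_BURST + N_wq * t_RC + t_WTR."""
    return n_rq * t_burst + n_wq * t_rc + t_wtr

def task_interference_bound(h_i: int, d_p: float) -> float:
    """Total interference for task tau_i issuing H_i DRAM requests: H_i * D_p."""
    return h_i * d_p

d_p = per_request_delay(n_rq=3, n_wq=4)  # 3 prior reads, 4 pending writes
print(f"D_p = {d_p:.1f} ns, task bound = {task_interference_bound(10_000, d_p) / 1e6:.2f} ms")
```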

  • Slowdown Estimation via Request Service Rate (MISE): Relative performance loss is cast as

$$\text{Slowdown}_i = \frac{\text{ARSR}_i}{\text{SRSR}_i}$$

where $\text{ARSR}_i$ is application $i$'s request service rate in isolation (alone) and $\text{SRSR}_i$ its rate in the shared context. ARSR is sampled by short, periodic highest-priority assignment to $i$ at the memory controller. The model generalizes to non-memory-bound applications with a weighted blend parameterized by the memory-stall fraction $\alpha_i$ (Subramanian et al., 2018, Subramanian, 2015).
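One plausible instantiation of the blended model; the service rates below are made-up placeholders, and the real mechanism samples ARSR via brief highest-priority epochs at the controller:

```python
# Hedged sketch of the MISE slowdown model with the alpha blend for
# non-memory-bound applications. The epoch bookkeeping is elided; only the
# closed-form estimate is shown.

def mise_slowdown(arsr: float, srsr: float, alpha: float = 1.0) -> float:
    """Slowdown_i = (1 - alpha_i) + alpha_i * ARSR_i / SRSR_i,
    where alpha_i is the fraction of time stalled on memory."""
    assert 0.0 <= alpha <= 1.0 and srsr > 0
    return (1.0 - alpha) + alpha * (arsr / srsr)

# Memory-bound app (alpha ~ 1): service rate halves -> nearly 2x slowdown.
print(mise_slowdown(arsr=4.0e9, srsr=2.0e9, alpha=0.95))  # ~1.95
# Compute-bound app (alpha = 0.2): same contention, much milder slowdown.
print(mise_slowdown(arsr=4.0e9, srsr=2.0e9, alpha=0.2))   # ~1.2
```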

  • Regression Models for Virtualized Pools: In cloud virtual environments,

$$I = 0.7498\,T_1 + 0.1598\,T_2 + 0.1456\,T_3$$

with $T_1 = T_{SLLC}\cdot G_{SLLC}$, $T_2 = T_{net}\cdot G_{net}$, and $T_3 = T_{DRAM}\cdot T_{SLLC}\cdot G_{SLLC}$, where $T_s$ is the normalized total access to resource $s$, $G_s$ is a global similarity factor, and all coefficients are empirically derived (Alves et al., 2016).
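A direct evaluation of this regression; the input pressures and similarity factors below are hypothetical placeholders rather than measured values:

```python
# Hedged sketch: evaluating the virtualized-pool interference regression from
# Alves et al. (2016). In practice T_s comes from hardware counters and G_s from
# co-tenant similarity profiling; the arguments here are illustrative only.

def interference_index(t_sllc: float, t_net: float, t_dram: float,
                       g_sllc: float, g_net: float) -> float:
    t1 = t_sllc * g_sllc            # shared-LLC pressure weighted by similarity
    t2 = t_net * g_net              # network pressure weighted by similarity
    t3 = t_dram * t_sllc * g_sllc   # DRAM pressure coupled with LLC pressure
    return 0.7498 * t1 + 0.1598 * t2 + 0.1456 * t3

print(interference_index(t_sllc=0.6, t_net=0.3, t_dram=0.5, g_sllc=0.8, g_net=0.4))
```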

  • Auto-tuned Black-Box Estimation: Black-box autotuning approaches empirically maximize slowdowns of representative "victim" tasks by generating parameterized interfering "enemy" processes, yielding conservative lower bounds on interference multipliers for WCET estimation (Iorga et al., 2018).
  • Bandwidth-Sharing and Queuing Models in Disaggregated and Hybrid Systems: Models capturing host-pool bandwidth division, queuing, and burstiness in CXL or rack-scale shared DDR modules predict per-app slowdowns and latency increases, with slowdowns often closely following the ratio $B_{local}/B_{host}(N)$ (Wahlgren et al., 2022, Wahlgren et al., 2023).
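A first-order sketch of the $B_{local}/B_{host}(N)$ estimate, assuming an idealized equal division of pool bandwidth across $N$ hosts; real fabrics arbitrate less evenly, especially under bursty traffic:

```python
# Hedged sketch: first-order slowdown estimate for a memory-bound application
# running out of pooled (e.g., CXL) memory, following the B_local / B_host(N)
# ratio noted above. Equal-share bandwidth division is an assumption.

def host_bandwidth(b_pool_gbs: float, n_hosts: int) -> float:
    return b_pool_gbs / n_hosts           # idealized fair share of pool bandwidth

def est_slowdown(b_local_gbs: float, b_pool_gbs: float, n_hosts: int) -> float:
    return b_local_gbs / host_bandwidth(b_pool_gbs, n_hosts)

for n in (1, 2, 4, 8):                    # slowdown grows with co-located hosts
    print(n, round(est_slowdown(b_local_gbs=80.0, b_pool_gbs=64.0, n_hosts=n), 2))
```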

3. Experimental Characterization and Key Results

Experimental validation across platforms provides concrete evidence for the theoretical and empirical predictions:

  • COTS Multicore (Intel Xeon W3530, 8 MiB LLC, 16 DRAM banks): Under heavy concurrent write-mostly interference, pointer-chasing benchmarks measured a $\approx 3\times$ slowdown in DRAM-limited execution time, while state-of-the-art single-outstanding-request analysis underestimated delays by up to 47%; parallelism-aware analysis delivered safe bounds within 29% of measurements (Yun, 2014).
  • Heterogeneous SoCs (NVIDIA TX2, Xilinx ZU9EG): Worst-case slowdowns were highly pattern- and hardware-dependent. On the ZU9EG, write-intensive patterns caused up to $12\times$ slowdown (vs. only $1.3\times$ for read-miss), and realistic benchmarks (e.g., 2D stencils) exceeded even these under certain conditions, with single tasks slowed by $>60\times$ when CPUs and FPGA fabric were fully loaded (Carletti et al., 2023).
  • Hybrid DRAM+DCPM System: RDMA write bursts can degrade the throughput of local Memory Latency Checker (MLC) benchmark runs by $>80\%$ and double local-access latency as queue depth at the memory controller grows; even modest RDMA concurrency (2-3 queue pairs) has a pronounced effect, necessitating closed-loop rate control to avoid service-level disruption (Oe, 2020).
  • Disaggregated and CXL-Pooled Systems: Multiple co-located hosts on a pooled CXL memory fabric experience slowdowns scaling from $<15\%$ (compute-bound) to $2\times$ (memory-bound) or worse with bursty synchronous traffic. Profiled graph kernels (BFS, PageRank) exhibited chokepoints when remote-access patterns overshot the fabric's bandwidth share (Wahlgren et al., 2022, Wahlgren et al., 2023).
  • Manycore Shared-L1 Clusters (MemPool): Hierarchical interconnects and high over-banking (4 banks/PE) yield average access latency under 6 cycles and less than 2% execution stalls even at 256 PE scale, provided the per-PE offered load remains below 0.35 req/PE/cycle (Riedel et al., 2023).
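To illustrate why over-banking keeps conflicts rare, a toy Monte Carlo estimate of per-cycle bank collisions under uniform random mapping; this is a deliberately simplified single-cycle model that ignores the hierarchical interconnect, queuing, and address scrambling that MemPool actually uses:

```python
# Hedged sketch: with each PE issuing a request with probability `load` to one
# of (banks_per_pe * n_pe) banks chosen uniformly at random, estimate the
# fraction of requests that collide with another request in the same cycle.
import random
from collections import Counter

def conflict_fraction(n_pe: int = 256, banks_per_pe: int = 4,
                      load: float = 0.35, trials: int = 2000) -> float:
    n_banks = n_pe * banks_per_pe
    stalled = issued = 0
    for _ in range(trials):
        reqs = [random.randrange(n_banks) for _ in range(n_pe)
                if random.random() < load]      # each PE issues with prob. `load`
        per_bank = Counter(reqs)
        issued += len(reqs)
        stalled += sum(c - 1 for c in per_bank.values() if c > 1)  # losers stall
    return stalled / issued

print(f"~{conflict_fraction():.1%} of requests hit a busy bank")  # a few percent
```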

4. Application Domains: Real-time, HPC, Security, and Virtualization

Interference in shared memory pools constrains or motivates techniques across several application classes:

  • Real-time and Mixed-criticality Systems: Predictable upper bounds on interference are necessary for WCET analysis; mitigation is realized via static resource partitioning (bank/coloring), hardware support for MSHR partitioning, priority-aware DRAM scheduling, and closed-loop feedback controllers (e.g., MemGuard, MISE-QoS) (Yun, 2014, Costa et al., 27 Jan 2025, Subramanian, 2015).
  • High-Performance Computing and Disaggregated Architectures: Contemporary composable memory systems (CXL, rack-scale) experience significant cross-host interference. Roofline and arithmetic-intensity-based schedulers, per-job pooling-aware allocation, and dynamic QoS provisioning address both utilization and fairness (Wahlgren et al., 2022, Wahlgren et al., 2023). Application data placement/scheduling and hardware prefetcher tuning further modulate sensitivity.
  • Security and Covert-Channel Risks: Direct DRAM contention enables covert channels (MC³) in SM-SoCs lacking shared LLCs, with empirical data rates up to $6.4$ kbps (Orin AGX) and bit-error rates under $1\%$, demonstrating that CPU/GPU cross-domain data exfiltration can occur with no privileged access (Dagli et al., 6 Dec 2024); a toy decoding sketch follows this list.
  • Cloud and Virtualization: Accurate regression models incorporating both total resource pressure and co-tenant similarity enable VM placement, admission control, and on-line migration to minimize interference-induced slowdowns and SLA violations (Alves et al., 2016).
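As referenced in the covert-channel item above, a toy illustration of the receiver side of a DRAM-contention channel: the receiver times its own memory accesses and decodes a 1 when the sender's contention pushes latency above a calibrated threshold. The latencies and threshold are synthetic, and a real MC³-style channel additionally needs cache-bypassing accesses, agreed timing windows, and error correction:

```python
# Hedged toy sketch of contention-based covert-channel decoding. One latency
# sample is taken per agreed transmission window; values here are invented.

def decode_bits(window_latencies_ns, threshold_ns):
    return [1 if lat > threshold_ns else 0 for lat in window_latencies_ns]

quiet, contended = 90.0, 160.0                     # assumed latency regimes (ns)
samples = [92.1, 158.7, 161.3, 88.9, 155.0, 93.4]  # synthetic measurements
threshold = (quiet + contended) / 2                # midpoint calibration
print(decode_bits(samples, threshold))             # -> [0, 1, 1, 0, 1, 0]
```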

5. Methodologies for Bounding and Mitigating Interference

Mitigation tactics center on spatial and temporal isolation, supported by both hardware and software measures:

  • Cache and Bank Partitioning: Assigning exclusive LLC ways and DRAM banks per agent removes cross-eviction and row conflict channels, allowing interference to be modeled solely as queuing at the controller/bus level (Yun, 2014). SP-IMPact offers systematic enumeration and measurement for such configurations (Costa et al., 27 Jan 2025).
  • Bandwidth and Access Throttling: Controllers implement MemGuard/MISE-QoS policies and feedback control loops to cap or proportion request rates per agent or VM, bounding queuing depth and worst-case delays (Subramanian et al., 2018, Costa et al., 27 Jan 2025, Oe, 2020); a minimal regulator sketch follows this list.
  • Admission Control and Pool-Aware Scheduling: Static/dynamic schedulers profile each workload's bandwidth and burstiness, strictly partitioning pool shares and/or job placement to ensure no overcommitment of CXL/fabric or DRAM controller capacity (Wahlgren et al., 2022, Wahlgren et al., 2023).
  • Address Mapping and Locality Management: Hybrid word/group interleaving and address scrambling in large shared scratchpad clusters (MemPool) dramatically reduce long-path and conflict probability, keeping most latency near the minimal pipeline depth (Riedel et al., 2023, Cavalcante et al., 2020).
  • Empirical Tuning and Black-box Analysis: For modern, heterogeneous platforms where analytical modeling is insufficient, auto-tuning frameworks synthesize interference-maximizing “enemy” workloads and measure slowdowns, providing safe lower bounds for WCET and tool-driven configuration recommendation (Iorga et al., 2018).
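As referenced in the throttling item above, a minimal MemGuard-style accounting sketch: each core gets a per-period budget of memory accesses and is throttled once the budget is exhausted. Real implementations program per-core PMU overflow interrupts; only the bookkeeping is modeled here, and all names and budget values are illustrative:

```python
# Hedged sketch of per-core memory-bandwidth regulation with periodic budgets.

class BandwidthRegulator:
    def __init__(self, budgets: dict):   # budgets: accesses allowed per period
        self.budgets = dict(budgets)
        self.used = {core: 0 for core in budgets}

    def on_access(self, core: str) -> bool:
        """Charge one access; return False if the core must be throttled."""
        if self.used[core] >= self.budgets[core]:
            return False                  # budget exhausted: stall until refill
        self.used[core] += 1
        return True

    def new_period(self):
        """Periodic timer tick: refill every core's budget."""
        self.used = {core: 0 for core in self.used}

reg = BandwidthRegulator({"core0": 1000, "core1": 250})  # asymmetric guarantees
allowed = sum(reg.on_access("core1") for _ in range(300))
print(f"core1: {allowed} of 300 accesses allowed this period")  # 250 allowed
reg.new_period()
```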

6. Security, Measurement, and Future Directions

  • Covert Channels and Timing Side-channels: Security implications are acute in multi-domain SoCs and pooled-memory clouds. MC³ demonstrates that even without shared LLC, DRAM contention can leak information at measurable rates, necessitating mitigations such as bank partitioning, randomized MC scheduling, and OS-level noise injection (Dagli et al., 6 Dec 2024).
  • Measurement Infrastructure: Frameworks such as SP-IMPact (embedded systems) and multi-level profiling stacks (rack-scale pooling) facilitate practical tuning, configuration search, and validation of predicted vs empirical worst-case interference for varied workloads (Costa et al., 27 Jan 2025, Wahlgren et al., 2023).
  • Modeling and Analysis Challenges: Current static analysis often fails to capture interacting effects of all shared resources (IOMMU, interrupt controllers, PCIe buses). Research advances towards analytical models accounting for more microarchitectural nuance (MSHR bottlenecks, write buffer, interconnect effects) remain crucial (Costa et al., 27 Jan 2025, Carletti et al., 2023).
  • Implications for Architecture and System Design: Future high-core-count systems, disaggregated fabrics, and security-critical SoCs will benefit from integrating partitioning primitives, flexible software/hardware bandwidth caps, pool-aware job scheduling, and on-line interference measurement with formal verification frameworks to balance utilization, predictability, fairness, and security.

7. Summary Table: Key Interference Parameters and Mitigation Actions

| Parameter | Hardware/Software Domain | Mitigation/Bound Mechanism |
| --- | --- | --- |
| MSHR/buffer depth | COTS multicores, DRAM controller | MSHR partitioning, buffer sizing |
| Bank/way assignment | COTS, SoC, SPM systems | Bank/LLC partitioning, cache coloring |
| Outstanding requests | CPUs, manycores, GPUs | Limit/partition MSHRs, throttle issue width |
| Request scheduling | DRAM, CXL, pooling fabric | MISE, MemGuard, controller-level lottery scheduling |
| Access mapping scheme | Manycore SPM, DRAM interleaving | Locality-optimized mapping, hybrid/block scrambling |
| Co-located workload mix | Cloud, disaggregated HPC, virtualization | Scheduling/admission control via access/similarity profiling |

By combining system-aware partitioning, scheduling, empirical tuning, and precise per-resource measurement or estimation, designers can bound and mitigate the often severe performance and security costs of interference in shared memory pools across contemporary and emerging computing systems (Yun, 2014, Subramanian et al., 2018, Costa et al., 27 Jan 2025, Wahlgren et al., 2023, Dagli et al., 6 Dec 2024, Carletti et al., 2023, Alves et al., 2016, Cavalcante et al., 2020, Riedel et al., 2023, Oe, 2020).
