
SRCP: Reuse-Aware Cache Partitioning

Updated 7 May 2026
  • Reuse-Aware Cache Partitioning (SRCP) is a hardware cache replacement strategy for multicore processors that uses static way-partitioning and per-block metrics to manage shared data.
  • It employs an eviction policy based on a lexicographically ordered tuple of Access Frequency Count, Global Count, and LRU age to minimize inter-core interference.
  • Empirical evaluations show SRCP improves cache hit rates by up to 13.34% and delivers up to 10.4% performance speedup over LRU.

Reuse-Aware Cache Partitioning (SRCP) is a hardware cache replacement strategy for multicore systems, designed to enhance performance and predictability in the presence of data sharing across threads. SRCP operates atop static way-partitioning of the shared last-level cache (LLC), introducing mechanisms that track per-block reuse and data-sharing characteristics to minimize inter-core interference, avoid redundant replications, and preserve high-locality data. Empirical evaluation shows SRCP yields substantial benefits over leading cache management schemes such as TA-DRRIP and EHC, with up to 13.34% improvement in cache hit-rate and up to 10.4% performance speedup over LRU in multithreaded workloads (Ghosh et al., 2022).

1. Multicore Cache Partitioning Model and Notation

SRCP targets multicore architectures featuring an $N$-core processor with a shared LLC of $C$ bytes, associativity $W$, $S$ index sets, and cache lines of $L$ bytes, so that $C = S \cdot W \cdot L$. Static way-partitioning allocates each core $i$ a private region of $W_i = W/N$ ways per set, dividing every set into $N$ distinct partitions. Partition $i$ holds up to $W_i$ blocks per set, forming the set $A_i$ of locally managed blocks. Blocks are globally accessible for data sharing; however, a core can evict blocks only from its own home partition.
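The partition geometry can be made concrete with a small sketch. It plugs in the 4-core, 8 MB, 16-way configuration evaluated in Section 4; the 64-byte line size, the contiguous way-to-core mapping, and the class name `LLCGeometry` are assumptions for illustration, not details from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLCGeometry:
    capacity_bytes: int  # C
    ways: int            # W, the associativity
    line_bytes: int      # L (64 B is an assumption here)
    cores: int           # N

    @property
    def sets(self) -> int:
        # S = C / (W * L)
        return self.capacity_bytes // (self.ways * self.line_bytes)

    @property
    def ways_per_core(self) -> int:
        # W_i = W / N; equal static partitions require N to divide W
        if self.ways % self.cores:
            raise ValueError("W must be divisible by N")
        return self.ways // self.cores

    def home_ways(self, core: int) -> range:
        # Hypothetical contiguous mapping: core i owns ways [i*W_i, (i+1)*W_i) in every set
        w = self.ways_per_core
        return range(core * w, (core + 1) * w)


geom = LLCGeometry(capacity_bytes=8 * 2**20, ways=16, line_bytes=64, cores=4)
print(geom.sets, geom.ways_per_core, list(geom.home_ways(2)))  # 8192 4 [8, 9, 10, 11]
```

Under these assumptions each core owns 4 of the 16 ways in every one of the 8192 sets.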

The system employs several per-block metadata fields:

  • $LC(b) \in \{0,1\}$: Local-core access flag
  • $AFC(b) \in [0, 2^k-1]$: Access Frequency Count (typically $k=8$)
  • $GCount(b) \in [0, N-1]$: Global Count of accesses by non-local cores ($\lceil \log_2 N \rceil$ bits)

SRCP interprets $AFC(b)$ as a direct proxy for a block’s reuse frequency, while $GCount(b)$ quantifies inter-core sharing, providing the basis for distinguishing private from shared data at runtime.

2. SRCP Replacement Policy and Metadata Management

SRCP replaces cache blocks within each partition based on a lexicographically ordered three-tuple metric:

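$$\mathrm{Score}(b) = \big(AFC(b),\ GCount(b),\ \mathrm{LRU}(b)\big), \qquad \mathrm{victim}(A_i) = \arg\min_{b \in A_i} \mathrm{Score}(b)$$

where $\mathrm{LRU}(b)$ is the block’s recency rank (smallest for the least recently used block) and the minimum is taken in lexicographic order: eviction preference goes to the block with the smallest $AFC(b)$, ties are broken by the smallest $GCount(b)$, and any remaining tie by LRU age.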
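Because Python compares tuples lexicographically, this ordering maps directly onto `min` with a tuple key. In this sketch the LRU component is modelled as a hypothetical logical timestamp `last_use` (smaller means less recently used); the `Block` class and helper names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: int
    afc: int       # AFC(b)
    gcount: int    # GCount(b)
    last_use: int  # hypothetical logical timestamp; smaller = less recently used


def srcp_key(b: Block) -> tuple[int, int, int]:
    # Tuple comparison is lexicographic: AFC first, then GCount, then recency.
    return (b.afc, b.gcount, b.last_use)


def select_victim(partition: list[Block]) -> Block:
    # The victim is the block with the smallest (AFC, GCount, LRU) tuple.
    return min(partition, key=srcp_key)


A_i = [
    Block(tag=0xA, afc=3, gcount=0, last_use=10),
    Block(tag=0xB, afc=1, gcount=2, last_use=40),  # low reuse, but shared
    Block(tag=0xC, afc=1, gcount=0, last_use=50),  # low reuse, private
    Block(tag=0xD, afc=1, gcount=0, last_use=20),  # low reuse, private, oldest
]
print(hex(select_victim(A_i).tag))  # 0xd
```

Block D loses to A on reuse, to B on sharing, and to C on recency, so it is evicted.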

On every access to block $b$:

  • If accessed by its partition’s local core, $LC(b) \leftarrow 1$ and $AFC(b)$ is incremented (clamped at $2^k-1$).
  • If accessed by a non-local core, $GCount(b)$ is incremented (clamped at $N-1$).
  • LRU state is updated so that $b$ becomes the most recently used block.
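A minimal sketch of the per-access update, assuming $k=8$ and $N=4$. The `Block` fields, the `home` partition index, and the logical timestamp standing in for LRU state are hypothetical modelling choices, not the hardware encoding.

```python
from dataclasses import dataclass

K = 8                    # AFC width in bits (typical k)
N_CORES = 4              # N
AFC_MAX = (1 << K) - 1   # AFC saturates at 2^k - 1
GCOUNT_MAX = N_CORES - 1 # GCount saturates at N - 1


@dataclass
class Block:
    home: int          # index i of the partition holding the block
    lc: int = 0        # LC(b)
    afc: int = 0       # AFC(b)
    gcount: int = 0    # GCount(b)
    last_use: int = 0  # hypothetical logical timestamp used for LRU ordering


def on_access(b: Block, core: int, now: int) -> None:
    if core == b.home:
        # Local access: set LC and bump the saturating reuse counter.
        b.lc = 1
        b.afc = min(b.afc + 1, AFC_MAX)
    else:
        # Non-local access: record sharing, saturating at N - 1.
        b.gcount = min(b.gcount + 1, GCOUNT_MAX)
    b.last_use = now  # the block becomes most recently used


b = Block(home=0)
for t, core in enumerate([0, 0, 2, 3, 1, 2]):
    on_access(b, core, now=t)
print(b.lc, b.afc, b.gcount, b.last_use)  # 1 2 3 5
```

The fourth remote access leaves $GCount(b)$ at 3, showing the clamp at $N-1$.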

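Upon a miss (fill) in partition $i$:

  • All $AFC$ and $GCount$ fields of the blocks in $A_i$ are decremented by 1 (saturating at 0).
  • The block in $A_i$ with the lexicographically smallest $\big(AFC(b), GCount(b), \mathrm{LRU}(b)\big)$ tuple is selected and evicted.
  • The new block $b_{\text{new}}$ is initialized with $AFC(b_{\text{new}}) \leftarrow \overline{AFC}_i$, where $\overline{AFC}_i$ is the current mean of the $AFC$ values in $A_i$; $GCount(b_{\text{new}}) \leftarrow 0$; $LC(b_{\text{new}}) \leftarrow 1$.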
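The three miss-time steps can be sketched for a full partition as follows. The integer floor of the mean AFC, reusing the victim's way slot for the new block, the initial values $GCount = 0$ and $LC = 1$, and the `last_use` timestamp are assumptions of this sketch where details are open.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: int
    lc: int        # LC(b)
    afc: int       # AFC(b)
    gcount: int    # GCount(b)
    last_use: int  # hypothetical logical timestamp; smaller = less recently used


def on_miss(A_i: list[Block], new_tag: int, now: int) -> Block:
    """Handle a miss in a full partition A_i and return the evicted block."""
    # Step 1: decay every block's counters by 1, saturating at 0.
    for b in A_i:
        b.afc = max(b.afc - 1, 0)
        b.gcount = max(b.gcount - 1, 0)
    # Mean AFC of the partition after decay (integer floor is an assumption).
    mean_afc = sum(b.afc for b in A_i) // len(A_i)
    # Step 2: find the way holding the smallest (AFC, GCount, LRU) tuple.
    way = min(
        range(len(A_i)),
        key=lambda j: (A_i[j].afc, A_i[j].gcount, A_i[j].last_use),
    )
    victim = A_i[way]
    # Step 3: fill that way with the new block (GCount = 0, LC = 1 assumed).
    A_i[way] = Block(tag=new_tag, lc=1, afc=mean_afc, gcount=0, last_use=now)
    return victim


A_i = [
    Block(tag=0xA, lc=1, afc=5, gcount=0, last_use=1),
    Block(tag=0xB, lc=1, afc=1, gcount=0, last_use=3),
    Block(tag=0xC, lc=0, afc=1, gcount=2, last_use=2),  # low reuse, but shared
    Block(tag=0xD, lc=1, afc=4, gcount=1, last_use=4),
]
victim = on_miss(A_i, new_tag=0xE, now=5)
print(hex(victim.tag), [hex(b.tag) for b in A_i], A_i[1].afc)
# 0xb ['0xa', '0xe', '0xc', '0xd'] 1
```

After decay, blocks B and C both have $AFC = 0$, but C still has $GCount = 1$, so the private block B is evicted and the shared block C survives.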

This approach integrates reuse and sharing directly into the replacement decision, in contrast to traditional recency-only or adaptive policies.

3. Data Sharing Detection and Redundant Replication Mitigation

A block $b$ is identified as “shared” when $GCount(b) > 0$; otherwise, it is considered “private.” The tracking mechanism is fully hardware-based and requires no profiling or programmer annotations. Global accesses (reads and writes by non-local cores) increment $GCount(b)$, so the classification dynamically reflects inter-thread communication patterns.
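A software model of the classification rule, replaying a hypothetical access trace. It applies only the saturating increment from non-local accesses; the per-miss decay that can later return a block to “private” is deliberately omitted, and the function name is illustrative.

```python
from collections import defaultdict

N_CORES = 4
GCOUNT_MAX = N_CORES - 1  # GCount(b) saturates at N - 1


def classify_blocks(trace, home):
    """trace: (core, block_id) pairs; home: block_id -> owning partition index."""
    gcount = defaultdict(int)
    for core, blk in trace:
        if core != home[blk]:
            # A non-local read or write bumps GCount(b), saturating at N - 1.
            gcount[blk] = min(gcount[blk] + 1, GCOUNT_MAX)
    # Shared iff GCount(b) > 0; per-miss decay is not modelled in this sketch.
    return {blk: "shared" if gcount[blk] > 0 else "private" for blk in home}


home = {"x": 0, "y": 0, "z": 1}
trace = [(0, "x"), (0, "y"), (2, "y"), (1, "z"), (1, "z")]
labels = classify_blocks(trace, home)
print(labels)  # {'x': 'private', 'y': 'shared', 'z': 'private'}
```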

To prevent redundant replication, SRCP’s way-partitioned eviction and fill policies ensure that only one copy of a block exists in its LLC set. Non-local threads can read and write blocks held in remote partitions, but they cannot evict those blocks, and a block is never replicated into a second partition. Shared data thus remains available for inter-core accesses without polluting multiple partitions, saving capacity and preserving data locality.
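The single-copy property follows from where lookups and fills happen: a lookup may hit in any way of the set, but a fill only ever lands in the requester’s home partition. The sketch below models one 16-way set shared by 4 cores; `SRCPSet` is hypothetical, and victim choice is simplified to the first home way rather than SRCP’s tuple ordering.

```python
W, N = 16, 4  # associativity and core count from the evaluated configuration
W_I = W // N  # ways per home partition


class SRCPSet:
    def __init__(self) -> None:
        # One tag slot per way; way w belongs to partition w // W_I.
        self.tags = [None] * W

    def access(self, core: int, tag: int) -> str:
        # Lookup spans all W ways, so a block in another core's partition is
        # served in place instead of being copied into the requester's partition.
        if tag in self.tags:
            owner = self.tags.index(tag) // W_I
            return "local hit" if owner == core else "remote hit"
        # Miss: the fill may only go into the requester's home partition.
        home = range(core * W_I, (core + 1) * W_I)
        free = [w for w in home if self.tags[w] is None]
        # Simplification: take a free way, else the first home way
        # (SRCP proper would pick the victim by its (AFC, GCount, LRU) tuple).
        way = free[0] if free else home[0]
        self.tags[way] = tag
        return "miss"


s = SRCPSet()
results = [s.access(0, 0x42), s.access(1, 0x42), s.access(0, 0x42)]
print(results, s.tags.count(0x42))  # ['miss', 'remote hit', 'local hit'] 1
```

Core 1’s access is served from core 0’s partition, and the set still holds a single copy of the block.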

4. Quantitative Evaluation and Performance Results

Experiments are conducted in gem5 full-system simulation of a 4-core, 2 GHz configuration with private 32 KB L1 caches and a shared 8 MB, 16-way LLC, running PARSEC and Splash-2 multithreaded benchmarks. SRCP is compared against LRU, TA-DRRIP, and EHC replacement baselines.

Performance outcomes are summarized as follows:

Scheme      Hit-rate improvement over LRU (PARSEC)   Speedup over LRU (PARSEC & Splash-2)
SRCP        up to +13.34%                            up to +10.4%
EHC         +9.4%                                    +6.2%
TA-DRRIP    +7.3%                                    +5.0%
LRU         baseline                                 baseline

SRCP consistently outperforms both state-of-the-art (TA-DRRIP, EHC) and traditional (LRU) cache replacement schemes with respect to both hit-rate and IPC, especially in data-sharing-intensive workloads. The observed benefit stems from SRCP’s ability to both suppress unnecessary replication of shared data and to preserve highly reused private and shared blocks (Ghosh et al., 2022).

5. Complexity, Hardware Overheads, and Integration

SRCP adds modest metadata per cache line: 1 bit for $LC$, $k$ bits for $AFC$ ($k=8$ preferred), and $\lceil \log_2 N \rceil$ bits (2 bits for $N=4$) for $GCount$. For $N=4$ and $k=8$, this totals $1 + 8 + 2 = 11$ bits per line. Per-access updates are $O(1)$; per-miss actions are $O(W_i)$, since victim selection and counter decrementation scan each of the partition’s $W_i$ ways directly.
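The storage arithmetic generalizes to other core counts and counter widths. The sketch assumes 64-byte lines when scaling to the evaluated 8 MB LLC, so the total it prints is illustrative rather than a figure from the source; the function names are hypothetical.

```python
def srcp_bits_per_line(n_cores: int, k: int = 8) -> int:
    # LC (1 bit) + AFC (k bits) + GCount (ceil(log2 N) bits).
    # (N - 1).bit_length() equals ceil(log2 N) for N >= 1, using exact integer math.
    return 1 + k + (n_cores - 1).bit_length()


def srcp_total_overhead_bytes(capacity_bytes: int, line_bytes: int,
                              n_cores: int, k: int = 8) -> int:
    lines = capacity_bytes // line_bytes
    return lines * srcp_bits_per_line(n_cores, k) // 8


print(srcp_bits_per_line(4))                                # 11
print(srcp_total_overhead_bytes(8 * 2**20, 64, 4) // 1024)  # 176 (KiB)
```

Under the 64-byte-line assumption, 11 extra bits per 512-bit line is about 2.1% of the data array.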

Hardware support involves minor extensions to cache tag arrays and additional logic in the controller for increment/decrement and comparator functions. No additional software infrastructure is required aside from initial partitioning configuration (e.g., via machine-specific registers at boot).

Deployment recommendations include calibrating $k$ to balance area against performance, amortizing the per-miss counter decrementation (e.g., applying it once every several misses), and hand-tuning the initial $AFC$ values to match each core’s working-set characteristics.
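One possible way to amortize decrementation, shown as a hypothetical sketch: count misses per partition and decay the counters only on every $M$-th miss. The period $M$ (`period`) and the class `AmortizedDecay` are assumptions, not parameters given by the source.

```python
class AmortizedDecay:
    """Per-partition miss counter that applies AFC/GCount decay every M-th miss."""

    def __init__(self, period: int) -> None:
        self.period = period  # M: hypothetical tuning knob, not from the source
        self.misses = 0

    def on_miss(self, afc: list[int], gcount: list[int]) -> bool:
        self.misses += 1
        if self.misses % self.period:
            return False  # skip decay on this miss
        for j in range(len(afc)):
            afc[j] = max(afc[j] - 1, 0)
            gcount[j] = max(gcount[j] - 1, 0)
        return True


d = AmortizedDecay(period=4)
afc, gc = [5, 0, 2, 7], [1, 0, 3, 0]
decayed = [d.on_miss(afc, gc) for _ in range(8)]
print(decayed.count(True), afc, gc)  # 2 [3, 0, 0, 5] [0, 0, 1, 0]
```

A longer period cuts decrement work by a factor of $M$ but lets stale $AFC$ and $GCount$ values persist longer, which changes which blocks the tuple ordering protects.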

6. Comparison with Existing Cache Replacement Techniques

SRCP improves on previous methods by explicitly integrating sharing and reuse into the eviction metric:

  • LRU: no visibility into data sharing or reuse; maximal susceptibility to destructive interference.
  • TA-DRRIP: adaptive RRIP score for recency, agnostic to inter-core sharing.
  • EHC: tracks recency history but not explicit data-sharing.
  • SRCP: introduces $AFC$ and $GCount$ for per-block awareness of both local reuse and inter-core sharing, with way-partitioning enforcing strong isolation.

The measured impact is an additional 4–6 percentage points in hit-rate and 4–5 percentage points in IPC over leading alternatives under data-sharing workloads.
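The percentage-point gaps can be checked against the results table in Section 4. The SRCP entries are “up to” figures, so these gaps compare SRCP’s best case with the baselines’ reported values.

```python
# Reported improvements over LRU: (hit-rate %, speedup %), from the results table.
results = {
    "SRCP": (13.34, 10.4),
    "EHC": (9.4, 6.2),
    "TA-DRRIP": (7.3, 5.0),
}
srcp_hit, srcp_speedup = results["SRCP"]
gaps = {
    name: (round(srcp_hit - hit, 2), round(srcp_speedup - speedup, 2))
    for name, (hit, speedup) in results.items()
    if name != "SRCP"
}
print(gaps)  # {'EHC': (3.94, 4.2), 'TA-DRRIP': (6.04, 5.4)}
```

Rounded, gaps of 3.94 to 6.04 points in hit rate and 4.2 to 5.4 points in speedup give the 4–6 and 4–5 point ranges quoted above.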

7. Practical Deployment and Future Implications

SRCP is architected for transparent hardware deployment, requiring only static LLC way partitioning and lightweight metadata. No modifications to application code, operating system, or compilers are necessary post-partitioning. The hardware footprint is minimal, and the O(1) per-access control logic is consistent with state-of-the-art predictor architectures.

A plausible implication is that SRCP’s policy model could be extended to dynamic partitioning schemes and integrated with further QoS and predictability mechanisms for real-time and mixed-criticality systems. Its explicit support for data sharing suits modern, heavily parallel multithreaded applications, providing a principled and efficient way to minimize interference and maximize throughput on shared-memory multiprocessors (Ghosh et al., 2022).
