SRCP: Reuse-Aware Cache Partitioning

Updated 7 May 2026

Reuse-Aware Cache Partitioning (SRCP) is a hardware cache replacement strategy for multicore processors that uses static way-partitioning and per-block metrics to manage shared data.
It employs an eviction policy based on a lexicographically ordered tuple of Access Frequency Count, Global Count, and LRU age to minimize inter-core interference.
Empirical evaluations show SRCP improves cache hit rate by up to 13.34% and delivers a speedup of up to 10.4% over the traditional LRU replacement scheme.

Reuse-Aware Cache Partitioning (SRCP) is a hardware cache replacement strategy for multicore systems, designed to enhance performance and predictability in the presence of data sharing across threads. SRCP operates atop static way-partitioning of the shared last-level cache (LLC), introducing mechanisms that track per-block reuse and data-sharing characteristics to minimize inter-core interference, avoid redundant replications, and preserve high-locality data. Empirical evaluation shows SRCP yields substantial benefits over leading cache management schemes such as TA-DRRIP and EHC, with up to 13.34% improvement in cache hit-rate and up to 10.4% performance speedup over LRU in multithreaded workloads (Ghosh et al., 2022).

1. Multicore Cache Partitioning Model and Notation

SRCP targets an N-core processor with a shared LLC of C bytes, associativity W, S index sets, and cache lines of L bytes. Static way-partitioning divides each set into N distinct partitions and allocates each core $i$ a private region of $W_i = W / N$ ways per set. Partition $i$ therefore accommodates up to $W_i$ blocks per set, forming the set $A_i$ of locally managed blocks. Blocks are globally accessible for data sharing; however, a core may evict blocks only from its own home partition.
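The notation above can be made concrete with a small geometry helper. This is a minimal sketch, not the authors' implementation: the class name `LLCGeometry`, the 64-byte line size, and the contiguous assignment of ways to cores are assumptions made for illustration; only the relations $S = C / (W \cdot L)$ and $W_i = W / N$ come from the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLCGeometry:
    """Hypothetical container for the LLC parameters C, W, L and N."""
    capacity_bytes: int  # C
    ways: int            # W
    line_bytes: int      # L (64 bytes below is an assumed value)
    cores: int           # N

    @property
    def sets(self) -> int:
        # S = C / (W * L)
        return self.capacity_bytes // (self.ways * self.line_bytes)

    @property
    def ways_per_core(self) -> int:
        # W_i = W / N; static partitioning needs an even split.
        if self.ways % self.cores:
            raise ValueError("W must be divisible by N")
        return self.ways // self.cores

    def partition_ways(self, core: int) -> range:
        # Assumption: core i owns the contiguous ways [i*W_i, (i+1)*W_i).
        w = self.ways_per_core
        return range(core * w, (core + 1) * w)


geom = LLCGeometry(capacity_bytes=8 * 1024 * 1024, ways=16, line_bytes=64, cores=4)
print(geom.sets, geom.ways_per_core, list(geom.partition_ways(2)))
```

With an 8 MB, 16-way LLC shared by 4 cores, each core owns 4 ways in every set under these assumptions.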

The system employs several per-block metadata fields:

$LC(b) \in \{0,1\}$ : Local-core access flag
$AFC(b) \in [0, 2^k-1]$ : Access Frequency Count (typically $k=8$ )
$GCount(b) \in [0, N-1]$ : Global Count of accesses by non-local cores ($\log_2 N$ bits)

SRCP interprets $AFC(b)$ as a direct proxy for a block’s reuse frequency, while $GCount(b)$ quantifies inter-core sharing, providing the basis for runtime distinction between private and shared data.
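To illustrate how the three metadata fields fit their stated bit widths, the following sketch packs them into a single integer. The field order within the word, the function names `pack` and `unpack`, and the choice of N = 4 are assumptions for illustration; the source specifies only the value ranges.

```python
K_BITS = 8                             # k: width of the AFC counter
N_CORES = 4                            # N: assumed core count for this example
G_BITS = (N_CORES - 1).bit_length()    # bits needed to hold values 0..N-1


def pack(lc: int, afc: int, gcount: int) -> int:
    """Pack (LC, AFC, GCount) into one word; the layout is an assumption."""
    if lc not in (0, 1):
        raise ValueError("LC must be 0 or 1")
    if not 0 <= afc <= (1 << K_BITS) - 1:
        raise ValueError("AFC out of range")
    if not 0 <= gcount <= N_CORES - 1:
        raise ValueError("GCount out of range")
    return (lc << (K_BITS + G_BITS)) | (afc << G_BITS) | gcount


def unpack(word: int) -> tuple:
    """Inverse of pack: recover (LC, AFC, GCount)."""
    gcount = word & ((1 << G_BITS) - 1)
    afc = (word >> G_BITS) & ((1 << K_BITS) - 1)
    lc = word >> (K_BITS + G_BITS)
    return lc, afc, gcount


word = pack(lc=1, afc=200, gcount=3)
print(word.bit_length(), unpack(word))
```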

2. SRCP Replacement Policy and Metadata Management

SRCP replaces cache blocks within each partition based on a lexicographically ordered three-tuple metric:

$\big(AFC(b),\ GCount(b),\ \text{LRU age}(b)\big)$

where eviction preference is given to blocks with the smallest $AFC(b)$; ties break on $GCount(b)$, and finally on LRU age.
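The lexicographic ordering maps naturally onto tuple comparison. The sketch below is one possible reading: it assumes the tuple is (AFC, GCount, LRU age), that ties on AFC go to the smaller GCount, and that the final tie-break evicts the least recently used block. The `Block` class and its `last_use` timestamp are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    tag: int
    afc: int        # Access Frequency Count
    gcount: int     # Global Count
    last_use: int   # hypothetical timestamp; smaller means older


def select_victim(partition: List[Block]) -> Block:
    # Python compares tuples lexicographically: AFC first, then GCount,
    # then recency (oldest wins, which is an assumption of this sketch).
    return min(partition, key=lambda b: (b.afc, b.gcount, b.last_use))


partition = [
    Block(tag=0xA, afc=3, gcount=0, last_use=5),
    Block(tag=0xB, afc=1, gcount=2, last_use=9),
    Block(tag=0xC, afc=1, gcount=0, last_use=7),
    Block(tag=0xD, afc=1, gcount=0, last_use=2),
]
print(hex(select_victim(partition).tag))
```

Blocks 0xB, 0xC and 0xD tie on AFC; 0xB is spared by its higher GCount, and 0xD is chosen over 0xC because it is older.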

On every access to block $b$:

If accessed by its partition’s local core, $LC(b) \leftarrow 1$ and $AFC(b)$ is incremented (clamped at $2^k-1$).
If accessed by a non-local core, $GCount(b)$ is incremented (clamped at $N-1$).
LRU metadata is updated accordingly.
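The per-access rules can be sketched as a single constant-time update. This example assumes that a local access sets the LC flag to 1 and saturates AFC at $2^k - 1$, and that a non-local access saturates GCount at $N - 1$; the `Block` class, `on_access`, and the timestamp-based LRU field are hypothetical names.

```python
from dataclasses import dataclass

K_BITS = 8                      # k: AFC width
N_CORES = 4                     # N: assumed core count
AFC_MAX = (1 << K_BITS) - 1     # 2^k - 1
GCOUNT_MAX = N_CORES - 1        # N - 1


@dataclass
class Block:
    home_core: int      # core that owns the partition holding this block
    lc: int = 0
    afc: int = 0
    gcount: int = 0
    last_use: int = 0   # stand-in for the LRU metadata


def on_access(block: Block, core: int, now: int) -> None:
    if core == block.home_core:
        # Local access: flag the block and bump its reuse counter.
        block.lc = 1
        block.afc = min(block.afc + 1, AFC_MAX)
    else:
        # Non-local access: record inter-core sharing.
        block.gcount = min(block.gcount + 1, GCOUNT_MAX)
    block.last_use = now        # recency is refreshed on every access


b = Block(home_core=0)
on_access(b, core=0, now=1)
on_access(b, core=2, now=2)
print(b)
```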

Upon a miss (fill) in partition $i$:

All $AFC$ and $GCount$ fields for blocks in $A_i$ are decremented by 1 (not below 0).
The candidate with minimum $(AFC, GCount, \text{LRU age})$ is selected and evicted.
The new block $b$ is initialized: $AFC(b)$ is set from the current mean range of $AFC$ values in $A_i$, and $GCount(b)$ and $LC(b)$ are set to their initial values.
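The miss path can be sketched end to end as follows. Several details here are assumptions rather than facts from the source: the aged counters are taken to be AFC and GCount, the initial AFC is taken to be the floor of the mean AFC of the blocks remaining after eviction, and the new block starts with GCount = 0 and LC = 1. The names `Block` and `fill_on_miss` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    tag: int
    lc: int = 0
    afc: int = 0
    gcount: int = 0
    last_use: int = 0   # hypothetical timestamp; smaller means older


def fill_on_miss(partition: List[Block], new_tag: int, now: int,
                 ways: int) -> Optional[Block]:
    # Step 1: age every block in the partition, never going below 0.
    for b in partition:
        b.afc = max(b.afc - 1, 0)
        b.gcount = max(b.gcount - 1, 0)

    # Step 2: if the partition is full, evict the minimum-tuple block.
    victim = None
    if len(partition) >= ways:
        victim = min(partition, key=lambda b: (b.afc, b.gcount, b.last_use))
        partition.remove(victim)

    # Step 3 (assumption): seed AFC with the floor of the mean AFC of the
    # surviving blocks; GCount = 0 and LC = 1 are also assumed here.
    init_afc = sum(b.afc for b in partition) // len(partition) if partition else 0
    partition.append(Block(tag=new_tag, lc=1, afc=init_afc, gcount=0, last_use=now))
    return victim


part = [
    Block(tag=10, lc=1, afc=5, gcount=0, last_use=1),
    Block(tag=11, lc=1, afc=1, gcount=2, last_use=2),
    Block(tag=12, lc=1, afc=3, gcount=0, last_use=3),
    Block(tag=13, lc=1, afc=0, gcount=1, last_use=4),
]
evicted = fill_on_miss(part, new_tag=14, now=5, ways=4)
print(evicted.tag, [(b.tag, b.afc, b.gcount) for b in part])
```

Seeding a new block near the partition's average reuse count, rather than at zero, keeps it from being the immediate next victim; whether the original design uses exactly this value is not recoverable from the text.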

This approach integrates reuse and sharing directly into the replacement decision, in contrast to traditional recency-only or adaptive policies.

A block $b$ is identified as “shared” when $GCount(b) > 0$; otherwise, it is considered “private.” The tracking mechanism is fully hardware-based, requiring no profiling or programmer annotations. Global accesses (reads/writes by non-local cores) increment $GCount(b)$, dynamically reflecting inter-thread communication patterns.

To prevent redundant replication, SRCP’s way-partitioned eviction and fill policies ensure that only one copy of a block exists per set in the LLC. Non-local threads can access blocks held in remote partitions, but, consistent with the partitioning model, each core evicts only from its own partition, and a block is never replicated across partitions. Shared data is thus available for inter-core accesses without polluting multiple partitions, which conserves capacity and preserves data locality.
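The single-copy property and the shared/private test can be sketched with a lookup that searches every partition of a set before filling. This is a simplified illustration: eviction and capacity limits are omitted, the threshold `gcount > 0` for "shared" is an assumed reading, and `CacheSet`, `access` and `is_shared` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Block:
    tag: int
    home_core: int
    gcount: int = 0


@dataclass
class CacheSet:
    n_cores: int
    partitions: Dict[int, List[Block]] = field(default_factory=dict)

    def lookup(self, tag: int) -> Optional[Block]:
        # Search all partitions: a block is visible to every core.
        for blocks in self.partitions.values():
            for b in blocks:
                if b.tag == tag:
                    return b
        return None

    def access(self, tag: int, core: int) -> str:
        b = self.lookup(tag)
        if b is None:
            # Miss: fill only the requester's own partition.
            self.partitions.setdefault(core, []).append(Block(tag, core))
            return "miss"
        if b.home_core != core:
            # Remote hit: reuse the existing copy and record the sharing.
            b.gcount = min(b.gcount + 1, self.n_cores - 1)
        return "hit"


def is_shared(b: Block) -> bool:
    return b.gcount > 0


s = CacheSet(n_cores=4)
print(s.access(0xA, core=0), s.access(0xA, core=1), is_shared(s.lookup(0xA)))
```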

4. Quantitative Evaluation and Performance Results

Experiments are conducted in gem5 full-system simulation with 4-core, 2 GHz configurations utilizing 32 KB L1 (private) and a shared 8 MB, 16-way LLC, running PARSEC and Splash-2 multithreaded benchmarks. SRCP’s metrics are reported against LRU, TA-DRRIP, and EHC replacement baselines.

Performance outcomes are summarized as follows:

Scheme	Hit-rate Improvement over LRU (PARSEC)	Performance Speedup over LRU (PARSEC & Splash-2)
SRCP	up to +13.34%	up to +10.4%
EHC	+9.4%	+6.2%
TA-DRRIP	+7.3%	+5.0%
LRU	—	—

SRCP consistently outperforms both state-of-the-art (TA-DRRIP, EHC) and traditional (LRU) cache replacement schemes with respect to both hit-rate and IPC, especially in data-sharing-intensive workloads. The observed benefit stems from SRCP’s ability to both suppress unnecessary replication of shared data and to preserve highly reused private and shared blocks (Ghosh et al., 2022).

5. Complexity, Hardware Overheads, and Integration

SRCP adds modest metadata: 1 bit for $LC$, $k$ bits for $AFC$ ($k=8$ preferred), and $\log_2 N$ bits for $GCount$ per cache line. For $N=4$ and $k=8$, this totals $1 + 8 + 2 = 11$ bits/line. Per-access update operations are $O(1)$; per-miss actions are $O(W_i)$, with direct scans over each partition’s $W_i$ ways for victim selection and counter decrementation.
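The per-line figure can be scaled up to a whole cache with a short calculation. The sketch below assumes a 64-byte cache line, which the source does not state, and uses the 8 MB, 4-core configuration from the evaluation; the function names are hypothetical.

```python
def metadata_bits_per_line(n_cores: int, k_bits: int = 8) -> int:
    # 1 bit for LC + k bits for AFC + log2(N) bits for GCount.
    gcount_bits = (n_cores - 1).bit_length()   # equals log2(N) for powers of two
    return 1 + k_bits + gcount_bits


def total_overhead_kib(capacity_bytes: int, line_bytes: int, n_cores: int,
                       k_bits: int = 8) -> float:
    lines = capacity_bytes // line_bytes
    return lines * metadata_bits_per_line(n_cores, k_bits) / 8 / 1024


bits = metadata_bits_per_line(n_cores=4)
kib = total_overhead_kib(8 * 1024 * 1024, line_bytes=64, n_cores=4)
print(bits, kib, f"{bits / (64 * 8):.2%}")
```

Under these assumptions the added state is 176 KiB for the whole LLC, or roughly 2% of the data array.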

Hardware support involves minor extensions to cache tag arrays and additional logic in the controller for increment/decrement and comparator functions. No additional software infrastructure is required aside from initial partitioning configuration (e.g., via machine-specific registers at boot).

Deployment recommendations include calibrating $k$ for area-performance trade-off, amortizing per-miss counter decrementation (e.g., over multiple misses), and hand-tuning the initial $AFC$ values to align with core working set characteristics.
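One way to amortize the aging step is to decrement the counters only on every M-th miss rather than on each one. The sketch below shows that idea; the class name `AmortizedAger`, the `period` parameter, and the decrement-by-one per period are assumptions, since the source does not say how the amortization should be done.

```python
from typing import List


class AmortizedAger:
    """Hypothetical helper: age a partition's counters once per `period` misses."""

    def __init__(self, period: int) -> None:
        if period < 1:
            raise ValueError("period must be at least 1")
        self.period = period
        self.misses = 0

    def on_miss(self, counters: List[int]) -> bool:
        """Record one miss; return True if the counters were aged."""
        self.misses += 1
        if self.misses % self.period:
            return False              # skip the O(W_i) scan on this miss
        for i, c in enumerate(counters):
            counters[i] = max(c - 1, 0)
        return True


ager = AmortizedAger(period=4)
afc = [3, 0, 1]
aged = [ager.on_miss(afc) for _ in range(8)]
print(aged.count(True), afc)
```

A larger period reduces scan work on the miss path but lets stale blocks keep high counts for longer, so the value would need tuning per workload.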

6. Comparison with Existing Cache Replacement Techniques

SRCP advances upon previous methods by explicitly integrating sharing and reuse into the eviction metric:

LRU: no visibility into data sharing or reuse; maximal susceptibility to destructive interference.
TA-DRRIP: adaptive RRIP score for recency, agnostic to inter-core sharing.
EHC: tracks recency history but not explicit data-sharing.
SRCP: introduces $AFC$ and $GCount$ for nuanced, per-block awareness of both local reuse and inter-core sharing, enforced by way-partitioning for strong isolation.

The measured impact is an additional 4–6 percentage points in hit-rate and 4–5 percentage points in IPC over leading alternatives under data-sharing workloads.
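The percentage-point gaps can be checked directly against the figures in the results table. This is only an arithmetic sketch: the SRCP numbers are "up to" values, so the differences are indicative rather than averages, and the dictionary layout is a hypothetical convenience.

```python
# (hit-rate improvement %, speedup %) relative to LRU, as listed in the table.
reported = {
    "SRCP": (13.34, 10.4),
    "EHC": (9.4, 6.2),
    "TA-DRRIP": (7.3, 5.0),
}


def gap_over(scheme: str) -> tuple:
    """Percentage-point advantage of SRCP over the given scheme."""
    srcp_hit, srcp_speed = reported["SRCP"]
    hit, speed = reported[scheme]
    return round(srcp_hit - hit, 2), round(srcp_speed - speed, 2)


for scheme in ("EHC", "TA-DRRIP"):
    print(scheme, gap_over(scheme))
```

The hit-rate gaps come out at about 3.9 and 6.0 points and the speedup gaps at 4.2 and 5.4 points, consistent with the rounded ranges quoted above.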

7. Practical Deployment and Future Implications

SRCP is architected for transparent hardware deployment, requiring only static LLC way partitioning and lightweight metadata. No modifications to application code, operating system, or compilers are necessary post-partitioning. The hardware footprint is minimal, and the O(1) per-access control logic is consistent with state-of-the-art predictor architectures.

SRCP’s policy model could plausibly be extended to dynamic partitioning schemes and integrated with further QoS and predictability mechanisms for real-time and mixed-criticality systems. Its explicit support for data sharing suits modern, heavily parallel multithreaded applications, providing a principled and efficient approach to minimizing interference and maximizing system throughput on shared-memory multiprocessors (Ghosh et al., 2022).


References

Ghosh et al. (2022). Reuse-Aware Cache Partitioning Framework for Data-Sharing Multicore Systems.
