Clarify DSU L3D counter behavior under write-streaming

Determine why the Arm DynamIQ Shared Unit (DSU) L3D performance counter reports comparable values with write-streaming mode enabled and disabled on the Cortex-A55-based Rockchip RK3568 and RK3588 systems. The cause may be (i) a defect or undocumented behavior in the Cortex-A55 write-streaming mechanism that still allocates or accounts traffic in the L3 cache, or (ii) a mis-specification or implementation issue in the DSU L3D performance counter. In either case, rigorously characterize the correct event-counting semantics of DSU L3D in write-streaming mode.

Background

The paper evaluates cache partitioning on Arm DSU platforms and analyzes performance counters to explain observed slowdowns under various inter-core interference patterns. On Cortex-A55 cores, write-streaming is expected to bypass the L3 cache after sustained streaming writes, implying that L3 allocations and certain L3 events should not be counted.

However, experiments show that the DSU L3D performance counter reports comparable values regardless of whether write-streaming is enabled or disabled, contradicting the expectation that L3D would not count in streaming mode. This raises uncertainty about whether the inconsistency arises from write-streaming behavior or from the counter’s implementation or semantics, which must be resolved to enable reliable measurement-driven real-time analysis and cache-partitioning policies.
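To make "comparable values" concrete, the comparison between the two configurations can be sketched as a relative-difference check. This is a minimal illustration only, not the paper's method: the function names, the 10% tolerance, and the sample readings are all hypothetical, and the counter values would in practice come from whatever PMU tooling is used on the platform.

```python
def relative_difference(count_a: int, count_b: int) -> float:
    """Symmetric relative difference between two counter readings (0 means identical)."""
    if count_a < 0 or count_b < 0:
        raise ValueError("counter readings must be non-negative")
    mean = (count_a + count_b) / 2
    if mean == 0:
        # Both readings are zero, so there is no difference to report.
        return 0.0
    return abs(count_a - count_b) / mean


def counts_are_comparable(ws_enabled: int, ws_disabled: int,
                          tolerance: float = 0.10) -> bool:
    """True when the two L3D readings differ by at most `tolerance` (an assumed threshold)."""
    return relative_difference(ws_enabled, ws_disabled) <= tolerance


if __name__ == "__main__":
    # Hypothetical readings, not measurements from the paper.
    # Pattern described in the text: L3D barely changes between modes.
    print(counts_are_comparable(ws_enabled=980_000, ws_disabled=1_000_000))
    # Pattern one would expect if streaming writes truly bypassed L3 accounting.
    print(counts_are_comparable(ws_enabled=20_000, ws_disabled=1_000_000))
```

A symmetric difference is used so that the result does not depend on which configuration is treated as the baseline.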

References

This counters the expectation that L3D does not count in write-streaming mode as the data accessed are not being allocated to the cache. We are not sure if this is an error in the behavior of write-streaming mode or counting implementation of L3D counter.

Arm DynamIQ Shared Unit and Real-Time: An Empirical Evaluation (2503.17038 - Pradhan et al., 21 Mar 2025) in Observation “Platform Specific Observations,” Section 5 (Experiments and Discussion)