Cause of intermittent drops of PREFETCHT1/T2 on AMD Zen4

Determine the architectural and microarchitectural reason why the PREFETCHT1 and PREFETCHT2 software prefetch instructions sometimes fail to bring data into the cache on AMD EPYC 9124 (Zen4) systems, even when pipeline serialization is enforced, leaving the data resident in DRAM or CXL-attached memory. Establish the conditions under which Zen4 drops these prefetch hints, and explain the mechanism responsible.

Background

The paper studies the x86 software prefetch instructions (PREFETCHT0, T1, T2, W, and NTA) on both DRAM and CXL-attached memory across Intel Sapphire Rapids and AMD Zen4 systems. According to vendor documentation, PREFETCHT1/T2 should place data into the L2 cache. To verify cache placement, the authors measure both the latency of the prefetch itself and the latency of a load issued after the prefetch.

On AMD Zen4, the authors observe that, even with serialization, PREFETCHT1/T2 sometimes do not fetch data into the cache; the data instead remains in memory (DRAM or CXL), which raises the average load latency above the expected value. When PREFETCHT1/T2 succeed, the subsequent load shows the small latency increase expected for L2 cache residence, but the authors cannot explain why the instructions are intermittently dropped.
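A minimal sketch of how such a check could be set up is shown here; it is not the authors' harness. It flushes a cache line, issues PREFETCHT1 through the `_mm_prefetch` intrinsic, waits, and then times a demand load with the timestamp counter. The function names, the use of MFENCE/LFENCE as the form of serialization, and the `threshold_ticks` and `settle_ticks` parameters are all assumptions made for illustration. Fences order instruction issue but do not architecturally guarantee that a prefetch has completed, which is why the sketch also spins for a caller-chosen number of ticks.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h> /* _mm_prefetch, _mm_clflush, fences, __rdtsc, __rdtscp */

/* Busy-wait for roughly `ticks` TSC ticks to give a prefetch time to land. */
static inline void spin_ticks(uint64_t ticks) {
    uint64_t start = __rdtsc();
    while (__rdtsc() - start < ticks)
        _mm_pause();
}

/* Time a single demand load of *p in TSC ticks, fenced on both sides. */
static inline uint64_t timed_load(const char *p) {
    unsigned aux;
    _mm_mfence();
    _mm_lfence();
    uint64_t t0 = __rdtsc();
    _mm_lfence();
    (void)*(const volatile char *)p; /* the load being measured */
    uint64_t t1 = __rdtscp(&aux);    /* waits for earlier instructions */
    _mm_lfence();
    return t1 - t0;
}

/*
 * Hypothetical helper: repeat flush -> PREFETCHT1 -> wait -> timed load and
 * count the trials whose load latency exceeds `threshold_ticks`. A slow load
 * suggests the line was still in memory, i.e. the prefetch may have been
 * dropped. The threshold is an assumption and must be calibrated per machine.
 */
static int count_slow_loads(char *line, int trials,
                            uint64_t threshold_ticks, uint64_t settle_ticks) {
    assert(line != 0 && trials >= 0);
    int slow = 0;
    for (int i = 0; i < trials; i++) {
        _mm_clflush(line);               /* evict the line from all cache levels */
        _mm_mfence();
        _mm_prefetch(line, _MM_HINT_T1); /* emits PREFETCHT1 */
        _mm_mfence();
        spin_ticks(settle_ticks);
        if (timed_load(line) > threshold_ticks)
            slow++;
    }
    return slow;
}
```

Calling `count_slow_loads(buf, 1000, threshold, settle)` on a buffer placed in DRAM or on a CXL NUMA node would give a rough drop count. Replacing `_MM_HINT_T1` with `_MM_HINT_T2` exercises PREFETCHT2. A sensible threshold sits between the measured L2-hit latency and the memory latency of the target node, and both values need to be measured on the system under test.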

Understanding this behavior is important for performance tuning of memory-intensive workloads on Zen4, especially when using CXL-attached memory, because software prefetch directives are a common optimization tool and their reliability affects latency and throughput.

References

This result matches what is advertised in AMD's documentation that the new feature of Zen4 is that PREFETCHT1/T2 put data into L2 cache, but we currently do not have a reason why the T1/T2 are sometimes dropped.

The Hitchhiker's Guide to Programming and Optimizing CXL-Based Heterogeneous Systems (2411.02814 - Wang et al., 5 Nov 2024) in Section "CPU Prefetching on CXL Memory", Subsection "Software Prefetching Instruction"