Cause of intermittent drops of PREFETCHT1/T2 on AMD Zen4
Determine why the PREFETCHT1 and PREFETCHT2 software prefetch instructions sometimes fail to bring data into the cache on AMD EPYC 9124 (Zen4) systems, even when pipeline serialization is enforced, leaving the data resident in CXL-attached memory or DRAM. Identify the architectural or microarchitectural conditions under which Zen4 drops these prefetch hints, and explain the mechanism responsible.
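To make the observable concrete, here is a minimal C sketch of one way a dropped hint could be detected: flush a line, issue the prefetch, fence, then time a demand load. It is not the cited paper's methodology. The names `probe_line`, `count_slow_lines`, and `alloc_touched`, the use of LFENCE as the serializing step, and the latency threshold are all assumptions for illustration. On x86, GCC and Clang typically lower `__builtin_prefetch` with locality 2 and 1 to PREFETCHT1 and PREFETCHT2 respectively.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#if defined(__x86_64__)
#include <immintrin.h>
/* Assumption: LFENCE stands in for the "pipeline serialization" step. */
#define FENCE()  _mm_lfence()
/* Evict the line so the prefetch has real work to do. */
#define FLUSH(p) do { _mm_clflush(p); _mm_mfence(); } while (0)
#else
/* Non-x86 fallback so the sketch still builds; it measures nothing useful. */
#define FENCE()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define FLUSH(p) ((void)(p))
#endif

enum { LINE_BYTES = 64 }; /* cache-line size assumed for Zen4 */

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: allocate n_lines lines and write them so pages are mapped. */
static char *alloc_touched(size_t n_lines) {
    char *buf = malloc(n_lines * LINE_BYTES);
    if (buf) memset(buf, 1, n_lines * LINE_BYTES);
    return buf;
}

/* Flush one line, prefetch it with the T1 or T2 hint, then time a demand load. */
static uint64_t probe_line(const char *p, int use_t1) {
    FLUSH(p);
    if (use_t1) __builtin_prefetch(p, 0, 2); /* PREFETCHT1 on x86 */
    else        __builtin_prefetch(p, 0, 1); /* PREFETCHT2 on x86 */
    FENCE();
    uint64_t t0 = now_ns();
    (void)*(const volatile char *)p;         /* demand load of the same line */
    FENCE();
    return now_ns() - t0;
}

/* Count lines whose load latency exceeds threshold_ns (candidate dropped hints). */
static size_t count_slow_lines(const char *buf, size_t n_lines,
                               int use_t1, uint64_t threshold_ns) {
    assert(buf != NULL);
    size_t slow = 0;
    for (size_t i = 0; i < n_lines; i++)
        if (probe_line(buf + i * LINE_BYTES, use_t1) > threshold_ns)
            slow++;
    return slow;
}
```

A caller would allocate a buffer with `alloc_touched`, then compare `count_slow_lines(buf, n, 1, threshold)` against `count_slow_lines(buf, n, 0, threshold)`. Several caveats apply: a fence does not guarantee that the prefetched line has arrived, so a real experiment would add a delay between the prefetch and the load; `clock_gettime` is too coarse for single-line latencies, so cycle counters or hardware performance counters would be preferable; sequential probing can trigger the hardware prefetchers; and placing the buffer in CXL-attached memory requires NUMA binding that this sketch omits.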
References
This result matches what is advertised in AMD's documentation that the new feature of Zen4 is that PREFETCHT1/T2 put data into L2 cache, but we currently do not have a reason why the T1/T2 are sometimes dropped.
— The Hitchhiker's Guide to Programming and Optimizing CXL-Based Heterogeneous Systems
(2411.02814 - Wang et al., 5 Nov 2024) in Section "CPU Prefetching on CXL Memory", Subsection "Software Prefetching Instruction"