Cause of Constant Cache Latency Anomalies

Explain and validate the architectural cause of two observations on NVIDIA Ampere GPUs: constant cache accesses exhibit significantly higher WAR latency than global memory loads, yet slightly lower RAW/WAW latency. Reconcile both behaviors with the cache hierarchy used by fixed-latency instructions versus LDC-based accesses.

Background

The paper reports measured latencies indicating that the WAR latency of constant cache accesses is unexpectedly higher than that of global memory loads, whereas their RAW/WAW latency is slightly lower.

Although the authors determine that fixed-latency instructions use an L0 "FL" constant cache while LDC uses an L0 "VL" constant cache, they state they could not confirm any hypothesis that explains the WAR/RAW-WAW anomaly.
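The two access paths can be illustrated with a minimal CUDA sketch. It is not taken from the paper: the array `kTable` and the kernels `fixed_index` and `dynamic_index` are hypothetical names, and the instruction selection described in the comments is an assumption about typical compiler behavior, not something the code itself guarantees.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical constant array, used only for this illustration.
__constant__ int kTable[64];

// Compile-time index: the compiler can typically encode the access as a
// constant-bank operand of an ordinary fixed-latency instruction.
__global__ void fixed_index(int *out, int x) {
    out[threadIdx.x] = x + kTable[3];
}

// Run-time index: the address is unknown at compile time, so the compiler
// typically emits an explicit LDC instruction (variable latency).
__global__ void dynamic_index(int *out, int idx) {
    out[threadIdx.x] = kTable[idx & 63];
}

int main() {
    // Fill the constant array from the host.
    int host[64];
    for (int i = 0; i < 64; ++i) host[i] = i * 10;
    cudaMemcpyToSymbol(kTable, host, sizeof(host));

    int *dev = nullptr;
    cudaMalloc(&dev, 32 * sizeof(int));
    int result[32];

    // One warp per launch; every thread reads the same constant address.
    fixed_index<<<1, 32>>>(dev, 1);
    cudaMemcpy(result, dev, sizeof(result), cudaMemcpyDeviceToHost);
    printf("fixed_index:   %d\n", result[0]);

    dynamic_index<<<1, 32>>>(dev, 5);
    cudaMemcpy(result, dev, sizeof(result), cudaMemcpyDeviceToHost);
    printf("dynamic_index: %d\n", result[0]);

    cudaFree(dev);
    return 0;
}
```

Disassembling the compiled binary with `cuobjdump -sass` shows which form the compiler actually chose for each kernel. Measuring the WAR and RAW/WAW latencies themselves would need hand-controlled dependence stalls at the SASS level, which this sketch does not attempt.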

References

"We could not confirm any hypothesis that explains this observation."

Analyzing Modern NVIDIA GPU cores (2503.20481 - Huerta et al., 26 Mar 2025) in Section 5.4 (Memory Pipeline)