Exact WAR Latencies for Uniform Global Stores (64-bit and 128-bit)
Measure the exact write-after-read (WAR) latencies of global store instructions that use uniform registers, at 64-bit and 128-bit access widths, on NVIDIA Ampere GPUs. These measurements would replace the approximate values currently reported in the paper.
References
Values with * are approximations as we were unable to gather these data.
— Analyzing Modern NVIDIA GPU cores
(arXiv:2503.20481, Huerta et al., 26 Mar 2025), Section 5.4 (Memory Pipeline), Table “Memory instructions latencies in cycles”