Exact WAR Latencies for Uniform Global Stores (64-bit and 128-bit)

Measure the exact write-after-read (WAR) latencies of global store instructions that use uniform registers, for 64-bit and 128-bit accesses on NVIDIA Ampere GPUs, so that they can replace the approximate values currently reported in the paper.

Background

In the paper’s latency table, WAR latencies for uniform 64-bit and 128-bit global store instructions are marked as approximate due to lack of definitive measurements.

These missing measurements prevent a fully accurate latency model and motivate targeted microbenchmarking to obtain exact values.

References

Values with ${*}$ are approximations as we were unable to gather these data.

Analyzing Modern NVIDIA GPU cores (2503.20481 - Huerta et al., 26 Mar 2025) in Section 5.4 (Memory Pipeline), Table “Memory instructions latencies in cycles”