Exact Register File Read Port Allocation and Bubble-Insertion Policy

Determine the precise policy governing register file read port allocation and bubble insertion for NVIDIA Ampere GPUs, including how operand roles and instruction types (e.g., FFMA, FADD, FMUL) influence port conflicts and stall behavior.

Background

The authors measure a 1024-bit per-bank read bandwidth and show that stalls depend on operand bank placement and instruction type, but cannot derive a single read policy that matches all cases.

They propose an approximate model with an Allocate stage that reserves read ports and a fixed three-cycle operand-read window for fixed-latency instructions, and they note that the underlying policy still needs to be resolved precisely.
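To make the approximate model concrete, the following is a minimal toy sketch of port reservation under stated assumptions. It is not the hardware policy, which the authors could not determine, and it is not their simulator. The two-bank layout with bank = register index modulo 2, the greedy first-fit reservation, one instruction entering Allocate per cycle, and the names `bank_of`, `try_reserve`, and `count_bubbles` are all hypothetical choices made for illustration. Only the 1024-bit per-bank bandwidth and the three-cycle read window come from the text above.

```python
from collections import defaultdict

NUM_BANKS = 2                 # ASSUMPTION: two banks, selected by register index % 2
READ_WINDOW = 3               # from the text: three operand-read cycles (fixed-latency ops)
BANK_BITS = 1024              # from the text: measured per-bank read bandwidth per cycle
WARP_OPERAND_BITS = 32 * 32   # one 32-bit register for each of the 32 threads in a warp

# 1024 bits per cycle is exactly one warp-wide 32-bit operand per bank per cycle.
READS_PER_BANK_PER_CYCLE = BANK_BITS // WARP_OPERAND_BITS


def bank_of(reg):
    """ASSUMPTION: hypothetical register-to-bank mapping."""
    return reg % NUM_BANKS


def try_reserve(busy, start, sources):
    """Greedy first-fit: give each source operand the earliest free read slot
    on its bank inside [start, start + READ_WINDOW). Returns the reserved
    slots, or None (with nothing left reserved) if any operand cannot fit."""
    taken = []
    for reg in sources:
        bank = bank_of(reg)
        for cycle in range(start, start + READ_WINDOW):
            if busy[(bank, cycle)] < READS_PER_BANK_PER_CYCLE:
                busy[(bank, cycle)] += 1
                taken.append((bank, cycle))
                break
        else:
            # No free slot for this operand: undo partial reservations.
            for slot in taken:
                busy[slot] -= 1
            return None
    return taken


def count_bubbles(program):
    """program: one tuple of source register indices per instruction.
    ASSUMPTION: one instruction enters Allocate per cycle unless it stalls;
    every stalled cycle is counted as one bubble."""
    busy = defaultdict(int)
    cycle = 0
    bubbles = 0
    for sources in program:
        while try_reserve(busy, cycle, sources) is None:
            bubbles += 1
            cycle += 1
        cycle += 1
    return bubbles


# Two FFMA-like instructions whose three sources all fall in bank 0,
# versus the same shape with sources spread across both banks.
same_bank = [(2, 4, 6), (8, 10, 12)]
spread = [(2, 3, 4), (6, 7, 8)]
print(count_bubbles(same_bank), count_bubbles(spread))
```

In this toy, the same-bank pair incurs 2 bubbles and the spread pair incurs none, which reproduces only the qualitative dependence on bank placement. The sketch treats every operand identically, so it cannot capture the observed dependence on instruction type and operand role; that gap is exactly the open question.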

References

Unfortunately, we could not find a read policy that matches all the cases we have studied, as we observed that the generation of bubbles depends on the type of instructions and the task of each operand in the instructions.

Analyzing Modern NVIDIA GPU cores (2503.20481 - Huerta et al., 26 Mar 2025) in Section 5.3 (Register File)