Exact Instruction Fetch Scheduler Policy in Modern NVIDIA GPUs

Determine the exact instruction fetch scheduler policy used by NVIDIA Ampere Streaming Multiprocessor sub-cores, including the precise rules for warp selection and switching, how Instruction Buffer occupancy constrains fetch, and how the fetch scheduler coordinates with the issue scheduler.

Background

The paper reverse-engineers multiple stages of the NVIDIA GPU core pipeline and proposes a plausible fetch scheduler that mirrors the observed greedy behavior of the issue scheduler. Due to limited observability, the authors assume a three-entry Instruction Buffer per warp and a fetch rate of one instruction per cycle.

Despite extensive microbenchmarking, the authors state they could not confirm the exact fetch policy. They therefore model a reasonable design informed by empirical behavior: a fetch policy that differed substantially from the issue policy would often leave the Instruction Buffer without a valid instruction, and this was not observed in experiments.
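
To make the modeled design concrete, the following is a minimal sketch of a greedy fetch scheduler under the assumptions stated above (three Instruction Buffer entries per warp, one fetch per cycle). It is an illustration, not the hardware policy, which remains unconfirmed. The class name `GreedyFetchScheduler`, its methods, and the fallback rule (pick the highest-numbered warp with a free entry) are hypothetical choices made for this example.

```python
from collections import deque

IB_ENTRIES = 3  # per-warp Instruction Buffer depth assumed in the paper's model


class GreedyFetchScheduler:
    """Toy greedy fetch policy: stay on one warp until its buffer is full."""

    def __init__(self, num_warps):
        self.buffers = [deque() for _ in range(num_warps)]
        self.next_pc = [0] * num_warps
        self.current = None  # warp fetched most recently

    def _has_space(self, warp):
        return len(self.buffers[warp]) < IB_ENTRIES

    def select_warp(self):
        # Greedy: keep fetching for the same warp while it has free entries.
        if self.current is not None and self._has_space(self.current):
            return self.current
        # Fallback (assumption for illustration): highest-numbered warp
        # that still has a free Instruction Buffer entry.
        for warp in reversed(range(len(self.buffers))):
            if self._has_space(warp):
                return warp
        return None  # every buffer is full, so fetch stalls this cycle

    def fetch_cycle(self):
        # Fetch at most one instruction per cycle.
        warp = self.select_warp()
        if warp is None:
            return None
        self.buffers[warp].append(self.next_pc[warp])
        self.next_pc[warp] += 1
        self.current = warp
        return warp

    def issue(self, warp):
        # The issue stage drains the oldest buffered instruction, if any.
        return self.buffers[warp].popleft() if self.buffers[warp] else None
```

In this sketch, Instruction Buffer occupancy is the only constraint on fetch: a warp is skipped once its three entries are full, and it becomes eligible again as soon as the issue stage drains an entry. Coordination with the issue scheduler is modeled only through that shared buffer state.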

References

We could not confirm the exact instruction fetch policy with our experiments, but it has to be similar to the issue policy; otherwise, the condition of not finding a valid instruction in the Instruction Buffer would happen relatively often, and we have not observed this in our experiments.

Analyzing Modern NVIDIA GPU cores (2503.20481 - Huerta et al., 26 Mar 2025) in Section 5.2 (Front-end)