Explain the interaction causing x264 performance behavior across CPU sizes
Determine a full mechanistic explanation of how instruction window scaling interacts with the table sizes of the Store Sets memory dependence predictor (SSIT and LFST) for the SPEC CPU2017 benchmark 625.x264_s. With LLVM-emitted "Predict No Dependency" (PND) load opcodes, this interaction produces larger performance gains on the large gem5 CPU model but not on the small or extra-large models. In particular, characterize how the additional behavior captured by larger instruction windows relates to index collisions in the Store Sets predictor.
References
These additional gains are then lost again in the extra-large model; in this case, however, we isolate the cause as once again just the MDP size. This implies that whatever additional behaviour begins to be captured in the large model is still related to index collisions in some way, but we are currently unable to establish a full explanation of the interaction at play here.
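
To make the term "index collisions" concrete, the following is a minimal toy sketch of how the number of colliding instructions can depend on table size. It assumes a PC-indexed table with a simple shift-and-mask index function. The index function, the PC values, and the table sizes are all hypothetical; none of them are taken from gem5 or from the configurations discussed above, and the sketch does not model the instruction window or explain the x264 behaviour.

```python
# Toy model of index collisions in a PC-indexed predictor table (e.g. an SSIT).
# ASSUMPTION: the index is (pc >> 2) masked to a power-of-two table size.
# Real predictors, including gem5's, may hash the PC differently.

def ssit_index(pc: int, table_size: int) -> int:
    """Hypothetical index function: drop the 2 low PC bits, then mask."""
    assert table_size > 0 and table_size & (table_size - 1) == 0, \
        "table_size must be a power of two"
    return (pc >> 2) & (table_size - 1)


def count_collisions(pcs, table_size: int) -> int:
    """Count distinct PCs that must share a table entry with another PC."""
    distinct_pcs = set(pcs)
    distinct_entries = {ssit_index(pc, table_size) for pc in distinct_pcs}
    return len(distinct_pcs) - len(distinct_entries)


# Hypothetical static loads/stores, spaced 0x40 bytes apart in the binary.
pcs = [0x400000 + 0x40 * i for i in range(64)]

for size in (256, 1024, 4096):
    print(f"table_size={size:5d} collisions={count_collisions(pcs, size)}")
```

In this toy setting the 64 instructions fold onto 16 entries in the 256-entry table and map to distinct entries in the larger tables. How many distinct memory instructions are live at once, and therefore how many compete for entries, is the quantity one would expect a larger instruction window to change, which is the kind of interaction the question above asks to characterize.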