Cache-miss explanation for PR speedups
Establish whether processing one supernode at a time with a reduced-space data structure in the partition refinement (PR) reordering implementation reduces cache misses and thereby explains the large observed reductions in runtime relative to the original PR implementation of Jacquelin, Ng, and Peyton (2018).
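One way to make the conjecture concrete before profiling the real code is a toy cache model. The sketch below is an illustration under stated assumptions, not the authors' implementation or data structure: it simulates a small fully associative LRU cache and compares a work array indexed by global row numbers with a compact array indexed by positions local to the current supernode. The names `count_misses` and `make_traces`, the cache parameters, and the synthetic access pattern are all hypothetical.

```python
from collections import OrderedDict
import random


def count_misses(trace, num_lines=64, line_size=8):
    """Count misses of a fully associative LRU cache over a trace of array indices.

    num_lines and line_size (in array entries) are illustrative assumptions,
    not measurements of any real machine.
    """
    cache = OrderedDict()  # keys are cache-line ids, ordered from least to most recent
    misses = 0
    for idx in trace:
        line = idx // line_size
        if line in cache:
            cache.move_to_end(line)  # hit: mark line as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def make_traces(n=100_000, num_supernodes=200, rows_per_supernode=50,
                passes=4, seed=0):
    """Build two synthetic access traces over the same logical work.

    global_trace indexes a length-n array by global row number;
    compact_trace indexes a small array by position within the supernode.
    The access pattern is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    global_trace, compact_trace = [], []
    for _ in range(num_supernodes):
        rows = rng.sample(range(n), rows_per_supernode)
        for _ in range(passes):
            for local, row in enumerate(rows):
                global_trace.append(row)     # scattered across the large array
                compact_trace.append(local)  # confined to a small, reused array
    return global_trace, compact_trace


global_trace, compact_trace = make_traces()
global_misses = count_misses(global_trace)
compact_misses = count_misses(compact_trace)
print(f"global-index misses:  {global_misses}")
print(f"compact-index misses: {compact_misses}")
```

A model like this shows only that a smaller, reused working set can reduce misses in principle. Establishing the conjecture for the PR implementation would still require hardware cache-miss counts from the two actual implementations on the same matrices.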
References
We conjecture that working on one supernode at a time, using a data structure that occupies much less space, greatly reduces the number of cache misses during the computation. We conjecture that this probably explains the large reductions in runtimes over those obtained using the original implementation in [Jacquelin, Ng, and Peyton (2018)].
— A comparison of two effective methods for reordering columns within supernodes
(2501.08395 - Karsavuran et al., 14 Jan 2025) in Section 4.3 (Two improvements to the PR method)