Unexplained ORHR_COL slowdown on AMD CPUs
Identify and explain why LAPACK's ORHR_COL routine ran slowly on the AMD EPYC 9734 platform in the reported experiments, and determine whether the bottleneck lies in the ORHR_COL algorithm itself, in the vendor library implementation used (Intel MKL running on AMD hardware), or in hardware-specific factors.
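As background for the "algorithm itself" question, the sketch below shows the core computation behind ORHR_COL. Given an m-by-n matrix Q_in with orthonormal columns, it reconstructs Householder vectors V and a diagonal sign matrix S with Q_in = Q_out * S, where Q_out is the matrix those reflectors represent. The key step is an LU factorization without pivoting of Q_in - [S; 0], with each sign chosen as S_kk = -sign(pivot) so that every pivot has magnitude at least 1. This is a minimal, unblocked pure-Python illustration under stated assumptions (small dense matrices stored as lists of rows). It is not LAPACK's blocked code, and the function names mgs_orthonormalize, householder_reconstruct and apply_reflectors are hypothetical.

```python
import math


def mgs_orthonormalize(a):
    """Return a copy of the m-by-n matrix `a` (list of rows) with orthonormal
    columns, via modified Gram-Schmidt. Assumes full column rank."""
    m, n = len(a), len(a[0])
    q = [row[:] for row in a]
    for j in range(n):
        for k in range(j):
            r = sum(q[i][k] * q[i][j] for i in range(m))
            for i in range(m):
                q[i][j] -= r * q[i][k]
        nrm = math.sqrt(sum(q[i][j] ** 2 for i in range(m)))
        for i in range(m):
            q[i][j] /= nrm
    return q


def householder_reconstruct(q):
    """Hypothetical helper: LU without pivoting of Q - [S; 0].

    Returns (y, tau, d): y holds the Householder vectors strictly below the
    diagonal (unit diagonal implied), tau the reflector scalars, and d the
    diagonal of S, so that H_1 ... H_n [I; 0] = Q * diag(d)."""
    m, n = len(q), len(q[0])
    a = [row[:] for row in q]
    d = [0.0] * n
    tau = [0.0] * n
    for k in range(n):
        # S_kk = -sign(current pivot), with sign(0) taken as +1.
        d[k] = -1.0 if a[k][k] >= 0.0 else 1.0
        a[k][k] -= d[k]            # |pivot| = |a_kk| + 1 >= 1: no pivoting needed
        tau[k] = -a[k][k] * d[k]   # tau_k = -U_kk * S_kk, lies in [1, 2]
        for i in range(k + 1, m):
            a[i][k] /= a[k][k]     # multipliers are the Householder vector entries
        for i in range(k + 1, m):  # rank-1 Schur-complement update, as in LU
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a, tau, d


def apply_reflectors(y, tau, m, n):
    """Form H_1 H_2 ... H_n [I; 0], where H_k = I - tau_k v_k v_k^T and v_k
    has zeros above row k, a 1 in row k, and y[k+1:, k] below."""
    out = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(m)]
    for k in reversed(range(n)):   # apply H_n first and H_1 last
        v = [0.0] * k + [1.0] + [y[i][k] for i in range(k + 1, m)]
        for j in range(n):
            w = sum(v[i] * out[i][j] for i in range(m))
            for i in range(m):
                out[i][j] -= tau[k] * v[i] * w
    return out


a = [[4.0, 1.0, 2.0],
     [2.0, 3.0, 0.0],
     [1.0, 0.0, 5.0],
     [3.0, 2.0, 1.0],
     [0.0, 1.0, 1.0]]
q = mgs_orthonormalize(a)
m, n = len(q), len(q[0])
y, tau, d = householder_reconstruct(q)
q_out = apply_reflectors(y, tau, m, n)
# Check Q_out = Q_in * S column by column.
err = max(abs(q_out[i][j] - q[i][j] * d[j]) for i in range(m) for j in range(n))
print(f"d = {d}, max |Q_out - Q_in*S| = {err:.1e}")
```

The inner loop is the rank-1 Schur-complement update of right-looking LU, and the rows below the leading n-by-n block amount to a triangular solve with U. A blocked implementation such as LAPACK's organizes this work around level-3 BLAS kernels (triangular solves and matrix multiplications). Its observed speed therefore reflects both the algorithm's operation mix and how well the library's kernels are tuned for the target CPU.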
References
We do not have an explanation for the slow performance of ORHR_COL on the AMD system.
— Anatomy of High-Performance Column-Pivoted QR Decomposition
(2507.00976 - Melnichenko et al., 1 Jul 2025) in Section 5 (CPU performance breakdown, after Figure 6)