Explainable Port Mapping Inference with Sparse Performance Counters for AMD's Zen Architectures (2403.16063v1)
Abstract: Performance models are instrumental for optimizing performance-sensitive code. When modeling the use of functional units of out-of-order x86-64 CPUs, data availability varies by manufacturer: instruction-to-port mappings for Intel's processors are available, whereas information for AMD's designs is lacking. The reason for this disparity is that standard techniques to infer exact port mappings require hardware performance counters that AMD does not provide. In this work, we modify the port mapping inference algorithm of the widely used uops.info project so that it does not rely on Intel's performance counters. The modifications are based on a formal port mapping model with a counter-example-guided algorithm powered by an SMT solver. We investigate to what extent AMD's processors comply with this model and where unexpected performance characteristics prevent an accurate port mapping. Our results provide valuable insights for creators of CPU performance models as well as for software developers who want to achieve peak performance on recent AMD CPUs.
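To make the notion of a port mapping model concrete, here is a minimal Python sketch of the bottleneck formulation commonly used in port-mapping work. It is an illustration under stated assumptions, not necessarily the exact formal model of this paper. Each micro-op may execute on a fixed set of ports, each port handles one micro-op per cycle, and the scheduler is assumed to distribute micro-ops optimally. The function name `inverse_throughput` and the port names `p0`, `p1`, `p2` are hypothetical and do not describe any real AMD core.

```python
from fractions import Fraction
from itertools import combinations


def inverse_throughput(uops, ports):
    """Steady-state cycles per iteration in an idealized port mapping model.

    uops  : list of (count, allowed_ports) pairs, where allowed_ports is a
            frozenset of port names on which that kind of micro-op can run.
    ports : list of all port names of the (hypothetical) core.

    For every non-empty subset Q of ports, the micro-ops that can run ONLY
    on ports inside Q need at least load(Q) / |Q| cycles. The tightest such
    bound is the predicted inverse throughput.
    """
    best = Fraction(0)
    for size in range(1, len(ports) + 1):
        for subset in combinations(ports, size):
            q = set(subset)
            # Count the micro-ops that are confined to the ports in q.
            load = sum(n for n, allowed in uops if allowed <= q)
            best = max(best, Fraction(load, size))
    return best


# Hypothetical experiment: two micro-ops that can use p0 or p1,
# one micro-op restricted to p0, and one restricted to p2.
example_ports = ["p0", "p1", "p2"]
example_uops = [
    (2, frozenset({"p0", "p1"})),
    (1, frozenset({"p0"})),
    (1, frozenset({"p2"})),
]
print(inverse_throughput(example_uops, example_ports))  # 3/2
```

In this example the subset {p0, p1} is the bottleneck: three micro-ops are confined to two ports, giving 3/2 cycles per iteration. Inference approaches of the kind the abstract describes work in the opposite direction: they measure such throughputs on real hardware and search for a mapping that explains them.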