
CesASMe and Staticdeps: static detection of memory-carried dependencies for code analyzers (2402.14567v1)

Published 22 Feb 2024 in cs.PF

Abstract: A variety of code analyzers, such as IACA, uiCA, llvm-mca or Ithemal, strive to statically predict the throughput of a computation kernel. Each analyzer is based on its own simplified CPU model, reasoning at the scale of a basic block. Given this diversity, evaluating their strengths and weaknesses is important to guide both their usage and their enhancement. We present CesASMe, a fully-tooled solution to evaluate code analyzers on C-level benchmarks; it is composed of a benchmark derivation procedure that feeds an evaluation harness. We conclude that memory-carried data dependencies are a major source of imprecision for these tools. We tackle this issue with staticdeps, a static analyzer extracting memory-carried data dependencies, including across loop iterations, from an assembly basic block. We integrate its output into uiCA, a state-of-the-art code analyzer, to evaluate staticdeps' impact on a code analyzer's precision through CesASMe.
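
To make the notion of a memory-carried dependency concrete, here is a minimal toy sketch in Python. It is not the staticdeps algorithm, which the abstract does not describe. It assumes each memory access in a basic block is addressed as `base register + constant offset`, that each base register advances by a known constant stride per loop iteration, and that two accesses alias only when their addresses are exactly equal (access widths are ignored). The names `MemAccess` and `memory_carried_deps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemAccess:
    index: int   # position of the instruction in the basic block
    kind: str    # "load" or "store"
    base: str    # base register name (assumed addressing: base + offset)
    offset: int  # constant byte offset


def memory_carried_deps(accesses, strides, max_distance=4):
    """Return (store_index, load_index, distance) read-after-write triples.

    `strides` maps a base register to its per-iteration increment in bytes.
    `distance` is the number of iterations between the store and the load
    that reads the same address (0 means within the same iteration).
    """
    deps = []
    for st in accesses:
        if st.kind != "store":
            continue
        for ld in accesses:
            if ld.kind != "load" or ld.base != st.base:
                continue
            stride = strides.get(st.base, 0)
            delta = st.offset - ld.offset
            if stride == 0:
                # Fixed address: only identical offsets alias.
                if delta != 0:
                    continue
                distance = 0 if st.index < ld.index else 1
            else:
                # Store at iteration k hits the load at iteration k + d
                # when st.offset + k*stride == ld.offset + (k + d)*stride.
                if delta % stride != 0:
                    continue
                distance = delta // stride
                if distance < 0:
                    continue
                if distance == 0 and st.index > ld.index:
                    continue  # load precedes store in the same iteration
            if distance <= max_distance:
                deps.append((st.index, ld.index, distance))
    return deps


# Toy block for a loop like `a[i + 1] = a[i] + 1` over 8-byte elements:
#   0: load  [rdi + 0]
#   1: store [rdi + 8]
#   (rdi is incremented by 8 each iteration)
block = [MemAccess(0, "load", "rdi", 0), MemAccess(1, "store", "rdi", 8)]
deps = memory_carried_deps(block, {"rdi": 8})
print(deps)
```

In this toy block the store of one iteration writes the address that the load of the next iteration reads, so the dependency spans one iteration. An analyzer that models only register dependencies would miss that chain, which is the kind of imprecision the abstract refers to.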

