Improving Memory Dependence Prediction with Static Analysis (2403.08056v3)

Published 12 Mar 2024 in cs.PL and cs.AR

Abstract: This paper explores communicating information gained by compiler static analysis to out-of-order (OoO) processors, focusing on the memory dependence predictor (MDP). The MDP lets loads issue before the addresses of all in-flight stores are known while keeping memory order violations to a minimum. We use LLVM to find loads with no memory dependencies and label them via their opcode. These labelled loads skip MDP lookups, which improves prediction accuracy by reducing false dependencies. The information is communicated in a minimally intrusive way, i.e., without additional hardware cost or extra instruction bandwidth, so the improvements come with no added overhead in the CPU. We find that in select SPEC CPU2017 benchmarks, a significant number of load instructions can skip the MDP, leading to a performance gain. These results point to broader opportunities for static analysis as a source of near-zero-cost performance gains in future CPU designs.
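
To illustrate why skipping the MDP can remove false dependencies, here is a minimal sketch. It assumes a deliberately simplified predictor: a small table of "wait" bits indexed by load PC modulo the table size, so unrelated loads can alias. This is not store sets or whichever predictor the paper evaluates, and the names `ToyMDP`, `train_violation`, `must_wait`, and `no_dep_label` are hypothetical; in the paper the "no dependence" label is carried in the load's opcode rather than passed as a flag.

```python
class ToyMDP:
    """Hypothetical memory dependence predictor (not the paper's design):
    a small PC-indexed table of 'wait' bits. A load that caused a memory
    order violation sets its bit, and any load mapping to a set bit waits
    for all older store addresses. Because the table is indexed by
    PC modulo its size, unrelated loads can alias and inherit a false
    dependence."""

    def __init__(self, size: int = 4) -> None:
        self.size = size
        self.wait_bits = [False] * size
        self.lookups = 0  # number of table reads performed

    def _index(self, pc: int) -> int:
        return pc % self.size

    def train_violation(self, pc: int) -> None:
        """Record that the load at `pc` was squashed by a memory order violation."""
        self.wait_bits[self._index(pc)] = True

    def must_wait(self, pc: int, no_dep_label: bool) -> bool:
        """Return True if the load must wait for older store addresses."""
        if no_dep_label:
            # Statically labelled as dependence-free (hypothetical flag):
            # skip the table entirely, so no read and no aliased bit.
            return False
        self.lookups += 1
        return self.wait_bits[self._index(pc)]


mdp = ToyMDP(size=4)
mdp.train_violation(0x10)                       # a genuinely store-dependent load
print(mdp.must_wait(0x14, no_dep_label=False))  # True: 0x14 aliases 0x10, a false dependence
print(mdp.must_wait(0x14, no_dep_label=True))   # False: the labelled load bypasses the MDP
```

In this toy model the labelled load both avoids a table read and cannot be delayed by another load's entry, which is the kind of false dependence the abstract describes removing.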
