Fortran performance optimisation and auto-parallelisation by leveraging MLIR-based domain specific abstractions in Flang (2310.01882v1)

Published 3 Oct 2023 in cs.DC and cs.PL

Abstract: MLIR has become popular since it was open sourced in 2019. A sub-project of LLVM, the flexibility provided by MLIR to represent Intermediate Representations (IR) as dialects at different abstraction levels, to mix these, and to leverage transformations between dialects provides opportunities for automated program optimisation and parallelisation. In addition to general purpose compilers built upon MLIR, domain specific abstractions have also been developed. In this paper we explore complementing the Flang MLIR general purpose compiler by combining it with the domain specific Open Earth Compiler's MLIR stencil dialect. Developing transformations to discover and extract stencils from Fortran, this specialisation delivers a performance improvement of between 2 and 10 times for our benchmarks on a Cray supercomputer compared to using Flang alone. Furthermore, by leveraging existing MLIR transformations we develop an auto-parallelisation approach targeting multi-threaded and distributed memory parallelism, and optimised execution on GPUs, without any modifications to the serial Fortran source code.
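
For readers unfamiliar with the term, a stencil computation updates each grid cell from a fixed pattern of neighbouring cells. The sketch below is a minimal illustration of that loop shape only, written in Python rather than Fortran; the function name `jacobi_step` and the 4-neighbour averaging kernel are assumptions chosen for illustration and are not taken from the paper's benchmarks or from the Flang or Open Earth Compiler code.

```python
def jacobi_step(u):
    """Apply one 4-neighbour averaging sweep to the interior of a 2D grid.

    Hypothetical example kernel: each interior cell becomes the mean of its
    north, south, west, and east neighbours. Boundary cells are left unchanged.
    """
    n, m = len(u), len(u[0])
    out = [row[:] for row in u]  # copy so the input grid is not modified
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Fixed-offset neighbour accesses: this is the stencil pattern.
            out[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1])
    return out


# Small usage example: a 4x4 grid with the top boundary row held at 1.0.
grid = [[0.0] * 4 for _ in range(4)]
grid[0] = [1.0] * 4
result = jacobi_step(grid)
```

Every interior update reads from the same constant offsets and writes to a separate output array, so the iterations are independent of one another. That regularity is what makes loop nests of this kind amenable to being recognised and parallelised automatically.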

Authors (5)
  1. Nick Brown (67 papers)
  2. Maurice Jamieson (12 papers)
  3. Anton Lydike (4 papers)
  4. Emilien Bauer (4 papers)
  5. Tobias Grosser (21 papers)
Citations (1)