- The paper introduces an automated compiler enhancement that maps Fortran intrinsics to AMD AI Engines via MLIR, enabling transparent acceleration without modifying source code.
- It leverages the open-source Flang compiler and a custom xrt_wrapper dialect to transform and offload linear algebra operations efficiently.
- Performance evaluations show that workloads with repeated intrinsic calls, especially matrix multiplication, achieve performance competitive with or superior to CPU execution.
Seamless Acceleration of Fortran Intrinsics via AMD AI Engines
This paper addresses a pressing challenge in the high-performance computing (HPC) landscape: enhancing computational performance in Fortran applications by transparently harnessing the power of AMD's AI Engines (AIEs). Fortran remains a dominant language in scientific computing due to its performance capabilities and the maturity of its ecosystem. Nonetheless, leveraging cutting-edge hardware, such as the AIEs found in AMD's Ryzen AI CPUs, requires a sophisticated understanding of these architectures, presenting a significant barrier to entry for many practitioners.
The researchers propose an innovative methodology to automatically offload Fortran intrinsic procedures onto the AMD AIEs without necessitating any modifications to the original Fortran code. Utilizing the capabilities of the open-source Flang compiler and the MLIR (Multi-Level Intermediate Representation) ecosystem, the authors have extended the Flang compilation process to map Fortran intrinsics to a linear algebra (linalg) dialect within MLIR. This representation enables the efficient harnessing of the computational capabilities of the AIEs via an MLIR-to-AIE transformation flow.
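As an illustration of this mapping (a sketch, not taken verbatim from the paper; function and argument names are hypothetical), a Fortran call such as `c = matmul(a, b)` on 64x64 single-precision matrices could be represented at the linalg level roughly as:

```mlir
// Hypothetical linalg-level form of the Fortran intrinsic c = matmul(a, b)
// for 64x64 REAL(4) arrays; shapes, names, and memref types are illustrative.
func.func @matmul_64(%a: memref<64x64xf32>, %b: memref<64x64xf32>,
                     %c: memref<64x64xf32>) {
  linalg.matmul ins(%a, %b : memref<64x64xf32>, memref<64x64xf32>)
                outs(%c : memref<64x64xf32>)
  return
}
```

Once the intrinsic is expressed as a structured `linalg` operation, generic MLIR transformation passes (tiling, bufferization, lowering toward the AIE dialects) can be applied without any knowledge of the original Fortran source.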
A primary advancement in this research is the development of the xrt_wrapper MLIR dialect, enabling seamless interaction between the CPU and AIEs via the Xilinx Runtime (XRT). The approach also involves the construction of a library of pre-defined MLIR templates for a variety of linear algebra operations, which are dynamically adapted based on the specific invocation of Fortran intrinsics in user applications.
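To make the CPU-AIE handoff concrete, the host-side calls that an xrt_wrapper-style lowering would ultimately emit might resemble the following C++ sketch against the native XRT API. This is an assumption-laden illustration, not the paper's generated code: the xclbin name, kernel name, and argument layout are hypothetical, and a real deployment needs an AIE device and an installed XRT runtime.

```cpp
// Hypothetical host-side offload sequence using the native XRT C++ API.
// "mm.xclbin" and "matmul_kernel" are illustrative names, not from the paper.
#include <xrt/xrt_bo.h>
#include <xrt/xrt_device.h>
#include <xrt/xrt_kernel.h>

void offload_matmul(const float *a, const float *b, float *c, size_t n) {
  xrt::device device(0);                        // open the accelerator device
  auto uuid = device.load_xclbin("mm.xclbin");  // load the compiled AIE design
  xrt::kernel kernel(device, uuid, "matmul_kernel");

  size_t bytes = n * n * sizeof(float);
  xrt::bo bo_a(device, bytes, kernel.group_id(0));  // device-visible buffers
  xrt::bo bo_b(device, bytes, kernel.group_id(1));
  xrt::bo bo_c(device, bytes, kernel.group_id(2));

  bo_a.write(a);
  bo_b.write(b);
  bo_a.sync(XCL_BO_SYNC_BO_TO_DEVICE);  // copy inputs to the device
  bo_b.sync(XCL_BO_SYNC_BO_TO_DEVICE);

  auto run = kernel(bo_a, bo_b, bo_c);  // launch the kernel and wait
  run.wait();

  bo_c.sync(XCL_BO_SYNC_BO_FROM_DEVICE);  // copy the result back
  bo_c.read(c);
}
```

The buffer synchronization and kernel-launch boilerplate shown here is exactly the kind of repetitive glue code that the xrt_wrapper dialect is designed to generate automatically, so that Fortran users never see it.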
Performance evaluations of the proposed system demonstrate that for workloads with repeated calls to specific Fortran intrinsics, such as reductions and transpositions, the overhead associated with initial execution on the AIE diminishes on subsequent runs, resulting in performance competitive with, or superior to, CPU execution. In particular, the matrix multiplication intrinsic (matmul) was shown to benefit from AIE-specific optimizations and delivered significant performance gains over CPU execution.
From a practical standpoint, this research presents a compelling case for integrating AI engines into routine scientific workflows via automated compiler techniques, thus democratizing access to hardware acceleration for the broader Fortran community. Theoretically, it also positions MLIR as a central tool not only for language-agnostic optimizations but also for specialized hardware acceleration tasks. Moving forward, extending such techniques to include a more comprehensive set of Fortran patterns beyond intrinsics, along with integration into ML frameworks that utilize the linalg dialect, presents exciting opportunities for future research and toolchain development.
Ultimately, this research contributes significantly to the broader objective of improving accessibility and utility of emerging hardware platforms like AMD AIEs in established computational domains, showcasing how transparent acceleration frameworks can be developed within the evolving landscape of compiler technologies.