Fortran-to-Kokkos Transformation

Updated 18 September 2025

Fortran-to-Kokkos transformation is a set of methodologies that convert legacy Fortran codes to modern, performance-portable Kokkos C++ using manual refactoring, automated translation, and agentic AI workflows.
It addresses challenges such as data structure mapping, parallel loop transformation, and host-device synchronization to overcome limitations of legacy Fortran in heterogeneous HPC environments.
The reported transformations yield measurable speedups and good scaling on GPUs and other accelerators, helping legacy scientific codes remain viable on emerging architectures.

The Fortran-to-Kokkos transformation encompasses a set of methodologies, tools, and workflows for converting legacy Fortran scientific codes, typically written for homogeneous, CPU-centric architectures, into modern, performance-portable programs expressed in the Kokkos C++ programming model. The primary motivation is the rapidly increasing diversity of high-performance computing (HPC) architectures, especially the prevalence of GPU and other accelerator-based systems where native Fortran support is lacking. Transformation strategies span manual code porting, source-to-source translation frameworks, and autonomous, agentic AI-driven workflows, all of which aim to preserve computational semantics while achieving high efficiency and portability across architectures. This article surveys the state of the art in Fortran-to-Kokkos transformation: its technical foundations, operational challenges, comparative approaches, and future outlook.

1. Motivations and Scope

The dominant rationale for Fortran-to-Kokkos transformation is the need to enable legacy scientific codes, often refined over decades, to execute efficiently on emerging, heterogeneous HPC systems. Many accelerators (notably GPU platforms) lack native or robust Fortran support, which is a critical barrier for legacy codes originally designed for homogeneous memory and execution models (Gupta et al., 15 Sep 2025). Kokkos is a C++ template library that abstracts data structures and parallel execution policies, providing performance portability and a single-source approach for targeting CPUs (including ARM-based processors) and NVIDIA/AMD GPUs. Fortran-to-Kokkos transformation is thus viewed as a mechanism both to future-proof foundational scientific simulation codes and to lower the barrier for leveraging hardware advances without whole-system rewrites or loss of established numerical robustness.

The scope of transformation extends from kernel-level routines (e.g., linear algebra or stencil updates) to full-framework applications (e.g., block-structured adaptive mesh refinement (Stone et al., 24 Sep 2024)), and can include additional features such as automatic differentiation for optimization and sensitivity analysis (Liegeois et al., 17 Jul 2025).

2. Transformation Methodologies

Multiple methodologies have been developed for Fortran-to-Kokkos transformation, reflecting a range of tradeoffs in terms of code rewrite burden, automation, and the degree of achievable optimization:

Manual Refactoring: This approach entails an explicit rewriting of Fortran code into C++, replacing arrays with Kokkos::View types and parallel loops with Kokkos::parallel_for, parallel_reduce, or higher-level Kokkos policies. Often performed as kernel-by-kernel porting, this was adopted in performance portability evaluations for particle-in-cell simulations (Artigues et al., 2019).
Source-to-Source Translation Layers: Automated translation frameworks parse Fortran code at compile time, identify compute-intensive regions using tagging directives (e.g., !#LOOPY_START/!#LOOPY_END), and transform these into an intermediate representation compatible with GPU execution (notably using frameworks such as Loopy (Klöckner, 2015); see also (Nytko et al., 7 Feb 2025)). The translation minimizes intrusion into the codebase, modifying only a small fraction of lines.
Agentic AI Workflows: Fully autonomous pipelines coordinate a set of specialized LLM agents to handle successive stages: translation, syntax validation, build, execution, runtime error diagnosis, functionality testing, and iterative optimization. This multi-agent architecture has been shown to yield functionally correct and highly performant Kokkos codes from Fortran sources (Gupta et al., 15 Sep 2025).
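To illustrate the tagging step used by translation layers, the following minimal Python sketch pulls out the regions enclosed by the !#LOOPY_START/!#LOOPY_END directives mentioned above. The function name extract_tagged_regions and the sample subroutine are hypothetical, and the sketch assumes each directive sits on its own line and that regions are not nested; it is not the cited framework's actual parser.

```python
START_TAG = "!#LOOPY_START"
END_TAG = "!#LOOPY_END"


def extract_tagged_regions(source):
    """Return the text of every region enclosed by START_TAG / END_TAG lines.

    Assumption (not from the cited work): tags occupy their own lines and
    regions are never nested.
    """
    regions = []
    current = None  # None while we are outside a tagged region
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(START_TAG):
            if current is not None:
                raise ValueError("nested " + START_TAG)
            current = []
        elif stripped.startswith(END_TAG):
            if current is None:
                raise ValueError(END_TAG + " without a matching start tag")
            regions.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    if current is not None:
        raise ValueError("unterminated tagged region")
    return regions


# Hypothetical legacy routine with one tagged compute loop.
FORTRAN_SAMPLE = """\
subroutine axpy(n, a, x, y)
  integer :: n, i
  real(8) :: a, x(n), y(n)
!#LOOPY_START
  do i = 1, n
    y(i) = a * x(i) + y(i)
  end do
!#LOOPY_END
end subroutine axpy
"""

regions = extract_tagged_regions(FORTRAN_SAMPLE)
print(regions[0])
```

Only the tagged loop is handed to the translator; the surrounding subroutine stays untouched, which is what keeps the intrusion into the legacy code small.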

These methodologies may be summarized as follows:

Method	Degree of Automation	Legacy Code Intrusion	Portability Features
Manual Refactoring	Low	High	User-controlled; C++
Source-to-Source Translation	Moderate	Low	Via IR/Loopy/Kokkos
Agentic AI Workflows	High	Minimal	End-to-end, multi-arch
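The staged structure of an agentic workflow (translate, validate, build, test, repair, repeat) can be sketched as a simple control loop. Everything in this sketch is hypothetical: the names run_pipeline and StageResult, the stub agents, and the retry limit are illustrative stand-ins, not the architecture of the cited system, in which each stage would be an LLM-backed agent.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    ok: bool
    log: str = ""


def first_failure(candidate, stages):
    """Run the stages in order; return (stage_name, log) for the first failure."""
    for name, check in stages:
        result = check(candidate)
        if not result.ok:
            return name, result.log
    return None


def run_pipeline(fortran_src, translate, stages, fix, max_attempts=3):
    """Translate once, then validate; on failure ask `fix` for a revision."""
    candidate = translate(fortran_src)
    for attempt in range(1, max_attempts + 1):
        failure = first_failure(candidate, stages)
        if failure is None:
            return candidate, attempt
        if attempt < max_attempts:
            candidate = fix(candidate, failure[0], failure[1])
    raise RuntimeError("no passing candidate after %d attempts" % max_attempts)


# --- Stub agents (hypothetical; real agents would call an LLM or a compiler) ---
def stub_translate(src):
    # Pretend the first translation forgot the trailing semicolon.
    return "Kokkos::parallel_for(n, body)"


def stub_syntax(code):
    return StageResult(code.endswith(";"), "expected ';' at end of statement")


def stub_build(code):
    return StageResult(True)


def stub_fix(code, stage, log):
    return code + ";" if "';'" in log else code


final_code, attempts = run_pipeline(
    "do i = 1, n; y(i) = a*x(i) + y(i); end do",
    stub_translate,
    [("syntax", stub_syntax), ("build", stub_build)],
    stub_fix,
)
print(attempts, final_code)
```

Restarting the full stage list after every fix is a deliberately conservative choice in this sketch: a repair aimed at one stage can break an earlier one.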

3. Architectural and Implementation Principles

A consistent pattern in transformation is the mapping of Fortran control and data constructs onto a Kokkos-compatible C++ model:

Data Structures: Fortran arrays (column-major, multidimensional) are mapped to Kokkos::View objects, with explicit specification of layout and memory space to abstract over host/device locations (Artigues et al., 2019, Gupta et al., 15 Sep 2025).
Parallel Loops: Fortran DO loops are replaced with Kokkos::parallel_for (flat or multidimensional) and, where hierarchical parallelism is beneficial, with TeamPolicy constructs. Reductions are managed via Kokkos::parallel_reduce and, for scatter-add patterns into vector or multidimensional targets, by Kokkos::Experimental::ScatterView (Artigues et al., 2019).
Host-Device Synchronization: Memory management is abstracted via host “mirror” views (Kokkos::create_mirror_view) and portable host–device copies (Kokkos::deep_copy). Transfers remain explicit in the source, but they are written once and replace backend-specific memory-transfer calls (Artigues et al., 2019, Childers et al., 2021).
Kernel Transformation: Translation systems interpret loop domains (often via ISL-style polyhedral representations (Klöckner, 2015)) and perform transformations such as loop tiling/blocking, extraction of substitution rules for subexpressions, and memory prefetch/precomputation to optimize for device memory hierarchies (Klöckner, 2015, Nytko et al., 7 Feb 2025).
Autonomous Agentic Pipelines: LLM-based agentic workflows parse Fortran, generate semantically equivalent Kokkos C++ kernels, inject validation and test code, and use runtime profiling for automatic optimization (e.g., loop reordering, layout adjustments) (Gupta et al., 15 Sep 2025).
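The layout specification in the data-structure mapping matters because a Fortran buffer keeps its column-major order when it is handed to C++. The Python sketch below works through the index arithmetic only; the helper names are hypothetical, and the left and right offset functions merely mimic the conventions of Kokkos::LayoutLeft (leftmost index has stride 1) and Kokkos::LayoutRight (rightmost index has stride 1) for a rank-2 array.

```python
def fortran_offset(i, j, n1, n2):
    """0-based linear offset of Fortran A(i, j), 1-based indices, shape (n1, n2)."""
    if not (1 <= i <= n1 and 1 <= j <= n2):
        raise IndexError("Fortran index out of bounds")
    return (i - 1) + (j - 1) * n1


def layout_left_offset(i0, j0, n1, n2):
    """Column-major offset for 0-based indices (leftmost index is contiguous)."""
    return i0 + j0 * n1


def layout_right_offset(i0, j0, n1, n2):
    """Row-major offset for 0-based indices (rightmost index is contiguous)."""
    return i0 * n2 + j0


n1, n2 = 2, 3
# Build the flat buffer exactly as Fortran would store A(i, j) = 10*i + j.
flat = [0] * (n1 * n2)
for j in range(1, n2 + 1):
    for i in range(1, n1 + 1):
        flat[fortran_offset(i, j, n1, n2)] = 10 * i + j

# Reading A(1, 2) with 0-based indices (0, 1):
via_left = flat[layout_left_offset(0, 1, n1, n2)]    # matches the Fortran element
via_right = flat[layout_right_offset(0, 1, n1, n2)]  # silently reads A(2, 1)
print(flat, via_left, via_right)
```

Wrapping an unmodified Fortran buffer therefore calls for a column-major layout plus the shift from 1-based to 0-based indices; choosing a row-major layout for the same memory would require a transposing copy.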

4. Performance and Portability Outcomes

Empirical studies demonstrate that Fortran-to-Kokkos transformation not only enables functional portability but can also result in substantial performance improvements:

Performance Gains: Automated source-to-source translation using a thin translation layer and Loopy achieved speedups of 2–3× over CPU-bound Fortran and up to 6× in multi-node GPU deployments (Nytko et al., 7 Feb 2025). Agentic AI workflows produced Kokkos codes that surpassed baseline OpenMP-tuned Fortran, with compute-bound kernels (such as DGEMM) reaching 25–52% of peak A100 GPU performance (Gupta et al., 15 Sep 2025).
Scalability: Applications built upon transformed frameworks demonstrate excellent weak scaling; AthenaK achieved over one billion cell updates per second on a single NVIDIA Grace Hopper GPU and 80% efficiency at full-system parallelism on the OLCF Frontier supercomputer (65,536 AMD GPUs) (Stone et al., 24 Sep 2024).
Overhead and Efficiency: Performance-portable gradient computations using source transformation for Kokkos codes resulted in wall-clock time for gradient evaluation being at most 2.17× that of the original function, even at large (10,000+) gradient sizes, across H100, MI250x, and Ponte Vecchio GPUs (Liegeois et al., 17 Jul 2025).
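Two of the metrics quoted above are simple ratios, and it can help to see how they are formed. The sketch below uses the usual definitions of weak-scaling efficiency and gradient overhead; the function names and all input numbers are hypothetical placeholders chosen only to exercise the formulas, not measurements from the cited studies.

```python
def weak_scaling_efficiency(aggregate_rate, n_devices, single_device_rate):
    """Achieved throughput divided by ideal throughput (n_devices x single-device rate)."""
    if n_devices <= 0 or single_device_rate <= 0:
        raise ValueError("device count and single-device rate must be positive")
    return aggregate_rate / (n_devices * single_device_rate)


def gradient_overhead(t_gradient, t_function):
    """Wall-clock cost of one gradient evaluation relative to one function evaluation."""
    if t_function <= 0:
        raise ValueError("function time must be positive")
    return t_gradient / t_function


# Hypothetical inputs: 1e9 updates/s on one device, 8.192e11 updates/s on 1024 devices.
efficiency = weak_scaling_efficiency(8.192e11, 1024, 1.0e9)

# Hypothetical timings in seconds for a gradient and a plain function evaluation.
overhead = gradient_overhead(4.34, 2.0)

print(round(efficiency, 3), round(overhead, 3))
```

A reverse-mode gradient overhead that stays bounded as the number of inputs grows is the expected behavior of adjoint methods, which is why the ratio is reported independently of gradient size.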

5. Operational Challenges

Translating Fortran to Kokkos entails both technical and organizational obstacles:

Code Structure: Legacy Fortran often employs constructs (e.g., inter-iteration dependencies, pointers, complex control flow) that complicate direct mapping. Automated approaches require either code annotations, pre-processing to make loop bodies “embarrassingly parallel”, or the construction of targeted parsers (Klöckner, 2015, Nytko et al., 7 Feb 2025).
Build Complexity: Kokkos mandates compile-time configuration of execution spaces. In complex codes requiring dynamic selection (e.g., at runtime or via plugin architectures), multiple compilation passes or namespace aliasing may be needed, increasing maintenance overhead (Childers et al., 2021).
Debugging and Validation: Debugging templated, compiler-generated Kokkos code is non-trivial. Agentic AI workflows address this by integrating validator, fixer, and error summarizer agents that automate syntax, build, and runtime error correction (Gupta et al., 15 Sep 2025).
Special Cases: Conditional statements, function calls, and MPI interoperability introduce additional complexities. Translation systems employ selective parsing or C++ wrappers to address calling convention mismatches (notably for MPI) (Nytko et al., 7 Feb 2025).
Parallelism Granularity: Performance is sensitive to the choice of parallelization strategy. In HEP applications, for example, the choice between intra-event and inter-event parallelism matters: Kokkos-based intra-event strategies may underperform multithreaded CPU implementations (Childers et al., 2021).
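The code-structure obstacle can be made concrete with a small experiment. A parallel dispatch gives no guarantee about iteration order, so a loop is only safe to map directly when its result does not depend on that order. The Python sketch below (hypothetical function names, serial execution with a permuted order standing in for unordered parallel execution) contrasts an independent update with a recurrence that carries a dependency between iterations.

```python
def independent_update(x, order):
    """y(i) = 2*x(i): each iteration touches only its own element."""
    y = [0.0] * len(x)
    for i in order:
        y[i] = 2.0 * x[i]
    return y


def recurrence(x, order):
    """y(i) = y(i) + y(i-1): iteration i reads what iteration i-1 wrote."""
    y = list(x)
    for i in order:
        if i > 0:
            y[i] = y[i] + y[i - 1]
    return y


x = [1.0, 2.0, 3.0, 4.0]
forward = list(range(len(x)))
backward = forward[::-1]

same_result = independent_update(x, forward) == independent_update(x, backward)
rec_forward = recurrence(x, forward)
rec_backward = recurrence(x, backward)
print(same_result, rec_forward, rec_backward)
```

The independent loop is order-insensitive and can be dispatched as is. The recurrence is a prefix sum, so it needs a restructured algorithm (for example a parallel scan) rather than a line-by-line translation.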

6. Related Frameworks and Comparative Approaches

Several transformation and portability frameworks are integral to the Fortran-to-Kokkos landscape:

Loo.py (also written Loopy): Provides a transformation-based system where legacy Fortran is parsed to an intermediate data-parallel representation, enabling explicit transformations (loop splitting, substitution rule extraction, precomputation) in Python. Distinct from Kokkos’ embedded C++ approach, Loo.py offers granular, scriptable control for legacy modernization (Klöckner, 2015).
Loopy-based Automated Translation: As in (Nytko et al., 7 Feb 2025), Fortran code is tagged and parsed into Loopy IR, allowing for automated loop transformation and code generation targeting GPUs without intrusive code rewrites.
Agentic AI Workflows: The approach of (Gupta et al., 15 Sep 2025) orchestrates specialized LLM-based agents to handle the entire lifecycle from translation to optimization, representing an emerging paradigm for autonomous, domain-specific code modernization.
Clad-based Source Transformation: Enables reverse-mode automatic differentiation (AD) for Kokkos-based codes, with custom rules to support Kokkos abstractions. This enhances the transformed codes’ suitability for optimization and sensitivity analysis, pertinent when migrating Fortran codes requiring derivative computations (Liegeois et al., 17 Jul 2025).
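Reverse-mode source transformation emits derivative code with a recognizable shape: a forward sweep that stores intermediates, followed by a reverse sweep that propagates adjoints back to the inputs. The Python sketch below writes that shape by hand for a small function and checks it against central finite differences. It is an illustration of the structure only; Clad operates on C++, and the function names, the test function f(x) = sum(sin(x_i) * x_i), and the step size are assumptions made for this example.

```python
import math


def f(x):
    """Scalar function of a vector: sum over i of sin(x_i) * x_i."""
    return sum(math.sin(xi) * xi for xi in x)


def f_grad(x):
    """Hand-written reverse pass for f (forward sweep, then adjoint sweep)."""
    # Forward sweep: keep the intermediate s_i = sin(x_i).
    s = [math.sin(xi) for xi in x]
    # Reverse sweep: the adjoint of the output is 1.
    d_out = 1.0
    grad = [0.0] * len(x)
    for i, xi in enumerate(x):
        d_s = xi * d_out               # d(s_i * x_i)/d s_i
        grad[i] += s[i] * d_out        # d(s_i * x_i)/d x_i with s_i held fixed
        grad[i] += math.cos(xi) * d_s  # chain through s_i = sin(x_i)
    return grad


def finite_difference_grad(func, x, h=1e-6):
    """Central differences, used here only as an independent check."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2.0 * h))
    return grad


x0 = [0.3, 1.2, -0.7]
analytic = f_grad(x0)
numeric = finite_difference_grad(f, x0)
max_abs_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_abs_diff)
```

One reverse sweep yields the whole gradient, whereas finite differences need two function evaluations per input; that asymmetry is what makes reverse mode attractive for large gradients.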

7. Implications and Future Directions

The successful modernization of legacy Fortran codes via Kokkos transformation has multifaceted implications:

Rapid Legacy Modernization: The demonstrated methodologies are capable of porting and optimizing legacy codes with moderate to minimal manual intervention, promoting sustainable use of valuable scientific software (Nytko et al., 7 Feb 2025, Gupta et al., 15 Sep 2025).
Performance Portability: Kokkos-based transformations provide robust performance across CPUs, NVIDIA/AMD GPUs, and ARM platforms, as exemplified by AthenaK’s scaling and efficiency (Stone et al., 24 Sep 2024).
Extensibility: The agentic AI and translation-layer paradigms can potentially be extended to future architectures and programming models with minimal disruption, facilitating continuous adaptation as HPC ecosystems evolve.
Integration of Advanced Features: The addition of differentiable programming via source transformation tools (e.g., Clad) positions modernized codes for roles in optimization, uncertainty quantification, and inverse problems—capabilities increasingly demanded in contemporary applications (Liegeois et al., 17 Jul 2025).
Open Challenges: Open problems include low efficiency for memory-bound kernels (<10% of peak performance in roofline analyses), dynamic build complexity, and the reliability of open-source LLMs for autonomous translation (Gupta et al., 15 Sep 2025). Ongoing research is addressing performance-aware agent routing, validation frameworks, and enhanced support for challenging legacy patterns.
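The low fraction of peak for memory-bound kernels follows directly from the roofline model, in which attainable throughput is the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity. The Python sketch below works one example. The hardware figures are assumed nominal values for an A100 40 GB part (9.7 TFLOP/s FP64 without tensor cores, 1555 GB/s memory bandwidth), and the DAXPY traffic count (two loads and one store of 8-byte values per two floating-point operations) is a textbook estimate; none of these numbers come from the cited study.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: min(compute peak, bandwidth x arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Assumed nominal hardware figures (see lead-in); adjust for the actual device.
PEAK_FP64_GFLOPS = 9700.0
BANDWIDTH_GBS = 1555.0

# DAXPY, y(i) = a*x(i) + y(i): 2 flops per element, 24 bytes moved per element.
daxpy_intensity = 2.0 / 24.0

daxpy_bound = attainable_gflops(PEAK_FP64_GFLOPS, BANDWIDTH_GBS, daxpy_intensity)
daxpy_fraction = daxpy_bound / PEAK_FP64_GFLOPS

# Arithmetic intensity at which a kernel stops being bandwidth-limited.
ridge_point = PEAK_FP64_GFLOPS / BANDWIDTH_GBS

print(round(daxpy_bound, 1), round(100.0 * daxpy_fraction, 2), round(ridge_point, 2))
```

Under these assumptions even a perfectly tuned DAXPY is capped at roughly one percent of the compute peak, so a small fraction of peak for a memory-bound kernel is not by itself a sign of poor code; such kernels are better judged against achieved bandwidth.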

In summary, Fortran-to-Kokkos transformation is a maturing, multifaceted field that integrates manual porting strategies, automated translation layers, and advanced AI-driven workflows to enable legacy scientific applications to achieve functional portability and high efficiency on modern, heterogeneous HPC architectures. The body of research surveyed here highlights solutions, performance milestones, and remaining hurdles, underscoring the importance of transformation methodologies for the future of scientific computing in mixed-architecture environments.