FLASH 1.0: A Software Framework for Rapid Parallel Deployment and Enhancing Host Code Portability in Heterogeneous Computing (2106.13645v3)
Abstract: This paper presents FLASH 1.0, a C++-based software framework for rapid parallel deployment and enhanced host code portability in heterogeneous computing. FLASH takes a novel approach to describing kernels and dynamically dispatching them in a hardware-agnostic manner. FLASH features truly hardware-agnostic frontend interfaces, which unify the compile-time control flow and enforce a portability-optimized code organization: performance-critical computational code is demarcated from non-performance-critical functional code, and hardware-specific code is separated from hardware-agnostic code in the host application. We use static code analysis to measure the hardware independence ratio of twelve popular HPC applications and show that up to 99.72% code portability can be achieved with FLASH. Similarly, we measure and compare the complexity of state-of-the-art portable programming models to show that FLASH can achieve a code reduction of up to 4.0x for two common HPC kernels while maintaining 100% code portability, with a normalized framework overhead between 1% and 13% of the total kernel runtime. The codes are available at https://github.com/PSCLab-ASU/FLASH.