FLASH 1.0: A Software Framework for Rapid Parallel Deployment and Enhancing Host Code Portability in Heterogeneous Computing (2106.13645v3)

Published 25 Jun 2021 in cs.DC

Abstract: This paper presents FLASH 1.0, a C++-based software framework for rapid parallel deployment and enhanced host code portability in heterogeneous computing. FLASH takes a novel approach to describing kernels and dynamically dispatching them in a hardware-agnostic manner. FLASH features truly hardware-agnostic frontend interfaces, which unify the compile-time control flow and enforce a portability-optimized code organization that imposes a demarcation between computational (performance-critical) and functional (non-performance-critical) code, as well as the separation of hardware-specific and hardware-agnostic code in the host application. We use static code analysis to measure the hardware independence ratio of twelve popular HPC applications and show that up to 99.72% code portability can be achieved with FLASH. Similarly, we measure and compare the complexity of state-of-the-art portable programming models to show that FLASH can achieve a code reduction of up to 4.0x for two common HPC kernels while maintaining 100% code portability, with a normalized framework overhead between 1% and 13% of the total kernel runtime. The code is available at https://github.com/PSCLab-ASU/FLASH.
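To make the described code organization concrete, the sketch below illustrates the general pattern the abstract names: hardware-specific kernel implementations are registered under a common kernel name, and the hardware-agnostic host code dispatches by name at runtime, so no backend-specific control flow leaks into the host application. This is a minimal illustration under assumed names, not FLASH's actual API; KernelRegistry, add, and launch are invented for this example, and FLASH's real interfaces are in the repository linked above.

```cpp
// Hypothetical sketch of hardware-agnostic kernel registration and runtime
// dispatch, in the spirit of the abstract. Names here are invented for
// illustration and are NOT FLASH's actual API.
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// A kernel is stored as a type-erased callable; each backend registers its
// own hardware-specific implementation under a shared kernel name.
using Kernel = std::function<void(std::vector<float>&)>;

class KernelRegistry {
public:
    // Hardware-specific (computational) code registers implementations here,
    // keeping the host application free of backend #ifdefs.
    void add(const std::string& name, const std::string& backend, Kernel k) {
        table_[name][backend] = std::move(k);
    }

    // Hardware-agnostic (functional) host code launches by name; the backend
    // is chosen at runtime (here, simply the first one registered) rather
    // than through compile-time branching.
    void launch(const std::string& name, std::vector<float>& data) {
        auto& impls = table_.at(name);
        auto& [backend, kernel] = *impls.begin();
        std::cout << "dispatching '" << name << "' to backend: " << backend << "\n";
        kernel(data);
    }

private:
    std::map<std::string, std::map<std::string, Kernel>> table_;
};

int main() {
    KernelRegistry registry;

    // Performance-critical code: a CPU implementation of a simple scaling
    // kernel. A CUDA or SYCL variant would be registered the same way from
    // its own translation unit.
    registry.add("scale", "cpu", [](std::vector<float>& v) {
        for (auto& x : v) x *= 2.0f;
    });

    // Non-performance-critical host code stays hardware-agnostic: it refers
    // to the kernel only by name.
    std::vector<float> data{1.0f, 2.0f, 3.0f};
    registry.launch("scale", data);

    for (float x : data) std::cout << x << " ";
    std::cout << "\n";
}
```

In a real framework the dispatch step would select among registered backends based on the hardware available at runtime; the point of the sketch is only the demarcation it enforces, with hardware-specific code confined to registered kernels and the host code written once against a hardware-agnostic interface.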

