Julia as a unifying end-to-end workflow language on the Frontier exascale system (2309.10292v3)

Published 19 Sep 2023 in cs.DC and cs.PL

Abstract: We evaluate Julia as a single language and ecosystem paradigm powered by LLVM to develop workflow components for high-performance computing. We run a Gray-Scott, two-variable diffusion-reaction application using a memory-bound, 7-point stencil kernel on Frontier, the US Department of Energy's first exascale supercomputer. We evaluate the performance, scaling, and trade-offs of (i) the computational kernel on AMD's MI250x GPUs, (ii) weak scaling up to 4,096 MPI processes/GPUs or 512 nodes, (iii) parallel I/O writes using the ADIOS2 library bindings, and (iv) Jupyter Notebooks for interactive analysis. Results suggest that although Julia generates reasonable LLVM IR, a performance gap of nearly 50% remains versus native AMD HIP stencil codes when running on the GPUs. As expected, we observed near-zero overhead when using MPI and parallel I/O bindings for system-wide installed implementations. Consequently, Julia emerges as a compelling high-performance and high-productivity workflow composition language, as measured on the fastest supercomputer in the world.
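The Gray-Scott model couples two concentration fields, u and v, through diffusion (here a 7-point, 3D Laplacian stencil) and a nonlinear reaction term. The following is a minimal serial Julia sketch of one explicit time step; the grid size, parameter values (Du, Dv, F, k, dt), and function names are illustrative assumptions, not the paper's actual GPU kernel.

```julia
# 7-point (3D) Laplacian at interior point (i, j, k):
# six face neighbors minus six times the center value.
function laplacian7(a, i, j, k)
    return a[i-1, j, k] + a[i+1, j, k] +
           a[i, j-1, k] + a[i, j+1, k] +
           a[i, j, k-1] + a[i, j, k+1] - 6a[i, j, k]
end

# One explicit Euler step of the Gray-Scott reaction-diffusion system:
#   du/dt = Du ∇²u - u v² + F (1 - u)
#   dv/dt = Dv ∇²v + u v² - (F + k) v
# Reads from (u, v), writes to (u2, v2); boundary cells are left untouched.
function step!(u2, v2, u, v; Du=0.2, Dv=0.1, F=0.04, k=0.06, dt=1.0)
    n1, n2, n3 = size(u)
    @inbounds for kk in 2:n3-1, j in 2:n2-1, i in 2:n1-1
        uvv = u[i, j, kk] * v[i, j, kk]^2
        u2[i, j, kk] = u[i, j, kk] +
            dt * (Du * laplacian7(u, i, j, kk) - uvv + F * (1 - u[i, j, kk]))
        v2[i, j, kk] = v[i, j, kk] +
            dt * (Dv * laplacian7(v, i, j, kk) + uvv - (F + k) * v[i, j, kk])
    end
end
```

The paper's evaluation offloads this kind of kernel to MI250x GPUs via AMDGPU.jl and exchanges halo regions with MPI.jl; the double-buffered read/write arrays above mirror the usual pattern of swapping (u, v) and (u2, v2) between steps.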

