
Automated MPI-X code generation for scalable finite-difference solvers (2312.13094v4)

Published 20 Dec 2023 in cs.DC, cs.MS, and cs.PF

Abstract: Partial differential equations (PDEs) are crucial in modeling diverse phenomena across scientific disciplines, including seismic and medical imaging, computational fluid dynamics, image processing, and neural networks. Solving these PDEs at scale is an intricate and time-intensive process that demands careful tuning. This paper introduces automated code-generation techniques specifically tailored for distributed memory parallelism (DMP) to execute explicit finite-difference (FD) stencils at scale, a fundamental challenge in numerous scientific applications. These techniques are implemented and integrated into the Devito DSL and compiler framework, a well-established solution for automating the generation of FD solvers based on a high-level symbolic math input. Users benefit from modeling simulations for real-world applications at a high-level symbolic abstraction and effortlessly harnessing HPC-ready distributed-memory parallelism without altering their source code. This results in drastic reductions in both execution time and developer effort. A comprehensive performance evaluation of Devito's DMP via MPI demonstrates highly competitive strong and weak scaling on CPU and GPU clusters, proving its effectiveness and capability to meet the demands of large-scale scientific simulations.
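To illustrate the workflow the abstract describes, the following is a minimal sketch of a Devito program: a diffusion-like PDE is written as symbolic math and compiled into an FD stencil Operator, while distributed-memory parallelism is requested through Devito's configuration (equivalently, the DEVITO_MPI environment variable) rather than by editing the solver code. The specific equation, grid size, and parameter values below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a Devito workflow with MPI-based domain decomposition.
# Launch with, e.g.: mpirun -n 4 python diffusion.py
# The equation, grid shape, and time-step values are illustrative only.
from devito import Grid, TimeFunction, Eq, Operator, solve, configuration

configuration['mpi'] = True  # enable distributed-memory parallelism (same effect as DEVITO_MPI=1)

grid = Grid(shape=(512, 512), extent=(1.0, 1.0))  # global domain, decomposed across MPI ranks
u = TimeFunction(name='u', grid=grid, time_order=1, space_order=4)
u.data[:] = 0.0  # each rank initializes only its local subdomain

# Symbolic PDE: du/dt = c * laplacian(u), rearranged for the forward time level
c = 0.5
pde = Eq(u.dt, c * u.laplace)
update = Eq(u.forward, solve(pde, u.forward))

op = Operator([update])        # code generation: C stencil kernel plus MPI halo exchanges
op.apply(time_M=100, dt=1e-5)  # run 100 time steps
```

With MPI enabled, the generated code performs the halo exchanges around the stencil updates automatically, so the same script runs unchanged on a laptop or a cluster; per the abstract, the user's only change is how the script is launched (e.g. via mpirun or srun).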
