KaMPIng: Flexible and (Near) Zero-Overhead C++ Bindings for MPI (2404.05610v4)
Abstract: The Message-Passing Interface (MPI) and C++ form the backbone of high-performance computing, but MPI only provides C and Fortran bindings. While this offers great language interoperability, high-level programming languages like C++ make software development quicker and less error-prone. We propose novel C++ language bindings that cover all abstraction levels, from low-level MPI calls to convenient STL-style bindings in which most parameters are inferred from a small subset, by bringing named parameters to C++. This enables rapid prototyping as well as fine-tuning of runtime behavior and memory management. A flexible type system and additional safety guarantees help to prevent programming errors. By exploiting C++'s template metaprogramming capabilities, this comes at (near) zero overhead, as only the required code paths are generated at compile time. We demonstrate that our library is a strong foundation for a future distributed standard library using multiple application benchmarks, ranging from text-book sorting algorithms to phylogenetic inference.