KaMPIng: Flexible and (Near) Zero-Overhead C++ Bindings for MPI (2404.05610v4)

Published 8 Apr 2024 in cs.DC

Abstract: The Message-Passing Interface (MPI) and C++ form the backbone of high-performance computing, but MPI only provides C and Fortran bindings. While this offers great language interoperability, high-level programming languages like C++ make software development quicker and less error-prone. We propose novel C++ language bindings that cover all abstraction levels, from low-level MPI calls to convenient STL-style bindings in which most parameters are inferred from a small subset of parameters, by bringing named parameters to C++. This enables rapid prototyping and fine-tuning of runtime behavior and memory management. A flexible type system and additional safety guarantees help to prevent programming errors. By exploiting C++'s template metaprogramming capabilities, this has (near) zero overhead, as only required code paths are generated at compile time. We demonstrate that our library is a strong foundation for a future distributed standard library using multiple application benchmarks, ranging from textbook sorting algorithms to phylogenetic inference.
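The abstract's central idea, STL-style calls in which most MPI parameters are inferred from a small set of named arguments, can be illustrated with a short, self-contained sketch layered over the plain C bindings. The `send_buf` helper, the `SendBuf` tag type, and the `allgatherv` wrapper below are illustrative names chosen for this sketch, not the library's actual interface; they only show how receive counts, displacements, and the output buffer can be derived automatically so the caller names nothing but the send buffer.

```cpp
// Minimal sketch of named-parameter MPI bindings (hypothetical API, not KaMPIng's).
// Build with an MPI compiler wrapper, e.g.: mpicxx -std=c++17 sketch.cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

// Map C++ element types to MPI datatypes (only the types used in this sketch).
template <typename T> MPI_Datatype mpi_type();
template <> MPI_Datatype mpi_type<int>()    { return MPI_INT; }
template <> MPI_Datatype mpi_type<double>() { return MPI_DOUBLE; }

// Tag type that turns the send buffer into a named parameter.
template <typename T> struct SendBuf { const std::vector<T>& data; };
template <typename T> SendBuf<T> send_buf(const std::vector<T>& v) { return {v}; }

// Allgatherv wrapper: receive counts, displacements, and the output buffer
// are all inferred; the caller supplies only the named send buffer.
template <typename T>
std::vector<T> allgatherv(MPI_Comm comm, SendBuf<T> send) {
    int size = 0;
    MPI_Comm_size(comm, &size);
    int local_count = static_cast<int>(send.data.size());

    // Exchange per-rank element counts, then compute displacements.
    std::vector<int> counts(size), displs(size);
    MPI_Allgather(&local_count, 1, MPI_INT, counts.data(), 1, MPI_INT, comm);
    int total = 0;
    for (int i = 0; i < size; ++i) { displs[i] = total; total += counts[i]; }

    std::vector<T> result(total);
    MPI_Allgatherv(send.data.data(), local_count, mpi_type<T>(),
                   result.data(), counts.data(), displs.data(), mpi_type<T>(), comm);
    return result;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each rank contributes rank + 1 copies of its rank number.
    std::vector<int> local(static_cast<std::size_t>(rank) + 1, rank);
    std::vector<int> all = allgatherv(MPI_COMM_WORLD, send_buf(local));

    if (rank == 0) {
        std::printf("gathered %zu elements in total\n", all.size());
    }
    MPI_Finalize();
    return 0;
}
```

Because the wrapper is a template resolved at compile time, only the code path actually requested is instantiated, which is the mechanism behind the paper's (near) zero-overhead claim; the fully general library additionally lets callers override any inferred parameter (for example, supplying their own receive buffer) by passing further named arguments.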
