Accelerating MPI Collectives with Process-in-Process-based Multi-object Techniques (2305.10612v1)

Published 17 May 2023 in cs.DC

Abstract: In the exascale computing era, optimizing MPI collective performance in high-performance computing (HPC) applications is critical. Current algorithms face performance degradation due to system call overhead, page faults, or data-copy latency, affecting HPC applications' efficiency and scalability. To address these issues, we propose PiP-MColl, a Process-in-Process-based Multi-object Inter-process MPI Collective design that maximizes small message MPI collective performance at scale. PiP-MColl features efficient multiple sender and receiver collective algorithms and leverages Process-in-Process shared memory techniques to eliminate unnecessary system call, page fault overhead, and extra data copy, improving intra- and inter-node message rate and throughput. Our design also boosts performance for larger messages, resulting in comprehensive improvement for various message sizes. Experimental results show that PiP-MColl outperforms popular MPI libraries, including OpenMPI, MVAPICH2, and Intel MPI, by up to 4.6X for MPI collectives like MPI_Scatter and MPI_Allgather.
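To make the small-message regime the abstract targets concrete, the following is a minimal, illustrative MPI benchmark (not the PiP-MColl implementation and not code from the paper): it times repeated MPI_Allgather calls on an 8-byte payload, the kind of latency-bound collective whose message rate PiP-MColl aims to improve. The message size and iteration count are assumptions chosen for illustration.

```c
/* Illustrative sketch only: measures average MPI_Allgather latency for
 * small messages. Not the PiP-MColl design; parameters are assumptions. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int msg_bytes = 8;       /* small-message regime */
    const int iters = 10000;       /* illustrative iteration count */
    char *sendbuf = malloc(msg_bytes);
    char *recvbuf = malloc((size_t)msg_bytes * size);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Allgather(sendbuf, msg_bytes, MPI_CHAR,
                      recvbuf, msg_bytes, MPI_CHAR, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg MPI_Allgather latency: %.3f us\n",
               (t1 - t0) / iters * 1e6);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

Run under any MPI library (e.g. mpicc / mpirun) to compare baseline small-message collective latency across implementations; the paper's reported 4.6X gains come from replacing the underlying collective algorithms with PiP-based multi-object ones, not from changes at this application-level API.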

Authors (11)
  1. Jiajun Huang (30 papers)
  2. Kaiming Ouyang (2 papers)
  3. Yujia Zhai (26 papers)
  4. Jinyang Liu (51 papers)
  5. Min Si (4 papers)
  6. Ken Raffenetti (11 papers)
  7. Hui Zhou (86 papers)
  8. Atsushi Hori (4 papers)
  9. Zizhong Chen (41 papers)
  10. Yanfei Guo (11 papers)
  11. Rajeev Thakur (16 papers)
Citations (2)
