(Poly)Logarithmic Time Construction of Round-optimal $n$-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI (2205.10072v2)

Published 20 May 2022 in cs.DC

Abstract: We give a fast(er), communication-free, parallel construction of optimal communication schedules that allow broadcasting of $n$ distinct blocks of data from a root processor to all other processors in $1$-ported, $p$-processor networks with fully bidirectional communication. For any $p$ and $n$, broadcasting in this model requires $n-1+\lceil\log_2 p\rceil$ communication rounds. In contrast to other constructions, all processors follow the same, circulant graph communication pattern, which makes it possible to use the schedules for the allgather (all-to-all-broadcast) operation as well. The new construction takes $O(\log³ p)$ time steps per processor, each of which can compute its part of the schedule independently of the other processors in $O(\log p)$ space. The result is a significant improvement over the sequential $O(p \log² p)$ time and $O(p\log p)$ space construction of Tr\"aff and Ripke (2009) with considerable practical import. The round-optimal schedule construction is then used to implement communication optimal algorithms for the broadcast and (irregular) allgather collective operations as found in MPI (the \emph{Message-Passing Interface}), and significantly and practically improves over the implementations in standard MPI libraries (\texttt{mpich}, OpenMPI, Intel MPI) for certain problem ranges. The application to the irregular allgather operation is entirely new.

References (21)

Citations (2)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

(Poly)Logarithmic Time Construction of Round-optimal $n$-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI (2205.10072v2)

Summary

Related Papers