(Poly)Logarithmic Time Construction of Round-optimal $n$-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI (2205.10072v2)

Published 20 May 2022 in cs.DC

Abstract: We give a fast(er), communication-free, parallel construction of optimal communication schedules that allow broadcasting of $n$ distinct blocks of data from a root processor to all other processors in $1$-ported, $p$-processor networks with fully bidirectional communication. For any $p$ and $n$, broadcasting in this model requires $n-1+\lceil\log_2 p\rceil$ communication rounds. In contrast to other constructions, all processors follow the same circulant graph communication pattern, which makes it possible to use the schedules for the allgather (all-to-all broadcast) operation as well. The new construction takes $O(\log^3 p)$ time steps per processor, and each processor can compute its part of the schedule independently of the other processors in $O(\log p)$ space. The result is a significant improvement over the sequential $O(p\log^2 p)$ time and $O(p\log p)$ space construction of Träff and Ripke (2009), with considerable practical import. The round-optimal schedule construction is then used to implement communication-optimal algorithms for the broadcast and (irregular) allgather collective operations as found in MPI (the \emph{Message-Passing Interface}), and for certain problem ranges these implementations significantly improve in practice over those in standard MPI libraries (\texttt{mpich}, OpenMPI, Intel MPI). The application to the irregular allgather operation is entirely new.
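The round bound stated in the abstract, and the idea of a circulant pattern that every processor follows, can be illustrated with a small Python sketch. This is not the paper's $n$-block schedule construction. It assumes the root is processor 0 and uses a hypothetical skip sequence obtained by repeatedly halving $p$ (rounding up), with processor $r$ sending to $(r+s_k) \bmod p$ and receiving from $(r-s_k) \bmod p$ in round $k$. The helper names `rounds_lower_bound`, `halving_skips` and `simulate_single_block` are illustrative only. For a single block ($n=1$) the simulated pattern reaches all $p$ processors in exactly $\lceil\log_2 p\rceil$ rounds, matching the bound.

```python
"""Illustrative sketch only: the round bound from the abstract plus a
hypothetical circulant pattern (halving skips) for broadcasting one block."""


def rounds_lower_bound(p: int, n: int) -> int:
    """n - 1 + ceil(log2 p) rounds, as stated for 1-ported, bidirectional networks."""
    if p < 1 or n < 1:
        raise ValueError("p and n must be positive integers")
    q = (p - 1).bit_length()  # exact ceil(log2 p) for p >= 1, no floating point
    return n - 1 + q


def halving_skips(p: int) -> list[int]:
    """Hypothetical skips s_0 = 1, ..., s_q = p with s_k = ceil(s_{k+1} / 2)."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    skips = [p]
    while skips[-1] > 1:
        skips.append((skips[-1] + 1) // 2)  # integer ceiling of half
    return skips[::-1]  # ascending order: s_0, s_1, ..., s_q


def simulate_single_block(p: int) -> int:
    """Broadcast one block from root 0. In round k every processor r sends to
    (r + s_k) mod p and receives from (r - s_k) mod p. Returns rounds used."""
    skips = halving_skips(p)
    informed = {0}
    for k in range(len(skips) - 1):  # rounds use s_0, ..., s_{q-1}
        s = skips[k]
        # 1-ported: each informed processor sends exactly one message this round
        informed |= {(r + s) % p for r in informed}
    if len(informed) != p:
        raise RuntimeError("pattern failed to reach every processor")
    return len(skips) - 1


if __name__ == "__main__":
    for p in (2, 6, 17, 1000):
        print(p, rounds_lower_bound(p, 1), simulate_single_block(p))
```

Because every processor uses the same skips, relabeling ranks as $(r - \mathrm{root}) \bmod p$ gives the pattern for any root. Deciding which of the $n$ blocks each processor sends and receives in each round, which is what the paper's schedules provide, is not modeled here.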

References (21)
  1. Broadcasting multiple messages in the multiport model. IEEE Transactions on Parallel and Distributed Systems, 10(5):500–508, 1999.
  2. An optimal algorithm for computing census functions in message-passing systems. Parallel Processing Letters, 3(1):19–23, 1993.
  3. Optimal multiple message broadcasting in telephone-like communication systems. Discrete Applied Mathematics, 100(1–2):1–15, 2000.
  4. Efficient algorithms for all-to-all communications in multiport message-passing systems. IEEE Transactions on Parallel and Distributed Systems, 8(11):1143–1156, 1997.
  5. Arthur M. Farley. Broadcast time in communication networks. SIAM Journal on Applied Mathematics, 39(2):385–390, 1980.
  6. Broadcasting multiple messages in the 1-in port model in optimal time. Journal of Combinatorial Optimization, 36(4):1333–1355, 2018.
  7. A new construction of broadcast graphs. Discrete Applied Mathematics, 280:144–155, 2020.
  8. An efficient heuristic for broadcasting in networks. Journal of Parallel and Distributed Computing, 66(1):68–76, 2006.
  9. Reproducible MPI benchmarking is still not as easy as you think. IEEE Transactions on Parallel and Distributed Systems, 27(12):3617–3630, 2016.
  10. Bin Jia. Process cooperation in multiple message broadcast. Parallel Computing, 35(12):572–580, 2009.
  11. Optimum broadcasting and personalized communication in hypercubes. IEEE Transactions on Computers, 38(9):1249–1268, 1989.
  12. Optimal broadcast and summation in the LogP model. In 5th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), pages 142–153, 1993.
  13. Multiple message broadcasting in communication networks. Networks, 26:253–261, 1995.
  14. MPI Forum. MPI: A Message-Passing Interface Standard. Version 3.1, June 4th 2015. www.mpi-forum.org.
  15. Collective operations in NEC’s high-performance MPI libraries. In 20th International Parallel and Distributed Processing Symposium (IPDPS), page 100, 2006.
  16. Eunice E. Santos. Optimal and near-optimal algorithms for $k$-item broadcast. Journal of Parallel and Distributed Computing, 57(2):121–139, 1999.
  17. Jesper Larsson Träff. Brief announcement: Fast(er) construction of round-optimal $n$-block broadcast schedules. In 34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 143–146. ACM, 2022.
  18. Jesper Larsson Träff. Fast(er) construction of round-optimal $n$-block broadcast schedules. In IEEE International Conference on Cluster Computing (CLUSTER), pages 142–151. IEEE Computer Society, 2022.
  19. Decomposing MPI collectives for exploiting multi-lane communication. In IEEE International Conference on Cluster Computing (CLUSTER), pages 270–280. IEEE Computer Society, 2020.
  20. Optimal broadcast for fully connected processor-node networks. Journal of Parallel and Distributed Computing, 68(7):887–901, 2008.
  21. A pipelined algorithm for large, irregular all-gather problems. International Journal of High Performance Computing Applications, 24(1):58–68, 2010.