Round-optimal $n$-Block Broadcast Schedules in Logarithmic Time (2312.11236v2)
Abstract: We give optimally fast $O(\log p)$ time (per processor) algorithms for computing round-optimal broadcast schedules for message-passing parallel computing systems. This affirmatively answers the questions posed in Träff (2022). The problem is to broadcast $n$ indivisible blocks of data from a given root processor to all other processors in a (subgraph of a) fully connected network of $p$ processors with fully bidirectional, one-ported communication capabilities. In this model, $n-1+\lceil\log_2 p\rceil$ communication rounds are required. Our new algorithms compute for each processor in the network receive and send schedules, each of size $\lceil\log_2 p\rceil$, that determine uniquely in $O(1)$ time for each communication round the new block that the processor will receive and the already received block it has to send. Schedule computations are done independently per processor without communication. The broadcast communication subgraph is the same, easily computable, directed, $\lceil\log_2 p\rceil$-regular circulant graph used in Träff (2022) and elsewhere. We show how the schedule computations can be done in optimal time and space of $O(\log p)$, improving significantly over previous results of $O(p\log^2 p)$ and $O(\log^3 p)$. The schedule computation and broadcast algorithms are simple to implement, but correctness and complexity are not obvious. All algorithms have been implemented, compared to previous algorithms, and briefly evaluated on a small $36\times 32$ processor-core cluster.
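To make the quantities in the abstract concrete, the following Python sketch computes the round count $n-1+\lceil\log_2 p\rceil$ and the neighbors of a processor in a $\lceil\log_2 p\rceil$-regular directed circulant graph. The function names are hypothetical, and the power-of-two skips are an assumption chosen only for illustration; the abstract does not state which skips the paper's circulant graph uses, and the schedule computation itself is not reproduced here.

```python
def ceil_log2(p: int) -> int:
    """Return ceil(log2(p)) for p >= 1 using integer arithmetic only."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return (p - 1).bit_length()


def optimal_rounds(n: int, p: int) -> int:
    """Round count n - 1 + ceil(log2 p) stated in the abstract (n >= 1, p >= 2)."""
    if n < 1 or p < 2:
        raise ValueError("need n >= 1 blocks and p >= 2 processors")
    return n - 1 + ceil_log2(p)


def illustrative_skips(p: int) -> list[int]:
    """ASSUMPTION: power-of-two skips 1, 2, 4, ... as a stand-in.

    This yields ceil(log2 p) distinct skips below p, but it is not
    claimed to be the skip sequence used in the paper.
    """
    return [1 << k for k in range(ceil_log2(p))]


def circulant_neighbors(r: int, p: int, skips: list[int]) -> tuple[list[int], list[int]]:
    """Return (to, from) neighbors of processor r in a directed circulant graph.

    Processor r sends to (r + s) mod p and receives from (r - s) mod p
    for every skip s, so each processor has len(skips) out- and in-neighbors.
    """
    to_procs = [(r + s) % p for s in skips]
    from_procs = [(r - s) % p for s in skips]
    return to_procs, from_procs


if __name__ == "__main__":
    p = 36 * 32  # cluster size mentioned in the abstract
    print(ceil_log2(p), optimal_rounds(4, p))
    print(circulant_neighbors(0, 5, illustrative_skips(5)))
```

With a single block ($n=1$) the round count reduces to $\lceil\log_2 p\rceil$, the familiar lower bound for one-ported broadcast; each additional block adds exactly one round.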
- Broadcasting multiple messages in simultaneous send/receive systems. Discrete Applied Mathematics, 55(2):95–105, 1994.
- Optimal multiple message broadcasting in telephone-like communication systems. Discrete Applied Mathematics, 100(1–2):1–15, 2000.
- Bin Jia. Process cooperation in multiple message broadcast. Parallel Computing, 35(12):572–580, 2009.
- Optimum broadcasting and personalized communication in hypercubes. IEEE Transactions on Computers, 38(9):1249–1268, 1989.
- Multiple message broadcasting in communication networks. Networks, 26:253–261, 1995.
- MPI Forum. MPI: A Message-Passing Interface Standard. Version 4.0, June 9th 2021. www.mpi-forum.org.
- Collective operations in NEC’s high-performance MPI libraries. In 20th International Parallel and Distributed Processing Symposium (IPDPS), page 100, 2006.
- Jesper Larsson Träff. Brief announcement: Fast(er) construction of round-optimal $n$-block broadcast schedules. In 34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 143–146. ACM, 2022.
- Jesper Larsson Träff. Fast(er) construction of round-optimal $n$-block broadcast schedules. In IEEE International Conference on Cluster Computing (CLUSTER), pages 142–151. IEEE Computer Society, 2022.
- Jesper Larsson Träff. (Poly)logarithmic time construction of round-optimal $n$-block broadcast schedules for broadcast and irregular allgather in MPI. arXiv:2205.10072, 2022.
- Decomposing MPI collectives for exploiting multi-lane communication. In IEEE International Conference on Cluster Computing (CLUSTER), pages 270–280. IEEE Computer Society, 2020.
- Optimal broadcast for fully connected processor-node networks. Journal of Parallel and Distributed Computing, 68(7):887–901, 2008.
Author: Jesper Larsson Träff