Model message-size dependence of inter-node bandwidth in analytical scalability models

Develop enhanced analytical performance models for broadcast and shuffle in distributed multi-GPU SQL processing that treat the effective inter-node bandwidth parameter B_n as a function of message size, thereby improving prediction accuracy as the number of machines increases and per-message sizes decrease.

Background

The paper proposes simple analytical models for shuffle and broadcast that assume a constant effective inter-node bandwidth B_n. Microbenchmark results show that achievable throughput depends on message size, and model–measurement comparisons indicate growing discrepancies as the cluster size increases and per-message sizes shrink.

Accounting for message-size effects in B_n would yield more accurate model-based projections and enable better model-driven resource scaling and planning for future networks and larger clusters.
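One possible way to make B_n depend on message size is the classic latency-bandwidth (Hockney-style) model, in which sending m bytes costs a fixed startup time plus m divided by the peak bandwidth. This is only an illustrative sketch and not the correction proposed by the paper, which leaves that work open. The function names, the peak bandwidth, the startup time, and the rule that a fixed per-node payload is split evenly across n nodes are all hypothetical assumptions chosen for the example.

```python
def effective_bandwidth(message_bytes, peak_bandwidth, startup_s):
    """Effective bandwidth B_n(m) under an assumed latency-bandwidth model.

    Transfer time is modeled as startup_s + message_bytes / peak_bandwidth,
    so small messages are dominated by the fixed startup cost.
    """
    if message_bytes <= 0:
        raise ValueError("message_bytes must be positive")
    transfer_time = startup_s + message_bytes / peak_bandwidth
    return message_bytes / transfer_time


def sweep_nodes(bytes_per_node, node_counts, peak_bandwidth, startup_s):
    """Return (nodes, message size, effective bandwidth) for each node count.

    Assumption for illustration: each node splits a fixed payload evenly
    into one message per node, so message size shrinks as 1 / nodes.
    """
    rows = []
    for n in node_counts:
        message_bytes = bytes_per_node / n
        bandwidth = effective_bandwidth(message_bytes, peak_bandwidth, startup_s)
        rows.append((n, message_bytes, bandwidth))
    return rows


if __name__ == "__main__":
    PEAK = 25e9      # hypothetical peak inter-node bandwidth, bytes/s
    STARTUP = 5e-6   # hypothetical per-message startup cost, seconds
    for n, m, b in sweep_nodes(1e6, [2, 4, 8, 16, 32], PEAK, STARTUP):
        print(f"{n:3d} nodes  {m / 1e3:8.1f} KB/message  {b / 1e9:6.2f} GB/s")
```

Under this assumed model, effective bandwidth reaches half of the peak when the message size equals the startup time multiplied by the peak bandwidth, and it falls further as messages shrink. A constant-B_n model would therefore be increasingly optimistic as the cluster grows. In practice the two parameters would need to be fitted to microbenchmark measurements rather than chosen by hand.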

References

Our models assume a constant B_n; however, in reality, B_n varies with message size (see the paper's figure labeled fig:broadcast-bw-HM). This means that our models may become less accurate with an increasing number of nodes since the message size becomes smaller. Corrections of B_n to the effect of message sizes are left as future work.

Terabyte-Scale Analytics in the Blink of an Eye (2506.09226 - Wu et al., 10 Jun 2025) in Section 6.3 (Performance Projection)