Experimental validation of scaling PP/TP for very large models and CP for very long sequences
Experimentally validate the expected benefits of (i) further increasing pipeline parallelism (PP) or tensor parallelism (TP) for extremely large models and (ii) increasing context parallelism (CP) for very long sequences, quantifying the effect of each on Model FLOPs Utilization (MFU), throughput, and memory usage at large scale.
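MFU is one of the metrics to be quantified, so a minimal sketch of how it is commonly computed may help. It assumes a dense decoder-only transformer and PaLM-style FLOP accounting: 6N FLOPs per token for the parameter matmuls (forward plus backward) plus 12·L·d_model·T for attention. The function names and the example numbers (model size, batch, step time, peak FLOP/s) are hypothetical; in practice the peak value should come from the accelerator's specification for the precision actually used.

```python
def train_flops_per_token(n_params: float, n_layers: int, d_model: int, seq_len: int) -> float:
    """Approximate training FLOPs (forward + backward) per token for a dense
    decoder-only transformer, using PaLM-style accounting (an assumption):
    6 * N for the parameter matmuls plus 12 * L * d_model * T for attention."""
    return 6 * n_params + 12 * n_layers * d_model * seq_len


def throughput_tokens_per_s(global_batch_tokens: int, step_time_s: float) -> float:
    """Training throughput: tokens processed per optimizer step / step wall time."""
    return global_batch_tokens / step_time_s


def mfu(tokens_per_s: float, flops_per_token: float, n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs Utilization: achieved model FLOP/s over aggregate peak FLOP/s.
    FLOPs spent on activation recomputation are deliberately not counted
    (counting them would give hardware FLOPs utilization, HFU, instead)."""
    return tokens_per_s * flops_per_token / (n_gpus * peak_flops_per_gpu)


# Hypothetical run: 7B-parameter model, 4K context, 256 GPUs at 1e15 peak FLOP/s each.
fpt = train_flops_per_token(n_params=7e9, n_layers=32, d_model=4096, seq_len=4096)
tps = throughput_tokens_per_s(global_batch_tokens=4 * 1024 * 1024, step_time_s=4.0)
print(f"throughput = {tps:,.0f} tokens/s, MFU = {mfu(tps, fpt, 256, 1e15):.1%}")
```

Because MFU counts only model FLOPs, a PP/TP/CP configuration that relies on activation recomputation can keep the GPUs busy without raising MFU, so throughput, MFU, and memory are worth reporting side by side when comparing parallelism degrees.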
References
Further increases in the degree of model parallelism (PP/TP) are expected to benefit extremely large models. Increasing context parallelism (CP) is expected to be advantageous for very long sequences; however, due to resource limitations, experimental validation of these scenarios is left for future work.
— Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide
(2602.09109 - Amer et al., 9 Feb 2026) in Section 6.4, Summary of Empirical Insights (end of section)
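The reasoning behind the quoted expectation can be made concrete with a rough per-GPU memory model. This is a sketch under stated assumptions, not the paper's method: mixed-precision Adam at about 16 bytes per parameter (the ZeRO-style accounting), model states split evenly across the TP and PP ranks (ignoring embeddings and any data-parallel sharding), activations of roughly 34 bytes per token per layer per hidden unit (the Megatron estimate for 16-bit activations with sequence parallelism and without materialized attention scores), split across TP and CP, and a 1F1B pipeline schedule in which the first stage holds about one full model's worth of layer activations regardless of PP. All names, defaults, and the 70B / 128K-token example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ParallelConfig:
    tp: int  # tensor-parallel degree
    pp: int  # pipeline-parallel degree
    cp: int  # context-parallel degree


def per_gpu_memory_gb(n_params: float, n_layers: int, d_model: int, seq_len: int,
                      micro_batch: int, cfg: ParallelConfig,
                      bytes_per_param: float = 16.0,   # assumed: bf16 weights/grads + fp32 Adam states
                      act_bytes_coeff: float = 34.0,   # assumed: bytes per token per layer per hidden unit
                      ) -> tuple[float, float]:
    """Return (model_state_gb, activation_gb) for one GPU on the first pipeline stage."""
    # Model states are partitioned across tensor- and pipeline-parallel ranks.
    state_bytes = n_params * bytes_per_param / (cfg.tp * cfg.pp)
    # CP splits the sequence, so each rank holds seq_len / cp tokens per micro-batch.
    tokens_per_rank = seq_len * micro_batch / cfg.cp
    # Under 1F1B, stage 0 keeps ~pp in-flight micro-batches of n_layers / pp layers,
    # i.e. about n_layers layers of activations, so PP does not reduce this term.
    # TP (with sequence parallelism) divides the per-layer activation footprint.
    act_bytes = act_bytes_coeff * tokens_per_rank * d_model * n_layers / cfg.tp
    return state_bytes / 1e9, act_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical 70B-parameter model with a 128K-token context.
    for cfg in (ParallelConfig(8, 4, 1), ParallelConfig(8, 8, 1), ParallelConfig(8, 4, 8)):
        state, act = per_gpu_memory_gb(70e9, 80, 8192, 131072, 1, cfg)
        print(f"{cfg}: states {state:.1f} GB, activations {act:.1f} GB")
```

Under these assumptions, raising PP shrinks only the model-state term while raising CP shrinks only the activation term, which is why CP becomes attractive once sequence length dominates memory. Measured memory should be compared against such an estimate rather than assumed to follow it.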