Combine Sequence Parallelism with Expert Parallelism for Sparse (MoE) Models
Develop an approach that integrates Ulysses Sequence Parallelism (SP) with Expert Parallelism (EP) for inference of sparse mixture-of-experts (MoE) transformer models, in order to further improve their performance and scalability. A communication-level sketch of how the two parallelism styles could compose is given below.
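
The sketch below is a minimal, illustrative example and not the paper's implementation. It shows how the two all-to-all patterns could compose inside one transformer block during inference: a Ulysses-style all-to-all that trades the sequence shard for a head shard around attention, followed by an expert-parallel all-to-all that dispatches tokens to expert-owning ranks for the MoE FFN. It assumes PyTorch with torch.distributed on a multi-GPU node launched via torchrun with the NCCL backend, uses uniform token dispatch in place of real top-k routing, and all names (reshard_all_to_all, ulysses_attention, expert_parallel_ffn) are hypothetical.

```python
# sp_ep_sketch.py -- illustrative launch: torchrun --nproc-per-node=2 sp_ep_sketch.py
import torch
import torch.distributed as dist
import torch.nn.functional as F


def reshard_all_to_all(x, scatter_dim, gather_dim, group):
    """Ulysses-style re-shard: split x into one chunk per rank along scatter_dim,
    exchange chunks with every rank in `group`, and concatenate the received
    chunks along gather_dim. Assumes scatter_dim is divisible by the group size."""
    world = dist.get_world_size(group)
    send = [t.contiguous() for t in x.chunk(world, dim=scatter_dim)]
    recv = [torch.empty_like(t) for t in send]
    dist.all_to_all(recv, send, group=group)
    return torch.cat(recv, dim=gather_dim)


def ulysses_attention(x, n_heads, sp_group):
    """Attention under Ulysses SP: trade the local sequence shard for a head shard,
    attend over the full sequence with the local heads, then trade back.
    Q/K/V projections are omitted (identity) to keep the sketch parameter-free."""
    B, s_local, D = x.shape
    qkv = x.view(B, s_local, n_heads, D // n_heads)
    # [B, S/sp, H, d] -> [B, S, H/sp, d]: scatter heads, gather sequence.
    qkv = reshard_all_to_all(qkv, scatter_dim=2, gather_dim=1, group=sp_group)
    q = k = v = qkv.transpose(1, 2)                        # [B, H/sp, S, d]
    out = F.scaled_dot_product_attention(q, k, v)          # full-sequence attention
    # [B, S, H/sp, d] -> [B, S/sp, H, d]: scatter sequence, gather heads.
    out = reshard_all_to_all(out.transpose(1, 2), scatter_dim=1, gather_dim=2,
                             group=sp_group)
    return out.reshape(B, s_local, D)


def expert_parallel_ffn(x, local_expert, ep_group):
    """MoE FFN under EP with simplified uniform dispatch: each rank sends an equal
    slice of its tokens to every expert rank, applies its local expert, and a
    second all-to-all returns the results to the originating ranks."""
    B, s_local, D = x.shape
    tokens = x.reshape(-1, D)
    world = dist.get_world_size(ep_group)
    send = [t.contiguous() for t in tokens.chunk(world, dim=0)]
    recv = [torch.empty_like(t) for t in send]
    dist.all_to_all(recv, send, group=ep_group)            # dispatch tokens to experts
    processed = [local_expert(t) for t in recv]            # compute with the local expert
    back = [torch.empty_like(t) for t in processed]
    dist.all_to_all(back, processed, group=ep_group)       # return results to owners
    return torch.cat(back, dim=0).reshape(B, s_local, D)


def main():
    # all_to_all requires a backend that implements it (e.g., NCCL), not gloo.
    dist.init_process_group(backend="nccl")
    rank, world = dist.get_rank(), dist.get_world_size()
    torch.cuda.set_device(rank % torch.cuda.device_count())
    device = torch.device("cuda")
    # For the sketch, SP and EP reuse the same ranks; in practice the two groups
    # can be sized and laid out independently.
    sp_group = dist.new_group(list(range(world)))
    ep_group = dist.new_group(list(range(world)))

    B, s_local, D, n_heads = 2, 8, 64, 8                   # per-rank sequence shard
    torch.manual_seed(rank)
    x = torch.randn(B, s_local, D, device=device)
    local_expert = torch.nn.Sequential(                    # this rank's expert MLP
        torch.nn.Linear(D, 4 * D), torch.nn.GELU(),
        torch.nn.Linear(4 * D, D)).to(device)

    with torch.no_grad():                                  # inference-only sketch
        h = x + ulysses_attention(x, n_heads, sp_group)    # SP attention + residual
        h = h + expert_parallel_ffn(h, local_expert, ep_group)  # EP MoE + residual
    print(f"rank {rank}: output shard shape {tuple(h.shape)}")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

In a real system the SP and EP process groups would be sized and placed independently (for example SP within a node and EP across nodes), and the uniform dispatch would be replaced by the router's top-k token-to-expert assignment with the usual capacity and load-balancing handling.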
References
Specifically, there is no prior work that combines SP with EP to further optimize sparse models, which we will leave as future work.
— Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
(2509.16495 - Hidayetoglu et al., 20 Sep 2025) in Limitations and Future Work