Achieving >1:8 Sparsity Without Performance Loss in Sparse-Pertoken MoE
Determine whether the Sparse-Pertoken Mixture-of-Experts (S-P MoE) component in the TokenMixer-Large architecture can achieve sparsity greater than 1:8 (i.e., fewer than one in eight experts active per token) without any loss in performance.
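The paper does not publish the S-P MoE internals here, but the sparsity ratio itself is easy to make concrete. Below is a minimal PyTorch sketch of generic per-token top-k expert routing, not the paper's implementation: the class name, router, and expert shapes are all assumptions. The ratio k : num_experts is the knob the question concerns, e.g., moving from k=1, num_experts=8 (1:8) to k=1, num_experts=16 (1:16).

```python
# Illustrative sketch only: the paper does not release the S-P MoE code, and
# all names below (PerTokenTopKMoE, router, expert sizes) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerTokenTopKMoE(nn.Module):
    """Generic per-token top-k MoE layer.

    The sparsity ratio is k : num_experts. With k=1 and num_experts=8 the
    layer runs at 1:8; raising num_experts to 16 while keeping k=1 gives the
    sparser 1:16 regime whose quality impact the paper leaves open.
    """

    def __init__(self, d_model: int, num_experts: int = 16, k: int = 1):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.ReLU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). Each token picks its own top-k experts,
        # so only a k/num_experts fraction of expert compute is active.
        logits = self.router(x)                  # (T, num_experts)
        gate, idx = logits.topk(self.k, dim=-1)  # (T, k)
        gate = F.softmax(gate, dim=-1)           # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e
                if mask.any():
                    out[mask] += gate[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# 1:16 sparsity: one active expert out of sixteen per token.
moe = PerTokenTopKMoE(d_model=64, num_experts=16, k=1)
y = moe(torch.randn(10, 64))  # (10, 64)
```

The open question is whether pushing this ratio past 1:8 preserves ranking quality, not how the routing is computed; the loop above is the simplest correct form and would be replaced by batched dispatch in production.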
References
Whether we can achieve sparsity greater than 1:8 while maintaining performance without loss is still under exploration.
— TokenMixer-Large: Scaling Up Large Ranking Models in Industrial Recommenders (2602.06563, Jiang et al., 6 Feb 2026), Appendix, Section "First Enlarge Then Sparse"