
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts (2407.09816v4)

Published 13 Jul 2024 in cs.CL

Abstract: Scaling the size of a model enhances its capabilities but significantly increases computational complexity. Mixture-of-Experts (MoE) models address this issue by allowing model size to scale up without substantially increasing training or inference costs. An important module in MoE is the router, which distributes each token to the experts. Currently, the mainstream routing methods are dynamic routing and fixed routing. Despite their promising results, MoE models encounter several challenges. With dynamic routing, the dispersion of training tokens across multiple experts can lead to underfitting, particularly for infrequent tokens. Fixed routing can mitigate that issue, but it compromises the diversity of representations. In this paper, we propose MaskMoE, a method designed to enhance token-level learning by employing a routing masking technique within the Mixture-of-Experts model. MaskMoE is capable of maintaining representation diversity while achieving more comprehensive training. Experimental results demonstrate that our method outperforms previous dominant Mixture-of-Experts models in terms of both perplexity (PPL) and downstream task performance.
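To make the idea of a routing mask concrete, the following is a minimal sketch of one way such a mask could sit on top of a standard softmax router. The abstract does not specify how the mask is constructed, so everything here is an assumption for illustration: the helper names (`build_mask`, `route`), the per-token visible-expert counts, and the choice to give some tokens a single visible expert (behaving like fixed routing) and others all experts (behaving like dynamic routing). It should not be read as the authors' implementation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def build_mask(vocab_size, num_experts, visible_per_token, seed=0):
    """Hypothetical mask builder (an assumption, not taken from the paper).

    For each vocabulary id, pick a fixed random subset of experts that the
    token is allowed to reach. Visible experts get 0.0, hidden ones -inf,
    so the mask can simply be added to the router logits.
    Each entry of visible_per_token must be at least 1.
    """
    rng = random.Random(seed)
    mask = []
    for tok in range(vocab_size):
        visible = set(rng.sample(range(num_experts), visible_per_token[tok]))
        mask.append(
            [0.0 if e in visible else -math.inf for e in range(num_experts)]
        )
    return mask


def route(token_id, logits, mask, top_k=1):
    """Add the token's mask to the router logits, then take the top-k experts."""
    masked = [l + m for l, m in zip(logits, mask[token_id])]
    probs = softmax(masked)
    ranked = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)
    # Never select an expert that the mask has hidden from this token.
    chosen = [e for e in ranked[:top_k] if probs[e] > 0.0]
    return chosen, probs


if __name__ == "__main__":
    # Tokens 0 and 1 stand in for infrequent tokens (one visible expert);
    # tokens 2 and 3 stand in for frequent tokens (all four experts visible).
    mask = build_mask(vocab_size=4, num_experts=4, visible_per_token=[1, 1, 4, 4])
    router_logits = [0.1, 2.0, -1.0, 0.5]
    for tok in range(4):
        experts, probs = route(tok, router_logits, mask)
        print(tok, experts, [round(p, 3) for p in probs])
```

Under these assumptions, a token with one visible expert is always sent to that expert regardless of the router logits, which concentrates its training signal, while a token with several visible experts is still routed by the learned logits.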

Authors (11)
  1. Zhenpeng Su (17 papers)
  2. Zijia Lin (43 papers)
  3. Xue Bai (26 papers)
  4. Xing Wu (69 papers)
  5. Yizhe Xiong (14 papers)
  6. Haoran Lian (6 papers)
  7. Guangyuan Ma (14 papers)
  8. Hui Chen (298 papers)
  9. Guiguang Ding (79 papers)
  10. Wei Zhou (308 papers)
  11. Songlin Hu (80 papers)
Citations (2)