CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition (2402.02526v1)

Published 4 Feb 2024 in cs.LG

Abstract: Sparse mixture of experts (SMoE) offers an appealing solution to scale up model complexity beyond the means of increasing the network's depth or width. However, effective training of SMoE has proven to be challenging due to the representation collapse issue, which causes parameter redundancy and limited representation potential. In this work, we propose a competition mechanism to address this fundamental challenge of representation collapse. By routing inputs only to experts with the highest neural response, we show that, under mild assumptions, competition enjoys the same convergence rate as the optimal estimator. We further propose CompeteSMoE, an effective and efficient algorithm to train LLMs by deploying a simple router that predicts the competition outcomes. Consequently, CompeteSMoE enjoys strong performance gains from the competition routing policy while having low computational overhead. Our extensive empirical evaluations on two transformer architectures and a wide range of tasks demonstrate the efficacy, robustness, and scalability of CompeteSMoE compared to state-of-the-art SMoE strategies.
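
To make the routing idea in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of competition-based routing in an SMoE layer: during training every expert is evaluated and the top-k experts with the largest "neural response" (approximated here by the output norm) process the token, while a lightweight router is trained with an auxiliary loss to predict those winners so that inference can rely on the cheap router alone. The class name `CompetitionMoE`, the norm-based response score, and the binary cross-entropy auxiliary loss are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CompetitionMoE(nn.Module):
    """Illustrative (hypothetical) SMoE layer with competition routing:
    training routes tokens to the experts with the largest response, and a
    cheap router learns to predict those winners for use at inference."""

    def __init__(self, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # lightweight routing head
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (batch, d_model)
        logits = self.router(x)  # (batch, n_experts)
        if self.training:
            # Competition phase: run every expert and score it by the norm of
            # its output (a stand-in for the paper's "neural response").
            outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
            response = outs.norm(dim=-1)                             # (B, E)
            winners = response.topk(self.top_k, dim=-1).indices      # (B, K)

            # Auxiliary loss (assumed BCE): teach the router to predict winners.
            targets = F.one_hot(winners, response.size(-1)).sum(1).float()
            router_loss = F.binary_cross_entropy_with_logits(logits, targets)

            # Combine the winning experts, weighted by their responses.
            weights = F.softmax(response.gather(1, winners), dim=-1)  # (B, K)
            picked = outs.gather(
                1, winners.unsqueeze(-1).expand(-1, -1, outs.size(-1))
            )                                                         # (B, K, D)
            return (picked * weights.unsqueeze(-1)).sum(1), router_loss

        # Inference: only the learned router is consulted (no competition).
        winners = logits.topk(self.top_k, dim=-1).indices
        weights = F.softmax(logits.gather(1, winners), dim=-1)
        y = torch.zeros_like(x)
        for k in range(self.top_k):
            idx = winners[:, k]
            for e in idx.unique().tolist():
                mask = idx == e
                y[mask] += weights[mask, k : k + 1] * self.experts[e](x[mask])
        return y, x.new_zeros(())
```

The two-phase design mirrors the abstract's stated trade-off: the competition policy (which needs all experts) is only paid for during training, while inference uses the sparse router that has learned to imitate the competition outcomes, keeping computational overhead low.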

Authors (11)
  1. Quang Pham (20 papers)
  2. Giang Do (8 papers)
  3. Huy Nguyen (78 papers)
  4. TrungTin Nguyen (17 papers)
  5. Chenghao Liu (61 papers)
  6. Mina Sartipi (7 papers)
  7. Binh T. Nguyen (49 papers)
  8. Savitha Ramasamy (22 papers)
  9. Xiaoli Li (120 papers)
  10. Steven Hoi (38 papers)
  11. Nhat Ho (126 papers)
Citations (14)
