
Expert-Token Resonance MoE: Bidirectional Routing with Efficiency Affinity-Driven Active Selection (2406.00023v3)

Published 24 May 2024 in cs.CL

Abstract: Mixture-of-Experts (MoE) architectures have emerged as a paradigm-shifting approach for LLMs, offering unprecedented computational efficiency. However, these architectures grapple with challenges of token distribution imbalance and expert homogenization, impeding optimal semantic generalization. We propose a novel expert routing framework that incorporates: (1) an efficient routing mechanism with lightweight computation; (2) an adaptive bidirectional selection mechanism leveraging resonance between experts and tokens; (3) a module that determines the lower bound of expert capacity based on dynamic token distribution analysis, specifically designed to address drop-and-pad strategies. The framework is also integrated with an orthogonal feature extraction module and an optimized loss function for expert localization. It effectively reduces expert homogeneity while enhancing the performance of the expert selection module. Additionally, we introduce a local expert strategy that simultaneously improves load balancing and reduces network communication overhead. It achieves a 40% reduction in the tokens processed by each expert without compromising model convergence or efficacy. When coupled with communication optimizations, training efficiency improvements of 5.4% to 46.6% can be observed. After supervised fine-tuning, it exhibits performance gains of 9.7% to 14.1% across the GDAD, GPQA, and TeleQnA benchmarks.
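To make the "bidirectional selection" idea concrete, here is a minimal sketch of mutual token–expert routing in NumPy. This is an illustration of the general concept, not the paper's actual algorithm: the dimensions, the linear `gate`, and the names `top_k` and `capacity` are all hypothetical, and a pair is routed only when the token nominates the expert *and* the expert accepts the token, loosely mirroring the resonance-style agreement the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions, not taken from the paper.
num_tokens, d_model, num_experts = 8, 16, 4
top_k = 2     # experts each token nominates (token-side selection)
capacity = 4  # tokens each expert may accept (expert-side selection)

tokens = rng.standard_normal((num_tokens, d_model))
gate = rng.standard_normal((d_model, num_experts))  # lightweight linear router

# Affinity scores between every token and every expert.
scores = tokens @ gate  # shape (num_tokens, num_experts)

# Token side: each token nominates its top_k highest-affinity experts.
token_choice = np.argsort(-scores, axis=1)[:, :top_k]

# Expert side: each expert accepts at most `capacity` of its
# highest-affinity tokens.
expert_choice = np.argsort(-scores, axis=0)[:capacity, :]

# Bidirectional agreement: route a (token, expert) pair only when
# both sides selected each other.
routed = []
for t in range(num_tokens):
    for e in token_choice[t]:
        if t in expert_choice[:, e]:
            routed.append((t, int(e)))

print(routed)
```

Because acceptance is capped on the expert side, no expert can receive more than `capacity` tokens, which is one simple way the mutual-selection view bounds per-expert load without a separate drop-and-pad pass.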
