DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs (2502.12455v2)

Published 18 Feb 2025 in cs.CL

Abstract: As LLMs continue to scale, computational costs and resource consumption have emerged as significant challenges. While existing sparsification methods like pruning reduce computational overhead, they risk losing model knowledge through parameter removal. This paper proposes DSMoE (Dynamic Sparse Mixture-of-Experts), a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge based on input complexity. Additionally, we introduce a sparsity loss term to balance performance and computational efficiency. Extensive experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches across language modeling and downstream tasks, particularly excelling in generation tasks. Analysis reveals that DSMoE learns distinctive layerwise activation patterns, providing new insights for future MoE architecture design.
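The abstract only sketches the mechanism, so the following is a minimal, illustrative PyTorch sketch of the idea as described there: an FFN hidden layer split into expert blocks, a per-token sigmoid router made differentiable with a straight-through estimator, and a sparsity penalty on gate activations. The exact partitioning scheme, router parameterization, threshold, and loss weighting are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a DSMoE-style FFN layer (assumptions noted inline).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DSMoEFFN(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, sparsity_weight: float = 1e-2):
        super().__init__()
        assert d_ff % n_experts == 0, "hidden size must split evenly into expert blocks"
        self.n_experts = n_experts
        self.block = d_ff // n_experts
        # In the paper these weights come from a pre-trained FFN and are
        # partitioned into blocks; here they are freshly initialized.
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)
        self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
        self.sparsity_weight = sparsity_weight

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, d_model)
        gate_probs = torch.sigmoid(self.router(x))            # (B, S, E), one gate per expert
        hard_gate = (gate_probs > 0.5).float()                 # binary open/closed decision (assumed threshold)
        # Straight-through estimator: forward pass uses the hard gate,
        # gradients flow through the sigmoid probabilities.
        gate = hard_gate + gate_probs - gate_probs.detach()

        h = F.gelu(self.w_in(x))                                # (B, S, d_ff)
        h = h.view(*h.shape[:-1], self.n_experts, self.block)   # split hidden units into expert blocks
        h = h * gate.unsqueeze(-1)                               # zero out unselected blocks
        out = self.w_out(h.flatten(-2))

        # Sparsity loss term (assumed form): penalize the average gate
        # activation so tokens learn to use fewer expert blocks.
        sparsity_loss = self.sparsity_weight * gate_probs.mean()
        return out, sparsity_loss
```

Because the gates are independent sigmoids rather than a softmax top-k, each token can open as many or as few blocks as its difficulty warrants, which is the "dynamic" aspect the abstract emphasizes; the sparsity loss is what pushes that number down.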

Authors (13)
  1. Minxuan Lv (5 papers)
  2. Zhenpeng Su (17 papers)
  3. Leiyu Pan (6 papers)
  4. Yizhe Xiong (14 papers)
  5. Zijia Lin (43 papers)
  6. Hui Chen (298 papers)
  7. Wei Zhou (311 papers)
  8. Jungong Han (111 papers)
  9. Guiguang Ding (79 papers)
  10. Cheng Luo (70 papers)
  11. Di Zhang (231 papers)
  12. Kun Gai (125 papers)
  13. Songlin Hu (80 papers)
