
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model (2406.19905v2)

Published 28 Jun 2024 in cs.CV

Abstract: The Mixture-of-Experts (MoE) has gained increasing attention in studying Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and they usually employ a router to predict the routing of each token. However, the predictions are based solely on sample features and do not truly reveal the optimization directions of tokens. This may lead to severe optimization interference between different tokens assigned to an expert. To address this problem, this paper proposes a novel method based on token-level gradient analysis, i.e., Solving Token Gradient Conflict (STGC). Specifically, we first use token-level gradients to identify conflicting tokens in experts. After that, we add a specialized loss tailored to eliminate conflicts among tokens within each expert. Our method can serve as a plug-in for diverse Large Vision-Language Models, and extensive experimental results demonstrate its effectiveness. The code will be publicly available at https://github.com/longrongyang/STGC.
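
The core idea described in the abstract is to inspect per-token gradients within each expert and flag tokens whose gradients pull the expert in opposing directions. The sketch below is a hypothetical PyTorch illustration of that detection step only, not the paper's actual implementation: the `Expert` module, the `find_conflicting_tokens` helper, the use of input-activation gradients as a proxy for a token's optimization direction, and the cosine-similarity threshold are all assumptions made for illustration.

```python
# Hypothetical sketch of token-gradient-conflict detection; names and the
# gradient proxy are illustrative assumptions, not taken from the STGC code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """A toy MoE expert: a two-layer feed-forward block applied per token."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return self.ff(x)

def find_conflicting_tokens(expert: Expert, tokens: torch.Tensor,
                            loss_fn, threshold: float = 0.0) -> torch.Tensor:
    """Return a boolean mask of tokens whose gradient direction opposes the
    average gradient of the tokens routed to this expert.

    tokens:  (num_tokens, dim) activations assigned to `expert`.
    loss_fn: maps expert outputs to a scalar training loss.
    """
    tokens = tokens.detach().requires_grad_(True)
    loss = loss_fn(expert(tokens))
    # Per-token gradient w.r.t. the expert's input activations, used here as a
    # cheap proxy for each token's optimization direction inside the expert.
    (grads,) = torch.autograd.grad(loss, tokens)
    avg_grad = grads.mean(dim=0, keepdim=True)
    # Tokens whose gradient points away from the average direction conflict.
    cos = F.cosine_similarity(grads, avg_grad, dim=-1)
    return cos < threshold

# Usage sketch with random activations and a dummy loss.
expert = Expert(dim=64, hidden=256)
toks = torch.randn(32, 64)
mask = find_conflicting_tokens(expert, toks, loss_fn=lambda y: y.pow(2).mean())
print(f"{mask.sum().item()} of {toks.size(0)} tokens conflict in this expert")
```

In the paper's framing, a conflict-elimination loss would then encourage the router to send the flagged tokens to other experts; that loss is not reproduced here because the abstract does not specify its form.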

Authors (7)
  1. Longrong Yang (3 papers)
  2. Chaoxiang Cai (1 paper)
  3. Fan Yang (877 papers)
  4. Size Li (8 papers)
  5. Di Zhang (230 papers)
  6. Xi Li (197 papers)
  7. Dong Shen (14 papers)
Citations (1)