Title: Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

Jun 28, 2024

Longrong Yang, Dong Sheng, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li

Abstract: The Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and thus they employ a router to predict the routing for each token. However, the predictions are based solely on sample features and do not truly reveal the optimization direction of tokens. This can lead to severe optimization conflicts between different tokens within an expert. To address this problem, this paper proposes a novel method based on token-level gradient analysis. Specifically, we first use token-level gradients to identify conflicting tokens in experts. Then, we add a specialized loss tailored to eliminate conflicts among tokens within each expert. Our method can serve as a plug-in for diverse Large Vision-Language Models, and extensive experimental results demonstrate the effectiveness of our method. The code will be publicly available at https://github.com/longrongyang/STGC.
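
The abstract does not spell out how conflicting tokens are identified, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes one common definition of gradient conflict: a token conflicts with its expert when the cosine similarity between that token's gradient and the mean gradient of all tokens routed to the expert is negative. The function names, the `threshold` parameter, and the toy gradients are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_conflicting_tokens(token_grads, threshold=0.0):
    """Indices of tokens whose gradient opposes the expert's mean gradient.

    Assumption: "conflict" means cosine similarity below `threshold`.
    """
    average = mean_vector(token_grads)
    return [
        index
        for index, grad in enumerate(token_grads)
        if cosine(grad, average) < threshold
    ]


# Toy per-token gradients for four tokens routed to a single expert.
grads = [
    [1.0, 0.5],
    [0.8, 0.7],
    [0.9, 0.4],
    [-1.0, -0.6],  # points away from the other three
]
conflicting = find_conflicting_tokens(grads)
print(conflicting)
```

In this toy case only the last token is flagged. A conflict-reducing loss, as the abstract describes, could then act on the flagged tokens, for example by discouraging the router from sending them to this expert; the exact form of that loss is not given in the abstract and is left out here.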
