Title: On the effectiveness of discrete representations in sparse mixture of experts

Nov 28, 2024

Giang Do, Kha Pham, Hung Le, Truyen Tran

Abstract: Sparse mixture of experts (SMoE) is an effective solution for scaling up model capacity without increasing computational cost. A crucial component of SMoE is the router, which directs each input to the relevant experts; however, it is also a major weakness, leading to routing inconsistencies and representation collapse. Instead of fixing the router as previous works do, we propose an alternative that assigns experts to inputs via indirection: a discrete representation of the input points to the expert. The discrete representations are learnt via vector quantization, resulting in a new architecture dubbed Vector-Quantized Mixture of Experts (VQMoE). We provide theoretical support and empirical evidence demonstrating VQMoE's ability to overcome the challenges present in traditional routers. Through extensive evaluations on both large language models and vision tasks, for pre-training and fine-tuning, we show that VQMoE achieves a 28% improvement in robustness compared to other SMoE routing methods, while maintaining strong performance in fine-tuning tasks.
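To make the idea of routing "via indirection" concrete, here is a minimal sketch of how a discrete code could point to an expert. It is not the authors' implementation. The fixed codebook, the one-to-one mapping from code index to expert, and the names `quantize` and `route` are all assumptions made for illustration; in the paper the discrete representations are learnt rather than hard-coded.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Expert = Callable[[Vector], List[float]]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(x: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook entry nearest to x (the discrete code)."""
    return min(range(len(codebook)), key=lambda k: squared_distance(x, codebook[k]))


def route(x: Vector, codebook: List[Vector], experts: List[Expert]) -> Tuple[int, List[float]]:
    """Pick an expert through the discrete code instead of a learned router score.

    Assumption: code index k points directly at experts[k].
    """
    code = quantize(x, codebook)
    return code, experts[code](x)


# Hypothetical fixed codebook; a real model would learn these vectors.
codebook: List[Vector] = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]

# Toy experts standing in for feed-forward sub-networks.
experts: List[Expert] = [
    lambda v: [2.0 * t for t in v],
    lambda v: [t + 1.0 for t in v],
    lambda v: [-t for t in v],
]

code, output = route([0.9, 1.2], codebook, experts)
print(code, output)
```

In this sketch, inputs that fall near the same codebook entry always reach the same expert, which is one way to read the abstract's claim about avoiding routing inconsistencies.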

* 17 pages
