Title: S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning

Mar 29, 2025

Giang Do, Hung Le, Truyen Tran

Abstract: Sparse Mixture of Experts (SMoE) enables efficient training of large language models by routing input tokens to a select number of experts. However, training SMoE remains challenging due to the issue of representation collapse. Recent studies have focused on improving the router to mitigate this problem, but existing approaches face two key limitations: (1) expert embeddings are significantly smaller than the model's dimension, contributing to representation collapse, and (2) routing each input to the Top-K experts can cause them to learn overly similar features. In this work, we propose a novel approach called Robust Sparse Mixture of Experts via Stochastic Learning (S2MoE), which is a mixture of experts designed to learn from both deterministic and non-deterministic inputs via Learning under Uncertainty. Extensive experiments across various tasks demonstrate that S2MoE achieves performance comparable to other routing methods while reducing computational inference costs by 28%.
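
For readers unfamiliar with Top-K routing, the following is a minimal illustrative sketch in plain Python; it is not the authors' implementation. It scores a token against each expert embedding with a dot product, keeps the K best experts, and renormalises their gate weights. The function names, the dot-product router, and the use of Gaussian noise as a stand-in for a "non-deterministic" input are all assumptions made for illustration; the abstract does not specify how S2MoE generates or mixes such inputs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(token, expert_embeddings, k=2):
    """Return [(expert_index, gate_weight), ...] for the k best-scoring experts."""
    # Assumed router: dot product between the token and each expert embedding.
    scores = [sum(t * e for t, e in zip(token, emb)) for emb in expert_embeddings]
    # Keep the indices of the k highest-scoring experts.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise gate weights over the selected experts only.
    gates = softmax([scores[i] for i in top])
    return list(zip(top, gates))


def add_gaussian_noise(token, std=0.1, rng=random):
    """Hypothetical 'non-deterministic' input: the token plus Gaussian noise.

    This is an assumption for illustration, not the paper's method.
    """
    return [t + rng.gauss(0.0, std) for t in token]


token = [1.0, 0.0, 0.5]
experts = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]

routes = top_k_route(token, experts, k=2)
noisy_routes = top_k_route(add_gaussian_noise(token, rng=random.Random(0)), experts, k=2)
```

In this toy setup `routes` selects experts 3 and 0, whose scores (1.5 and 1.0) are the two largest. Because only K experts run per token, compute scales with K rather than with the total number of experts.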

* 4 pages
