Nam V. Nguyen

CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition

May 19, 2025

Nam V. Nguyen, Huy Nguyen, Quang Pham, Van Nguyen, Savitha Ramasamy, Nhat Ho

Abstract: Sparse mixture of experts (SMoE) offers an appealing solution to scale up model complexity beyond the means of increasing the network's depth or width. However, we argue that effective SMoE training remains challenging because of the suboptimal routing process, in which the experts that perform computation do not directly contribute to routing. In this work, we propose competition, a novel mechanism that routes tokens to the experts with the highest neural response. Theoretically, we show that the competition mechanism enjoys better sample efficiency than traditional softmax routing. Furthermore, we develop CompeteSMoE, a simple yet effective algorithm to train large language models by deploying a router to learn the competition policy, thus achieving strong performance at a low training overhead. Our extensive empirical evaluations on both visual instruction tuning and language pre-training tasks demonstrate the efficacy, robustness, and scalability of CompeteSMoE compared to state-of-the-art SMoE strategies. We have made the implementation available at: https://github.com/Fsoft-AIC/CompeteSMoE. This work is an improved version of the previous study at arXiv:2402.02526.

* 52 pages
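The contrast between competition and softmax routing can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the CompeteSMoE implementation (that lives in the linked repository). Experts are plain callables, and an expert's "neural response" is assumed to be the mean softplus of its output. The k experts with the highest response win the token, and their outputs are mixed with softmax weights over those responses. For comparison, softmax_route stands in for a traditional linear softmax router that scores experts before any of them runs. router_distillation_loss is a hypothetical mean-squared-error objective showing one plausible way a router could be trained to imitate the competition outcome.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    # Indices of the k largest scores, best first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def mix(weighted: Sequence[Tuple[float, Vector]]) -> Vector:
    # Weighted sum of the selected experts' output vectors.
    dim = len(weighted[0][1])
    return [sum(w * out[d] for w, out in weighted) for d in range(dim)]


def neural_response(output: Vector) -> float:
    # ASSUMPTION: an expert's response is the mean softplus of its output.
    return sum(softplus(v) for v in output) / len(output)


def competition_route(x: Vector, experts: Sequence[Expert],
                      k: int) -> Tuple[List[int], Vector]:
    outputs = [expert(x) for expert in experts]  # every expert computes first
    responses = [neural_response(o) for o in outputs]
    winners = top_k(responses, k)  # the highest responses win the token
    weights = softmax([responses[i] for i in winners])
    return winners, mix([(w, outputs[i]) for w, i in zip(weights, winners)])


def softmax_route(x: Vector, router: Sequence[Vector], experts: Sequence[Expert],
                  k: int) -> Tuple[List[int], Vector]:
    # Baseline: a linear router scores experts before any expert runs.
    probs = softmax([sum(w * v for w, v in zip(row, x)) for row in router])
    winners = top_k(probs, k)
    total = sum(probs[i] for i in winners)
    # Only the selected experts compute; gates are renormalized over them.
    return winners, mix([(probs[i] / total, experts[i](x)) for i in winners])


def router_distillation_loss(x: Vector, router: Sequence[Vector],
                             experts: Sequence[Expert]) -> float:
    # HYPOTHETICAL objective: mean squared error between the router's softmax
    # and the softmax over the experts' competition responses.
    probs = softmax([sum(w * v for w, v in zip(row, x)) for row in router])
    target = softmax([neural_response(e(x)) for e in experts])
    return sum((p - t) ** 2 for p, t in zip(probs, target)) / len(probs)


if __name__ == "__main__":
    experts = [lambda x, s=s: [s * v for v in x] for s in (0.5, 2.0, -1.0)]
    router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    x = [1.0, -0.5]
    print(competition_route(x, experts, k=1))  # ([1], [2.0, -1.0])
    print(softmax_route(x, router, experts, k=1))  # ([0], [0.5, -0.25])
    print(router_distillation_loss(x, router, experts))
```

With three toy experts that scale the token by 0.5, 2.0 and -1.0, competition picks expert 1 (largest response) while the toy linear router picks expert 0. The sketch also makes the cost trade-off visible: competition needs every expert's output before routing, which is why training a cheaper router to imitate it is attractive.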

SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking

Mar 02, 2025

Nam V. Nguyen, Dien X. Tran, Thanh T. Tran, Anh T. Hoang, Tai V. Duong, Di T. Le, Phuc-Lu Le

Abstract: The rise of misinformation, exacerbated by Large Language Models (LLMs) like GPT and Gemini, demands robust fact-checking solutions, especially for low-resource languages like Vietnamese. Existing methods struggle with semantic ambiguity, homonyms, and complex linguistic structures, often trading accuracy for efficiency. We introduce SemViQA, a novel Vietnamese fact-checking framework integrating Semantic-based Evidence Retrieval (SER) and Two-step Verdict Classification (TVC). Our approach balances precision and speed, achieving state-of-the-art results with 78.97% strict accuracy on ISE-DSC01 and 80.82% on ViWikiFC, and securing 1st place in the UIT Data Science Challenge. Additionally, SemViQA Faster improves inference speed by 7x while maintaining competitive accuracy. SemViQA sets a new benchmark for Vietnamese fact verification, advancing the fight against misinformation. The source code is available at: https://github.com/DAVID-NGUYEN-S16/SemViQA.

* 18 pages
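The retrieve-then-classify flow named in the abstract can be illustrated with a toy Python sketch. It is not SemViQA's implementation. The bag-of-words embed function stands in for a real semantic encoder. Splitting TVC into an "is there enough evidence?" step followed by a SUPPORTED/REFUTED step is our assumption about how a two-step verdict might be organized. The labels SUPPORTED, REFUTED and NEI (not enough information) are the usual three-way fact-checking labels, also assumed here. toy_is_verifiable and toy_is_supported are hypothetical placeholders for trained classifiers.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence

Classifier = Callable[[str, List[str]], bool]


def embed(text: str) -> Counter:
    # Toy stand-in for a semantic encoder: a bag of lower-cased word tokens.
    # Python's \w is Unicode-aware, so Vietnamese diacritics stay in tokens.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_evidence(claim: str, sentences: Sequence[str], top_n: int = 1) -> List[str]:
    # Rank candidate sentences by similarity to the claim and keep the best.
    query = embed(claim)
    ranked = sorted(sentences, key=lambda s: cosine(query, embed(s)), reverse=True)
    return ranked[:top_n]


def two_step_verdict(claim: str, evidence: List[str],
                     is_verifiable: Classifier, is_supported: Classifier) -> str:
    # Step 1: can the retrieved evidence settle the claim at all?
    if not is_verifiable(claim, evidence):
        return "NEI"
    # Step 2: only verifiable claims reach the SUPPORTED/REFUTED decision.
    return "SUPPORTED" if is_supported(claim, evidence) else "REFUTED"


# Hypothetical placeholder classifiers; a real system would use trained models.
def toy_is_verifiable(claim: str, evidence: List[str]) -> bool:
    return cosine(embed(claim), embed(" ".join(evidence))) >= 0.5


def _negated(text: str) -> bool:
    return "not" in embed(text)


def toy_is_supported(claim: str, evidence: List[str]) -> bool:
    # Crude heuristic: a negation mismatch between claim and evidence refutes.
    return _negated(claim) == _negated(" ".join(evidence))


def verify(claim: str, sentences: Sequence[str]) -> str:
    evidence = retrieve_evidence(claim, sentences)
    return two_step_verdict(claim, evidence, toy_is_verifiable, toy_is_supported)
```

For example, given the candidate sentences "Hanoi is the capital of Vietnam.", "Pho is a noodle soup." and "The Mekong river flows through Vietnam.", verify returns SUPPORTED for "Hanoi is the capital of Vietnam.", REFUTED for "Hanoi is not the capital of Vietnam.", and NEI for "Pho is served cold.", whose best match falls below the 0.5 similarity threshold. In this layout the second classifier only ever sees claims that the first step judged verifiable.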

LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models

Nov 01, 2024

Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham

Abstract: Mixture of Experts (MoE) plays an important role in the development of more efficient and effective large language models (LLMs). Due to the enormous resource requirements, studying large-scale MoE algorithms remains inaccessible to many researchers. This work develops LibMoE, a comprehensive and modular framework to streamline the research, training, and evaluation of MoE algorithms. Built upon three core principles, (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation, LibMoE makes MoE in LLMs more accessible to a wide range of researchers by standardizing the training and evaluation pipelines. Using LibMoE, we extensively benchmarked five state-of-the-art MoE algorithms over three different LLMs and 11 datasets under the zero-shot setting. The results show that, despite their unique characteristics, all MoE algorithms perform roughly similarly when averaged across a wide range of tasks. Given its modular design and extensive evaluation, we believe LibMoE will be invaluable for researchers to make meaningful progress towards the next generation of MoE and LLMs. Project page: https://fsoft-aic.github.io/fsoft-LibMoE.github.io

* 15 pages, 9 figures
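LibMoE's actual interfaces are not described in this abstract. The Python sketch below only illustrates, under our own assumptions, the general pattern behind a modular design with a standardized evaluation pipeline. Routing algorithms register under a name, and one shared benchmark loop runs every registered algorithm through the same task suite and averages the scores. All names here (ROUTERS, register_router, benchmark, the two toy tasks) are hypothetical. The two routers, top-k softmax gating and top-k sigmoid gating with renormalization, are generic examples rather than the five algorithms benchmarked in the paper.

```python
import math
from statistics import mean
from typing import Callable, Dict, List, Sequence

# A router maps expert logits and a budget k to {expert index: mixing weight}.
Router = Callable[[Sequence[float], int], Dict[int, float]]
Task = Callable[[Router], float]

ROUTERS: Dict[str, Router] = {}


def register_router(name: str) -> Callable[[Router], Router]:
    # Decorator that adds a routing algorithm to the shared registry.
    def decorator(fn: Router) -> Router:
        ROUTERS[name] = fn
        return fn
    return decorator


def _top_k(scores: Sequence[float], k: int) -> List[int]:
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


@register_router("softmax_topk")
def softmax_topk(logits: Sequence[float], k: int) -> Dict[int, float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    winners = _top_k(exps, k)
    total = sum(exps[i] for i in winners)
    return {i: exps[i] / total for i in winners}


@register_router("sigmoid_topk")
def sigmoid_topk(logits: Sequence[float], k: int) -> Dict[int, float]:
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    winners = _top_k(gates, k)
    total = sum(gates[i] for i in winners)
    return {i: gates[i] / total for i in winners}


def benchmark(names: Sequence[str], tasks: Dict[str, Task]) -> Dict[str, float]:
    # Same task suite for every algorithm; report the average score per router.
    return {name: mean(task(ROUTERS[name]) for task in tasks.values()) for name in names}


# Two toy tasks on fixed logits; real tasks would be evaluation datasets.
LOGITS = [0.2, 1.5, -0.3, 0.9]


def picks_strongest(router: Router) -> float:
    return 1.0 if set(router(LOGITS, 2)) == {1, 3} else 0.0


def weights_normalized(router: Router) -> float:
    return 1.0 if abs(sum(router(LOGITS, 2).values()) - 1.0) < 1e-9 else 0.0


if __name__ == "__main__":
    tasks = {"picks_strongest": picks_strongest, "weights_normalized": weights_normalized}
    print(benchmark(sorted(ROUTERS), tasks))
```

In this pattern, adding an algorithm means writing one decorated function while the benchmark loop and task suite stay unchanged, which keeps comparisons between algorithms like-for-like.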
