Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhen Tang

ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning

Jan 25, 2025

Shangqian Gao, Ting Hua, Reza Shirkavand, Chi-Heng Lin, Zhen Tang, Zhengao Li, Longge Yuan, Fangyi Li, Zeyu Zhang, Alireza Ganjdanesh(+3 more)

Abstract:Large Language Models (LLMs) have demonstrated remarkable abilities in tackling a wide range of complex tasks. However, their huge computational and memory costs raise significant challenges in deploying these models on resource-constrained devices or efficiently serving them. Prior approaches have attempted to alleviate these problems by permanently removing less important model structures, yet these methods often result in substantial performance degradation due to the permanent deletion of model parameters. In this work, we tried to mitigate this issue by reducing the number of active parameters without permanently removing them. Specifically, we introduce a differentiable dynamic pruning method that pushes dense models to maintain a fixed number of active parameters by converting their MLP layers into a Mixture of Experts (MoE) architecture. Our method, even without fine-tuning, consistently outperforms previous structural pruning techniques across diverse model families, including Phi-2, LLaMA-2, LLaMA-3, and Qwen-2.5.

Via

Access Paper or Ask Questions

Calculating the Midsagittal Plane for Symmetrical Bilateral Shapes: Applications to Clinical Facial Surgical Planning

Mar 11, 2018

Aarti Jajoo, Matthew Nicol, Jaime Gateno, Ken-Chung Chen, Zhen Tang, Tasadduk Chowdhury, Jainfu Li, Steve Goufang Shen, James J. Xia

Figure 1 for Calculating the Midsagittal Plane for Symmetrical Bilateral Shapes: Applications to Clinical Facial Surgical Planning

Figure 2 for Calculating the Midsagittal Plane for Symmetrical Bilateral Shapes: Applications to Clinical Facial Surgical Planning

Figure 3 for Calculating the Midsagittal Plane for Symmetrical Bilateral Shapes: Applications to Clinical Facial Surgical Planning

Figure 4 for Calculating the Midsagittal Plane for Symmetrical Bilateral Shapes: Applications to Clinical Facial Surgical Planning

Abstract:It is difficult to estimate the midsagittal plane of human subjects with craniomaxillofacial (CMF) deformities. We have developed a LAndmark GEometric Routine (LAGER), which automatically estimates a midsagittal plane for such subjects. The LAGER algorithm was based on the assumption that the optimal midsagittal plane of a patient with a deformity is the premorbid midsagittal plane of the patient (i.e. hypothetically normal without deformity). The LAGER algorithm consists of three steps. The first step quantifies the asymmetry of the landmarks using a Euclidean distance matrix analysis and ranks the landmarks according to their degree of asymmetry. The second step uses a recursive algorithm to drop outlier landmarks. The third step inputs the remaining landmarks into an optimization algorithm to determine an optimal midsaggital plane. We validate LAGER on 20 synthetic models mimicking the skulls of real patients with CMF deformities. The results indicated that all the LAGER algorithm-generated midsagittal planes met clinical criteria. Thus it can be used clinically to determine the midsagittal plane for patients with CMF deformities.

Via

Access Paper or Ask Questions