Shaohuai Shi

ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference

Oct 23, 2024

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression

Oct 16, 2024

Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning

Aug 27, 2024

Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules

Jun 30, 2024

FedImpro: Measuring and Improving Client Update in Federated Learning

Feb 10, 2024

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

Nov 07, 2023

FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs

Sep 03, 2023

LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning

Aug 07, 2023

Eva: A General Vectorized Approximation Framework for Second-order Optimization

Aug 04, 2023

Evaluation and Optimization of Gradient Compression for Distributed Deep Learning

Jun 15, 2023